sol / hpack

hpack: A modern format for Haskell packages
MIT License

Optionally don't expand globs in `extra-source-files` #594

Closed: parsonsmatt closed this 3 hours ago

parsonsmatt commented 5 days ago

We have ~1800 lines in our extra-source-files in our .cabal file, which come from a few globs in our package.yaml. We are using a new cabal now which can handle globs directly. However, the logic in cabal effectively treats every line in extra-source-files as a glob, even if it doesn't have any wildcard characters. This results in an extremely slow lookup time: 34s to resolve these "globs."

I see two issues:

  1. hpack could output a glob syntax that Cabal understands if the cabal version is new enough. This would at least reduce the
  2. Cabal needs to not do file glob expansion logic on a line if it isn't a glob.

I've opened an issue to track this in upstream cabal: cabal #10495. Apparently we're also repeating this four times, so I can probably quarter the time spent in glob lookups at least just by avoiding redundant work.

m4lvin commented 3 days ago

I think the same applies to extra-doc-files and possibly other fields where globs are allowed?

sol commented 4 hours ago

@parsonsmatt this sounds bad. Did you manage to find a workaround for now? I think you could use `verbatim` to "propagate" `extra-source-files` with globs into your Cabal file (please feel free to ask if this is not clear).

> hpack could output a glob syntax that Cabal understands if the cabal version is new enough.

Yes, that would also have the advantage that you don't need to regenerate the Cabal file when more extra source files are added.

Somebody will need to figure out:

  1. Is the glob syntax compatible?
  2. If not, is transpilation feasible?

parsonsmatt commented 3 hours ago

Yeah, I managed to fix the worst impact of the performance in Cabal #10502 by skipping the glob logic if a path doesn't have any glob characters.
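
For illustration, the fast path described above can be sketched roughly as follows. This is a Python sketch of the idea only, not the Haskell code from Cabal #10502. The names `is_glob` and `resolve_entry` are hypothetical, and the set of wildcard characters is an assumption borrowed from Python's `glob` module rather than from Cabal's glob syntax.

```python
import glob
import os

# Assumption: the wildcard characters recognised by Python's glob module.
# Cabal's own glob syntax is different and more restricted.
GLOB_CHARS = frozenset("*?[")


def is_glob(entry):
    """Return True if the entry contains any wildcard character."""
    return any(ch in GLOB_CHARS for ch in entry)


def resolve_entry(root, entry):
    """Resolve one extra-source-files style entry relative to root."""
    if not is_glob(entry):
        # Fast path: a literal path needs a single existence check,
        # with no directory listing or pattern matching.
        return [entry] if os.path.isfile(os.path.join(root, entry)) else []
    # Slow path: expand the pattern against the file system.
    pattern = os.path.join(glob.escape(root), entry)
    matches = glob.glob(pattern, recursive=True)
    return sorted(os.path.relpath(m, root) for m in matches if os.path.isfile(m))
```

With ~1800 literal lines, every entry takes the fast path, so the cost becomes one file-existence check per line instead of one glob expansion per line.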

I suspect the remaining glob code is still extremely slow, and glob expansion is going to be a pain point if we do try to push the globs into cabal: there it's a cost paid once per cabal invocation in the check phase, whereas with hpack it's a cost paid once per hpack invocation.

Cabal's glob syntax seems pretty limited - a `*` or `**` has to be a literal segment, so you can't write `foo/bar*.hs`.

I think we don't need to do anything here; evading Cabal's glob logic seems to be the winning move.