ndmitchell opened this issue 10 years ago
I don't think local storage is the big issue (it doesn't need to be kept uncompressed, after all). The big issue is the transfer cost, and we should get on with the project to do incremental index updates.
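To make the incremental-update idea concrete, here is a minimal sketch of one way it could work, assuming the server publishes an append-only, uncompressed `index.tar` and honours HTTP `Range` requests. The `indexUrl` endpoint and the `fetchIncrement` helper are hypothetical, not anything the thread confirms exists:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as BS
import qualified Data.ByteString.Lazy as BL
import Network.HTTP.Simple
import System.Directory (doesFileExist, getFileSize)

-- Hypothetical location of an append-only, uncompressed index.
indexUrl :: String
indexUrl = "https://hackage.haskell.org/packages/index.tar"

-- Fetch only the bytes past what we already have locally, and append them.
fetchIncrement :: FilePath -> IO ()
fetchIncrement local = do
  exists <- doesFileExist local
  have   <- if exists then getFileSize local else pure 0
  req    <- parseRequest indexUrl
  let ranged = addRequestHeader "Range"
                 (BS.pack ("bytes=" ++ show have ++ "-")) req
  resp <- httpLBS ranged
  -- A real client would have to strip the tar's trailing zero padding
  -- from the local copy before appending, and fall back to a full
  -- download if the server answers 200 instead of 206 Partial Content.
  BL.appendFile local (getResponseBody resp)
```

With an append-only archive, each update then transfers only the new entries rather than the whole index.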
Hoogle does need to keep it uncompressed, and extraction takes a very long time. In my benchmarks, downloading from the network is pretty cheap: less than 10% of the total download-and-extract time. These measurements are on Windows, where creating a large number of directories is expensive.
@ndmitchell My advice on the performance front is to uncompress it but keep it as a .tar, i.e. don't bother extracting it (as you say, directories are slow even on non-Windows systems). This is what cabal-install does.
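As a minimal sketch of that approach, the snippet below streams every .cabal file out of an already-uncompressed `index.tar` using the tar package, without writing individual files to disk. The `cabalFiles` helper is illustrative only, not cabal-install's actual code:

```haskell
import qualified Codec.Archive.Tar as Tar
import qualified Data.ByteString.Lazy as BL
import System.FilePath (takeExtension)

-- Collect every .cabal file from an uncompressed index.tar,
-- never touching the file system beyond the single archive read.
cabalFiles :: FilePath -> IO [(FilePath, BL.ByteString)]
cabalFiles indexTar = do
  bytes <- BL.readFile indexTar
  pure $ Tar.foldEntries keep [] (error . show) (Tar.read bytes)
  where
    keep entry acc =
      case Tar.entryContent entry of
        Tar.NormalFile content _
          | takeExtension (Tar.entryPath entry) == ".cabal"
              -> (Tar.entryPath entry, content) : acc
        _ -> acc  -- skip directories and other entry types
```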
If you need random access, we have a couple of solutions for that, e.g. the index support in the tar package itself.

@dcoutts I need the latest version of the .cabal file for every package, so it's far more than random access: I end up needing 5000 or so files out of it.
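For reference, the random-access route mentioned above is `Codec.Archive.Tar.Index` from the tar package: build a `TarIndex` over the archive once, then seek straight to an entry by path. A minimal sketch, with `lookupEntry` as a hypothetical helper:

```haskell
import qualified Codec.Archive.Tar as Tar
import qualified Codec.Archive.Tar.Index as TarIdx
import qualified Data.ByteString.Lazy as BL
import System.IO (IOMode (ReadMode), withFile)

-- Look up one file in an uncompressed index.tar by path,
-- using a TarIndex instead of scanning the whole archive.
lookupEntry :: FilePath -> FilePath -> IO (Maybe BL.ByteString)
lookupEntry indexTar wanted = do
  bytes <- BL.readFile indexTar
  idx   <- either (fail . show) pure (TarIdx.build (Tar.read bytes))
  case TarIdx.lookup idx wanted of
    Just (TarIdx.TarFileEntry offset) ->
      withFile indexTar ReadMode $ \h -> do
        entry <- TarIdx.hReadEntry h offset
        case Tar.entryContent entry of
          Tar.NormalFile content _ -> pure (Just content)
          _                        -> pure Nothing
    _ -> pure Nothing
```

The index only pays off for a handful of lookups; for the 5000-files case above, a single streaming pass over the archive is the simpler fit.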
Currently packages/index.tar.gz contains every version of every .cabal file ever released; it's over 100MB uncompressed. Some clients are only interested in the most recent version of each package, so having a latest.tar.gz containing only the latest version of each package might be useful. Certainly Hoogle would switch, resulting in over 200MB of disk space saved on each machine generating Hoogle databases.
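Until such a latest.tar.gz exists (it is only proposed above), a client could derive the same subset itself. Below is a minimal sketch assuming the index's `pkg/version/pkg.cabal` path layout; `latestCabals` is a hypothetical helper:

```haskell
import qualified Codec.Archive.Tar as Tar
import qualified Data.ByteString.Lazy as BL
import qualified Data.Map.Strict as Map
import Data.Version (Version, parseVersion)
import System.FilePath (splitDirectories)
import Text.ParserCombinators.ReadP (readP_to_S)

-- Keep only the highest-versioned .cabal file for each package name.
latestCabals :: FilePath -> IO (Map.Map String (Version, BL.ByteString))
latestCabals indexTar = do
  bytes <- BL.readFile indexTar
  pure $ Tar.foldEntries step Map.empty (error . show) (Tar.read bytes)
  where
    step entry m =
      case (splitDirectories (Tar.entryPath entry), Tar.entryContent entry) of
        ([pkg, ver, _cabal], Tar.NormalFile content _)
          | Just v <- parseVer ver
              -> Map.insertWith max pkg (v, content) m
        _ -> m  -- skip preferred-versions and other entries

    -- Accept only fully-parsed version strings.
    parseVer s = case [v | (v, "") <- readP_to_S parseVersion s] of
      (v : _) -> Just v
      _       -> Nothing
```

This still pays the full download cost, which is exactly why a server-side latest.tar.gz (or incremental updates) would be the better fix.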