haskell / hackage-server

Hackage-Server: A Haskell Package Repository
http://hackage.haskell.org

Add an index of just the most recent version of each package #174

Open ndmitchell opened 10 years ago

ndmitchell commented 10 years ago

Currently packages/index.tar.gz contains every version of every .cabal file ever uploaded, and it is > 100 MB when uncompressed. Some clients are only interested in the most recent version of each package, so a latest.tar.gz containing only the latest version of each package might be useful. Certainly Hoogle would switch, saving > 200 MB of disk space on each machine that generates Hoogle databases.
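To make "only the latest version of each package" concrete, here is a minimal sketch of the selection step. It assumes index entries are laid out as `name/version/name.cabal` and compares versions as lists of integers. The names `parseEntry`, `parseVersion`, `splitOn`, and `latestVersions` are hypothetical helpers for illustration, not part of hackage-server, and the sketch ignores things like `preferred-versions` and deprecation.

```haskell
import qualified Data.Map.Strict as Map
import Data.Char (isDigit)

-- | Split "foo/1.2.3/foo.cabal" into ("foo", [1,2,3]).
-- Any path that does not have this shape yields Nothing.
parseEntry :: FilePath -> Maybe (String, [Int])
parseEntry path =
  case splitOn '/' path of
    [name, ver, file]
      | file == name ++ ".cabal"
      , Just v <- parseVersion ver -> Just (name, v)
    _ -> Nothing

-- | Parse a dotted numeric version such as "1.10.0".
parseVersion :: String -> Maybe [Int]
parseVersion s
  | all isNumber parts = Just (map read parts)
  | otherwise          = Nothing
  where
    parts      = splitOn '.' s
    isNumber p = not (null p) && all isDigit p

-- | Split a string on a separator character, keeping empty chunks.
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn c rest

-- | Keep the highest version seen for each package name.
-- Lists of Int compare lexicographically, so [1,10] > [1,9].
latestVersions :: [FilePath] -> Map.Map String [Int]
latestVersions paths =
  Map.fromListWith max [entry | Just entry <- map parseEntry paths]
```

Feeding it the entry paths of the full index gives one `(name, version)` pair per package, which is the set of files a latest.tar.gz would need to contain.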

dcoutts commented 10 years ago

I don't think the local storage is the big issue (it doesn't need to be kept uncompressed after all). The big issue is the transfer cost, and we should get on with the project to do incremental index updates.

ndmitchell commented 10 years ago

Hoogle does need to keep it uncompressed, plus extraction time is very long. In my benchmarks, downloading from the network is pretty cheap: less than 10% of the combined download and extract time. These measurements are on Windows, where creating a large number of directories is expensive.

dcoutts commented 10 years ago

@ndmitchell My advice on the performance front is to uncompress but keep it as a .tar, i.e. don't bother extracting it (as you say, directories are slow even on non-Windows). This is what cabal-install does.

If you need random access, we have a couple solutions for that:

  1. the hackage-server has code for creating indexes of tar files (which I ought to move into the tar package itself).
  2. cabal-install makes another kind of index, of package id -> offset in the tar file, and provides a PackageIndex type with lazy loading of the package descriptions. (Yes, this could be moved into a lib too)
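As a rough illustration of the second option, the sketch below builds an entry name -> offset map by walking the headers of an uncompressed tar held in memory. It is not the cabal-install or hackage-server code. It assumes plain 512-byte tar headers with the name in bytes 0-99 and the size as octal text in bytes 124-135, and it does not handle GNU long names or pax extended headers. `buildOffsetIndex`, `parseOctal`, and `roundUp512` are hypothetical names.

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.Map.Strict as Map
import Data.Char (digitToInt, isOctDigit)

-- | Map each entry name to the byte offset of its header block.
buildOffsetIndex :: B.ByteString -> Map.Map String Int
buildOffsetIndex tar = go 0 Map.empty
  where
    go off acc
      | B.length header < 512  = acc  -- truncated input or end of data
      | B.all (== '\0') header = acc  -- an all-zero block ends the archive
      | otherwise              = go next (Map.insert name off acc)
      where
        header = B.take 512 (B.drop off tar)
        name   = B.unpack (B.takeWhile (/= '\0') (B.take 100 header))
        size   = parseOctal (B.take 12 (B.drop 124 header))
        -- Skip the header, then the data padded to a 512-byte boundary.
        next   = off + 512 + roundUp512 size

-- | Parse the octal size field, ignoring leading spaces and the trailing NUL or space.
parseOctal :: B.ByteString -> Int
parseOctal = B.foldl' step 0 . B.takeWhile isOctDigit . B.dropWhile (== ' ')
  where
    step n c = n * 8 + digitToInt c

-- | Round a byte count up to the next multiple of 512.
roundUp512 :: Int -> Int
roundUp512 n = ((n + 511) `div` 512) * 512
```

With such a map, a client can seek to `offset + 512` in the .tar and read the entry's bytes directly, without creating any files or directories on disk.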

ndmitchell commented 10 years ago

@dcoutts I need the latest version of the cabal file for every package, so it's far more than random access - I end up needing 5000 files or so out of it.