openmm / openmmforcefields

CHARMM and AMBER forcefields for OpenMM (with small molecule support)
http://openmm.org

Extend ForceField to read compressed `ffxml` files? #5

Open jchodera opened 7 years ago

jchodera commented 7 years ago

To save storage space, it may be useful to extend ForceField to be able to read .gz compressed ffxml files.
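A minimal sketch of what transparent `.gz` handling could look like on the Python side, using only the standard library. The helper names `open_ffxml` and `parse_ffxml` are hypothetical and not part of OpenMM; the sketch only shows that the same XML parsing code can consume either a plain or a gzipped file once the open step is abstracted.

```python
import gzip
import xml.etree.ElementTree as etree


def open_ffxml(path):
    """Open an ffxml file for binary reading.

    Hypothetical helper: files whose name ends in ".gz" are decompressed
    on the fly; anything else is opened as a regular file.
    """
    if str(path).endswith(".gz"):
        return gzip.open(path, "rb")
    return open(path, "rb")


def parse_ffxml(path):
    """Parse a plain or gzipped ffxml file and return its root XML element."""
    with open_ffxml(path) as handle:
        return etree.parse(handle).getroot()
```

If `ForceField` accepts an open file-like object (an assumption to check against the OpenMM documentation for the version in use), the handle returned by `open_ffxml` could be passed to it directly, with no change to the parser itself.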

jchodera commented 7 years ago

@peastman: Given the size of the CHARMM ffxml file (many MB), what would you think of supporting automatic handling of gzipped ffxml files?

I wonder if we may even want to support compressed serialized System XML files at the C++ layer too. These files can be enormous.

peastman commented 7 years ago

> what would you think of supporting automatic handling of gzipped ffxml files?

What would be the benefit of that? It would save a little disk space, but on the scale of modern disks, that's irrelevant. And it would have no effect on download time when installing it, because git and conda both already transfer data in compressed form.

jchodera commented 7 years ago

> What would be the benefit of that? It would save a little disk space, but on the scale of modern disks, that's irrelevant. And it would have no effect on download time when installing it, because git and conda both already transfer data in compressed form.

For state.xml and system.xml files, for Folding@home, we're talking about many, many TB of data on our FAH servers, plus a huge amount of network traffic and long download/upload times. For example, for project 10496, the state.xml file is 44 MB uncompressed and 15 MB compressed. That project has 1,930,000 of those files, so compression makes a difference of (1.93e6)(44 - 15) MB ≈ 56 TB of data.
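The estimate for project 10496 can be reproduced with a few lines of arithmetic. The function name `savings_tb` is made up for illustration, and the sketch assumes decimal units (1 TB = 1e6 MB).

```python
def savings_tb(n_files, uncompressed_mb, compressed_mb):
    """Return the total space saved by compression, in decimal TB."""
    saved_mb = n_files * (uncompressed_mb - compressed_mb)
    return saved_mb / 1e6  # 1 TB = 1e6 MB (decimal units, an assumption)


# Figures quoted in the thread for project 10496.
print(f"{savings_tb(1_930_000, 44, 15):.2f} TB saved")  # prints "55.97 TB saved"
```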

For the ffxml file, it's less of a concern, but the compatible patch lists do inflate things quite a bit for CHARMM, which weighs in at around 10 MB in that initial case.

There's essentially no drawback to compressing these files (it's very fast and takes little CPU time), but once you have more than a few of these files hanging around, the space savings add up quite a bit.
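For reference, compressing an existing XML file takes only a few lines with the standard library. This is a sketch, not anything OpenMM or Folding@home is known to use; `gzip_file` is a hypothetical helper name, and the default compression level of 6 is an arbitrary choice.

```python
import gzip
import shutil


def gzip_file(src_path, dst_path=None, level=6):
    """Write a gzipped copy of src_path and return the path of the copy.

    The data is streamed in chunks, so large state.xml or system.xml
    files do not need to fit in memory.
    """
    if dst_path is None:
        dst_path = str(src_path) + ".gz"
    with open(src_path, "rb") as src, \
            gzip.open(dst_path, "wb", compresslevel=level) as dst:
        shutil.copyfileobj(src, dst)
    return dst_path
```

Comparing `os.path.getsize` on the original and the compressed copy gives the per-file ratio, which can then be scaled by the number of files as in the project 10496 estimate.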