micom-dev / micom

Python package to study microbial communities using metabolic modeling.
https://micom-dev.github.io/micom
Apache License 2.0

Give the option to increase the compresslvl of model databases #37

Closed (nigiord closed this issue 2 years ago)

nigiord commented 3 years ago

Is your feature related to a problem? Please describe it.

Not really a problem, but I noticed that workflows.build_database() does not compress its output. This is the current code in workflows.build_database():

            with ZipFile(out_path, "w") as zf:
                [zf.write(a[2], os.path.basename(a[2])) for a in args]
                zf.write(os.path.join(tdir, "manifest.csv"), "manifest.csv")

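But the ZipFile() documentation shows that the default compression method is ZIP_STORED (compression=0), which means no compression. compresslevel is a separate parameter that only takes effect with ZIP_DEFLATED or ZIP_BZIP2.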

Init signature:
zipfile.ZipFile(
    file,
    mode='r',
    compression=0,
    allowZip64=True,
    compresslevel=None,
)
Docstring:     
Class with methods to open, read, write, close, list zip files.

z = ZipFile(file, mode="r", compression=ZIP_STORED, allowZip64=True,
            compresslevel=None)

file: Either the path to the file, or a file-like object.
      If it is a path, the file will be opened and closed by ZipFile.
mode: The mode can be either read 'r', write 'w', exclusive create 'x',
      or append 'a'.
compression: ZIP_STORED (no compression), ZIP_DEFLATED (requires zlib),
             ZIP_BZIP2 (requires bz2) or ZIP_LZMA (requires lzma).
allowZip64: if True ZipFile will create files with ZIP64 extensions when
            needed, otherwise it will raise an exception when this would
            be necessary.
compresslevel: None (default for the given compression type) or an integer
               specifying the level to pass to the compressor.
               When using ZIP_STORED or ZIP_LZMA this keyword has no effect.
               When using ZIP_DEFLATED integers 0 through 9 are accepted.
               When using ZIP_BZIP2 integers 1 through 9 are accepted.
Init docstring:
Open the ZIP file with mode read 'r', write 'w', exclusive create 'x',
or append 'a'.

I think this cancels out the advantage of using a .zip file.
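To illustrate the size difference, here is a minimal sketch that writes the same bytes into two in-memory archives, one with the default ZIP_STORED and one with ZIP_DEFLATED. The payload and the `archive_size` helper are made up for the illustration and are not part of micom. Real model files will compress less dramatically than this repetitive payload.

```python
import io
import zipfile

# Hypothetical, highly repetitive payload standing in for a model file.
payload = b"<model><reaction id='R1'/></model>\n" * 2000


def archive_size(compression, compresslevel=None):
    """Return the size in bytes of a zip archive holding `payload`."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(
        buffer, "w", compression=compression, compresslevel=compresslevel
    ) as zf:
        # The entry inherits the archive-wide compression settings.
        zf.writestr("model.xml", payload)
    return len(buffer.getvalue())


stored_size = archive_size(zipfile.ZIP_STORED)  # the ZipFile default
deflated_size = archive_size(zipfile.ZIP_DEFLATED, compresslevel=9)

print("stored:", stored_size, "bytes")
print("deflated:", deflated_size, "bytes")
```

With ZIP_STORED the archive is slightly larger than the input, because the zip headers are added on top of the raw bytes.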

Describe the solution you would like.

It would be nice to have the option to choose a compression level when building the model database. This should not interfere with reading in other parts of the code, since the same module (zipfile) is used and it handles the compression method of each entry when reading.
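A rough sketch of what this could look like, assuming ZIP_DEFLATED as the compression method. The `write_archive` helper, its `compresslevel` argument, and the file contents are hypothetical and only illustrate the idea; this is not the actual micom code. The read-back at the end shows that the reading side needs no changes.

```python
import os
import tempfile
from zipfile import ZIP_DEFLATED, ZipFile


def write_archive(out_path, file_paths, compresslevel=6):
    """Hypothetical helper: bundle files into a deflate-compressed zip."""
    with ZipFile(
        out_path, "w", compression=ZIP_DEFLATED, compresslevel=compresslevel
    ) as zf:
        for path in file_paths:
            # Store each file under its base name, as in the snippet above.
            zf.write(path, os.path.basename(path))


with tempfile.TemporaryDirectory() as tdir:
    # Made-up manifest content, written as bytes to avoid newline translation.
    manifest = os.path.join(tdir, "manifest.csv")
    with open(manifest, "wb") as fh:
        fh.write(b"id,file\n" + b"taxon,model.json\n" * 500)

    out_path = os.path.join(tdir, "db.zip")
    write_archive(out_path, [manifest], compresslevel=9)

    # Reading is unchanged: zipfile picks the method stored with each entry.
    with ZipFile(out_path) as zf:
        info = zf.getinfo("manifest.csv")
        restored = zf.read("manifest.csv")

    with open(manifest, "rb") as fh:
        original = fh.read()
```

A higher `compresslevel` trades build time for a smaller file, so exposing it as an argument with a moderate default seems like a reasonable choice.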

cdiener commented 3 years ago

Yes that makes sense. Feel free to send a PR with that change if you'd like.