greenelab / connectivity-search-analyses

hetnet connectivity search research notebooks (previously hetmech)
BSD 3-Clause "New" or "Revised" License

Simplify pipeline code #130

Closed dhimmel closed 6 years ago

dhimmel commented 6 years ago

@zietzm I'm creating an archive using the following:

import zipfile

from hetmech.hetmat import archive

dst = 'GpBPpGiG.zip'
# Individual files to include
paths = [
    'adjusted-path-counts/dwpc-0.5/degree-grouped-permutations/GpBPpGiG.tsv.gz',
]
# Glob patterns for files to include
globs = [
    'path-counts/**/GpBPpGiG.*',
]
archive.create_archive_by_globs(
    dst,
    'data/hetionet-v1.0.hetmat',
    include_globs=globs,
    include_paths=paths,
    compression=zipfile.ZIP_STORED,  # store members without compression
)
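For readers without hetmech installed, the selection step can be approximated with the standard library alone. This is a minimal sketch, not hetmech's implementation: `create_archive_sketch` is a hypothetical helper, and it assumes that both the globs and the explicit paths are resolved relative to the root directory.

```python
import pathlib
import zipfile


def create_archive_sketch(dst, root, include_globs=(), include_paths=(),
                          compression=zipfile.ZIP_STORED):
    """Hypothetical stand-in for archive.create_archive_by_globs.

    Collects files under `root` that match any glob or are listed
    explicitly, then writes them to `dst` using paths relative to `root`.
    Returns the sorted list of archived member names.
    """
    root = pathlib.Path(root)
    selected = set()
    for pattern in include_globs:
        # '**' matches the current directory and any level of subdirectory
        selected.update(p for p in root.glob(pattern) if p.is_file())
    for rel_path in include_paths:
        selected.add(root / rel_path)
    with zipfile.ZipFile(dst, mode='w', compression=compression) as zf:
        for path in sorted(selected):
            zf.write(path, arcname=path.relative_to(root).as_posix())
    return sorted(p.relative_to(root).as_posix() for p in selected)
```

Using a set means a file matched by both a glob and an explicit path is written only once.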

Here are the contents of GpBPpGiG.zip-info.tsv:

CRC         archive       compress_size  compress_type  file_size   filename
1914199646  GpBPpGiG.zip  2589990        store          2589990     adjusted-path-counts/dwpc-0.5/degree-grouped-permutations/GpBPpGiG.tsv.gz
4002283222  GpBPpGiG.zip  543814234      store          543814234   path-counts/dwpc-0.0/GpBPpGiG.sparse.npz
1920057425  GpBPpGiG.zip  1401515825     store          1401515825  path-counts/dwpc-0.5/GpBPpGiG.sparse.npz
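The columns in this listing correspond to attributes on the standard library's `zipfile.ZipInfo`, so a downloaded archive can be checked against it. The following is a sketch under assumptions: `zip_info_rows` and `COMPRESS_NAMES` are hypothetical names, and only the `store` label is taken from the listing; the other labels are guesses at how hetmech would name them.

```python
import zipfile

# Labels for compression methods. Only 'store' appears in the listing;
# the remaining labels are assumptions for illustration.
COMPRESS_NAMES = {
    zipfile.ZIP_STORED: 'store',
    zipfile.ZIP_DEFLATED: 'deflate',
    zipfile.ZIP_BZIP2: 'bzip2',
    zipfile.ZIP_LZMA: 'lzma',
}


def zip_info_rows(archive_path):
    """Return one dict per archive member, mirroring the TSV columns."""
    rows = []
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            rows.append({
                'CRC': info.CRC,
                'archive': str(archive_path),
                'compress_size': info.compress_size,
                'compress_type': COMPRESS_NAMES.get(
                    info.compress_type, str(info.compress_type)),
                'file_size': info.file_size,
                'filename': info.filename,
            })
    return rows
```

With `ZIP_STORED`, `compress_size` equals `file_size` for every member, which matches the three rows shown. Calling `ZipFile.testzip()` on the opened archive additionally re-reads each member and verifies its CRC.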

Download GpBPpGiG.zip on Dropbox.