Open dweemx opened 4 years ago
Hey @dweemx ,
I think these steps will all need to be implemented within the pySCENIC code (CLI) itself. Otherwise the pipeline has the overhead of zipping and unzipping these files. It shouldn't be too much work; I could probably get to it next week.
The final point, about AUCell outputting a full loom with expression data, is probably the most important. We can change the AUCell output to a tsv matrix, which would avoid writing the expression data again. But this might require significant changes to the multi-runs section of the pipeline.
Hi @cflerin ,
I was not going to implement the compression here, since it's already available in pySCENIC. It's just a matter of calling pySCENIC properly.
Regarding AUCell, yes, I had in mind exactly what you suggested.
No, I don't think that will work, unless I'm totally misunderstanding what you're planning. The pySCENIC CLI won't recognize files with a .gz ending, even though pandas is capable of reading them:
pyscenic.cli.pyscenic - ERROR - Unknown file format for "adj.tsv.gz".
Likewise, if you want AUCell to have a tsv(.gz) output with loom input, it will require (minor) changes to the code.
You're right, now I see it in the code. I guess this piece of code should be changed:
https://github.com/aertslab/pySCENIC/blob/d3120aff123c87c1dbeb5b5e1030cb9a4210836d/src/pyscenic/cli/utils.py#L196-L198
Because pandas can load .tsv.gz files and the like, as you said.
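To illustrate the kind of change being discussed, here is a minimal sketch. It is not the actual pySCENIC code: the `SEPARATORS` table and the helpers `detect_separator` and `load_table` are hypothetical names, and the column names are only illustrative. The idea is to ignore a trailing .gz when checking the file format and let pandas infer the compression from the filename.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical mapping, for illustration only; not pySCENIC's real table.
SEPARATORS = {".tsv": "\t", ".csv": ","}


def detect_separator(fname):
    """Return the column separator for fname, ignoring a trailing .gz."""
    suffixes = [s.lower() for s in Path(fname).suffixes]
    if suffixes and suffixes[-1] == ".gz":
        suffixes = suffixes[:-1]
    if not suffixes or suffixes[-1] not in SEPARATORS:
        raise ValueError(f'Unknown file format for "{fname}".')
    return SEPARATORS[suffixes[-1]]


def load_table(fname):
    # pandas infers gzip compression from the .gz ending
    # (compression="infer" is the default for read_csv).
    return pd.read_csv(fname, sep=detect_separator(fname))


# Round trip a tiny, made-up adjacency table through a compressed file.
adjacencies = pd.DataFrame(
    {"TF": ["A", "B"], "target": ["x", "y"], "importance": [1.5, 0.25]}
)
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "adj.tsv.gz"
    adjacencies.to_csv(path, sep="\t", index=False)  # compression inferred from .gz
    round_trip = load_table(path)
```

With something like this, adj.tsv and adj.tsv.gz go through the same code path, and only the extension check needs to know about .gz.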
Hi @dweemx ,
I've patched pySCENIC to enable (optional) file compression for all intermediate files (included in the new release 0.10.1; a new Docker image is also available). Now, if you give a .gz ending in the filename argument when calling pySCENIC (adj.tsv.gz, reg.csv.gz), the file will be read/written as a compressed file.
I've started a branch, feature/28-add_intermediate_file_compression, with some basic steps toward this.
Merged this partially completed feature in 63eb95286b28729983d7a3b9234887381dd7dec9.
This is critical, especially for the SCENIC multi-runs pipelines.
Currently, the GRN step saves the network as a .tsv file. This should be compressed as a .tsv.gz file instead.
The cisTarget step saves the motif enrichment table as a .tsv file. This should be compressed as a .tsv.gz file.
The AUCell step saves the regulon AUC matrix as a loom file along with the expression matrix. This should be saved as a compressed .tsv.gz file, or as a loom that does not contain the expression matrix.
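For the AUCell point, a rough sketch of what the compressed output could look like, assuming the AUC matrix is held as a pandas DataFrame of cells by regulons. The values, cell names, and regulon names below are made up, and this says nothing about how pySCENIC itself writes the file.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Made-up AUC matrix: 100 cells x 3 regulons (names are illustrative only).
rng = np.random.default_rng(0)
cells = [f"cell_{i}" for i in range(100)]
regulons = ["REG_A(+)", "REG_B(+)", "REG_C(+)"]
auc_mtx = pd.DataFrame(rng.random((100, 3)).round(4), index=cells, columns=regulons)
auc_mtx.index.name = "Cell"

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "auc.tsv.gz"
    # Only the AUC values are written; the expression matrix is not duplicated.
    auc_mtx.to_csv(out, sep="\t")
    # Check the gzip magic bytes to confirm the file really is compressed.
    with open(out, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"
    restored = pd.read_csv(out, sep="\t", index_col=0)
```

Downstream steps would then read the AUC matrix back with `pd.read_csv(..., index_col=0)` instead of opening a loom file.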