medvedevgroup / SibeliaZ

A fast whole-genome aligner based on de Bruijn graphs
http://medvedevgroup.com/

Reduce number of files created? #21

Open jeremylp2 opened 3 years ago

jeremylp2 commented 3 years ago

Hi,

I've noticed that SibeliaZ creates a very large number of small files in the "alignment" folder while running. I saw about 1.7 million files generated for 3 genomes (sizes 400, 500 and 1000 Mb). Is there any way to reduce this footprint, even at a significant cost in speed? I'm working on a system with an inode quota, so for larger alignments I'm likely to be unable to run SibeliaZ, even though the files are only needed temporarily.
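
For anyone who wants to measure this on their own run, here is a minimal sketch in Python that counts the files and directories under an output folder. Each of them normally consumes one inode, so the total gives a rough idea of the pressure on an inode quota. The function name `count_entries` and the default path `sibeliaz_out/alignment` are hypothetical, chosen only for illustration; point it at wherever your run writes its "alignment" folder.

```python
import os
import sys


def count_entries(root):
    """Return (files, dirs) found under root, not counting root itself.

    Each file and directory normally occupies one inode, so the sum is a
    rough estimate of the inode footprint. Symlinked directories are
    counted but not descended into (the os.walk default).
    """
    files = 0
    dirs = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        dirs += len(dirnames)
        files += len(filenames)
    return files, dirs


if __name__ == "__main__":
    # Hypothetical default path; pass the real one as the first argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "sibeliaz_out/alignment"
    files, dirs = count_entries(root)
    print(f"{root}: {files} files, {dirs} directories")
```

Running it periodically while the aligner works should show how quickly the count grows relative to the quota.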

iminkin commented 3 years ago

Hi,

I expected this issue to come up at some point. I think there should be an easy fix, and I will try to come up with a solution around next week.

jeremylp2 commented 3 years ago

Great, thanks for the quick response!

zfuller5280 commented 3 years ago

Has this issue been resolved? We are running into a similar problem with >1 million intermediate files generated and getting inode quota errors. Thanks!

iminkin commented 3 years ago

In progress, stay tuned.

iminkin commented 3 years ago

Hi all,

Sorry for the delay. The global pandemic disrupted some of my plans, but I will try to maintain SibeliaZ and keep it going. I pushed a potential solution to the branch "no_block_files": https://github.com/medvedevgroup/SibeliaZ/tree/no_block_files

Please check it out and let me know if it fixes the issue you are facing. It is not a release (yet), but the changes are minimal compared to 1.2.2.

iminkin commented 3 years ago

Incorporated in https://github.com/medvedevgroup/SibeliaZ/releases/tag/v1.2.3