metageni / SUPER-FOCUS

A tool for agile functional analysis of shotgun metagenomic data
GNU General Public License v3.0

Very very large .m8 temporary files #42

Closed theo-allnutt-bioinformatics closed 5 years ago

theo-allnutt-bioinformatics commented 5 years ago

I am running SUPER-FOCUS on 96 samples with ~20GB read files each. As it runs, each sample produces a ~60GB .m8 file, so disk space is becoming an issue. Is there any way to alter this behaviour? There is no way I can store 96 x 60GB files before the run has finished.

Thanks,

T.

metageni commented 5 years ago

@theo-allnutt-bioinformatics Bummer!

I created a PR with a solution here: https://github.com/metageni/SUPER-FOCUS/pull/43

The quick solution deletes these large files.

Just add -d when you run SUPER-FOCUS. It deletes each of these large files after that sample's alignment is done and the data has been parsed.
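As a sketch, a run with deletion enabled might look like this. Only -d comes from this thread; the other flag names (-q for query reads, -dir for the output directory, -a for the aligner) are assumptions about the CLI, so check superfocus -h for your version.

```shell
# Hypothetical SUPER-FOCUS invocation with the -d flag from PR #43, which
# removes each sample's large .m8 alignment file once its results are parsed.
# -q/-dir/-a are assumed flag names, not confirmed in this thread.
superfocus -q reads/ -dir output/ -a diamond -d
```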

You probably want to keep the database so you don't have to download it again.

Let me know if you have any problems cloning the new version, etc. I will merge it into master once I address other suggestions people have made.

Best

metageni commented 5 years ago

I actually merged it into master, but no new version has been released yet.

theo-allnutt-bioinformatics commented 5 years ago

Thanks very much. I will try it on my next run. Does this mean that I can delete the alignment files for completed samples while it is still running?

metageni commented 5 years ago

Yes, once the alignment is done and the file is parsed, it can be deleted, and that is exactly what -d does. I left the original alignments in place by default because some people like having them.

So if you have 96 samples, the alignment for sample 1 will be deleted before the alignment of sample 2 starts, and the parsed results are kept in a dict in memory.

theo-allnutt-bioinformatics commented 5 years ago

Great, thanks for your quick replies.