metagenome-atlas / atlas

ATLAS - Three commands to start analyzing your metagenome data
https://metagenome-atlas.github.io/
BSD 3-Clause "New" or "Revised" License

clean up #662

Closed slambrechts closed 1 year ago

slambrechts commented 1 year ago

Hi Silas,

I have an unfinished project for which I ran the complete atlas pipeline, but haven't finished downstream analysis of the output yet. Now I have to remove files in order to free up space on the cluster. Any advice on what files to keep and which ones are intermediate files?

I haven't used the genecatalog output yet, but I assume the alignments are summarized in counts and I can remove the bam files?

And what about the folders per sample in the main directory? I have used the complete MAG set across all samples and used the DRAM output, but haven't used those folders yet.

Cheers, Sam

SilasK commented 1 year ago

It is true that Atlas creates a lot of intermediate files.

In the future I will try to change this and put everything in the intermediate folder.

Make sure you ran atlas to the end. In the latest version I added an optional DRAM genecatalog annotation.

You should definitely keep

Generally, keep the output folders, stats, reports, Genecatalog, and genomes.

Yes, you can remove the bam files, and I think DRAM also creates some heavy intermediate files.
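
Before deleting anything, it can help to see how much space the bam files actually take. The following is only a minimal sketch, not part of atlas: the function names `list_bam_files` and `total_size_gb` are made up for illustration, and the working directory is assumed to be passed on the command line. It lists files and deletes nothing.

```python
from pathlib import Path


def list_bam_files(workdir):
    """Return (path, size_in_bytes) for every *.bam file under workdir, largest first."""
    root = Path(workdir)
    found = [(p, p.stat().st_size) for p in root.rglob("*.bam") if p.is_file()]
    return sorted(found, key=lambda item: item[1], reverse=True)


def total_size_gb(files):
    """Sum the sizes returned by list_bam_files() and convert to gigabytes."""
    return sum(size for _, size in files) / 1e9


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python list_bams.py /path/to/atlas/workdir
    workdir = sys.argv[1] if len(sys.argv) > 1 else "."
    files = list_bam_files(workdir)
    for path, size in files:
        print(f"{size / 1e9:8.2f} GB  {path}")
    print(f"Total: {total_size_gb(files):.2f} GB in {len(files)} bam files")
```

Reviewing the list first makes it easier to confirm that the counts exist for each alignment before the bam files are removed by hand.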

You can gzip all assemblies and the Genecatalog.
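
As a rough illustration of the gzip step, here is a small sketch using only the Python standard library. The file patterns (`*.fasta`, `*.faa`, `*.fna`) and the function names are assumptions for the example, not a list of what atlas actually writes, so adjust them to the files you find in your own project. Keep in mind that atlas may look for the uncompressed files if you rerun it later.

```python
import gzip
import shutil
from pathlib import Path


def gzip_in_place(path):
    """Compress path to path + '.gz', then remove the uncompressed original."""
    src = Path(path)
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    src.unlink()  # only reached if compression finished without an exception
    return dst


def gzip_matching(folder, patterns=("*.fasta", "*.faa", "*.fna")):
    """Gzip every file under folder that matches one of the (assumed) patterns."""
    root = Path(folder)
    compressed = []
    for pattern in patterns:
        # Materialize the matches first, since the directory changes while we work.
        for path in sorted(root.rglob(pattern)):
            if path.is_file():
                compressed.append(gzip_in_place(path))
    return compressed
```

For example, `gzip_matching("Genecatalog")` would compress the matching files below that folder and return the new `.gz` paths.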

If this is of interest, there should be a file, all contigs2bins, which should allow you to recreate all intermediate bins.

Make sure you have the stats/read_counts.tsv, which is necessary to calculate the mapping rate.
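
To give an idea of how bins could be rebuilt from such a contig-to-bin table, here is a hedged sketch. It assumes a tab-separated table with the contig name in the first column and the bin name in the second, and a FASTA file whose headers start with the contig name. Both the layout and the function names are assumptions for illustration, so check them against your actual files.

```python
import csv
from collections import defaultdict
from pathlib import Path


def read_contig_to_bin(table_path):
    """Read a two-column, tab-separated table: contig name, bin name (assumed layout)."""
    mapping = {}
    with open(table_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) >= 2:
                mapping[row[0]] = row[1]
    return mapping


def read_fasta(fasta_path):
    """Yield (name, sequence) pairs; name is the header up to the first whitespace."""
    name, chunks = None, []
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                fields = line[1:].split()
                name, chunks = (fields[0] if fields else ""), []
            elif line:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def write_bins(fasta_path, table_path, out_dir):
    """Write one FASTA file per bin with the contigs assigned to it; return bin names."""
    mapping = read_contig_to_bin(table_path)
    bins = defaultdict(list)
    for name, seq in read_fasta(fasta_path):
        if name in mapping:  # contigs without a bin are skipped
            bins[mapping[name]].append((name, seq))
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for bin_name, contigs in bins.items():
        with open(out / f"{bin_name}.fasta", "w") as handle:
            for name, seq in contigs:
                handle.write(f">{name}\n{seq}\n")
    return sorted(bins)
```

This keeps all binned sequences in memory, which is fine for a single sample but may need streaming for very large assemblies.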

slambrechts commented 1 year ago

Hi Silas,

Thank you for your answer! If I would like to run dram genecatalog in the future on this project, do I need to keep the bam files in the Genecatalog/alignments folder?

SilasK commented 1 year ago

Not if you have the counts already. The format changed a little in the recent version, but if you have the counts and coverage in the counts folder, that should be enough.

Why not run the DRAM annotation now?

Note: If you want to run an analysis in the future, atlas might complain that some of the intermediate files are missing or that they were updated.

I suggest you always do a dry run first. You can also trigger the creation of specific targets with this command:

atlas run None <target_file>

e.g.

atlas run None Genecatalog/annotations/dram