merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

Contigs DB Issue #521

Closed adityabandla closed 7 years ago

adityabandla commented 7 years ago

Hi

Using anvio v2.3.1 (stable release), I generated the contigs database. The job completed without errors. The db and h5 files were 13G and 1.5G, respectively. All subsequent anvi steps (anvi-run-hmms etc.) went without a glitch, until I ran into the C/R issue raised earlier in #516.

However, to check whether this was an issue with the contigs database, I rebuilt the db on another server, but this time I ended up with db and h5 files of 15G and 9G, respectively.

The number of contigs, splits, etc. was the same across both runs, based on the output log.

I am not sure if this is an anvio issue or if it is caused by some broken dependency.

Regards, Aditya

meren commented 7 years ago

Hey Aditya,

I can't see how this could be an anvi'o issue :( Are you running these on a cluster system? Are you 100% sure your processes are finishing properly, and not getting killed due to memory or CPU quotas?

adityabandla commented 7 years ago

Hi Meren

Yes, initially I generated the contigs db on a cluster. The job exited with status 0, well before the walltime limit. It also did not get killed midway due to memory or other quotas.

The second time, I ran it on a standalone server.

But in case the db generation did run into issues, say on the cluster, is there a way to sanity-check this before proceeding with downstream steps?

When I ran the debug lines you had suggested earlier, there seemed to be no obvious error:

for i in */contigs.db; do echo $i; sqlite3 $i 'select * from self;'; echo; done

The confusing part was that the summaries from both files were the same. Only their file sizes were different.
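Since the contigs database can be opened with sqlite3, one way to look past the summary is to compare the two files table by table. The following is only a minimal sketch using Python's standard sqlite3 module. The function names and the example paths are hypothetical, it makes no assumptions about which tables anvi'o creates, and it does not look at the h5 file at all. It also reports SQLite's page statistics, because differing page sizes or unused (free) pages are one possible reason for two files with the same rows to differ in size. Whether that applies here is not known.

```python
import sqlite3


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table in a SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        counts = {}
        for name in names:
            # Quote the identifier so unusual table names do not break the query.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                "SELECT COUNT(*) FROM " + quoted).fetchone()[0]
        return counts
    finally:
        conn.close()


def page_stats(db_path):
    """Return page size, total pages and free (unused) pages of a SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        return {
            "page_size": conn.execute("PRAGMA page_size").fetchone()[0],
            "page_count": conn.execute("PRAGMA page_count").fetchone()[0],
            "free_pages": conn.execute("PRAGMA freelist_count").fetchone()[0],
        }
    finally:
        conn.close()


def compare_databases(path_a, path_b):
    """Return {table: (count_in_a, count_in_b)} for tables whose counts differ.

    A table missing from one file shows up with None on that side.
    """
    a = table_row_counts(path_a)
    b = table_row_counts(path_b)
    return {t: (a.get(t), b.get(t))
            for t in sorted(set(a) | set(b))
            if a.get(t) != b.get(t)}


# Hypothetical usage (paths are placeholders, not from this thread):
# print(compare_databases("cluster/contigs.db", "server/contigs.db"))
# print(page_stats("cluster/contigs.db"), page_stats("server/contigs.db"))
```

An empty result from compare_databases only means the row counts agree. It does not prove the contents are identical.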

meren commented 7 years ago

Hmm. We don't have a proper sanity check to see whether all jobs that run from within anvi-gen-contigs-db are completed properly.

anvi-gen-contigs-db runs Prodigal, and then HMMER to take care of some other things you can imagine. They may be getting killed, and the contigs class may be continuing operations with whatever it got back.

But currently I can't think of a robust way to sanity-check those :/
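As an illustration of the failure mode described above, where an external tool dies and the caller carries on with partial output, here is a minimal sketch of a wrapper that refuses to continue unless the child process exits cleanly. It is not how anvi'o invokes Prodigal or HMMER; run_checked is a hypothetical helper built on Python's standard subprocess module, and it only catches abnormal exits, not a tool that exits with status 0 after writing incomplete results.

```python
import subprocess


def run_checked(cmd):
    """Run cmd (a list of arguments) and return its stdout as bytes.

    Raises RuntimeError if the process was killed by a signal or
    exited with a non-zero status, instead of silently continuing.
    """
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if result.returncode < 0:
        # On POSIX, a negative return code means "terminated by signal N".
        raise RuntimeError(
            "%s was killed by signal %d" % (cmd[0], -result.returncode))
    if result.returncode != 0:
        raise RuntimeError(
            "%s exited with status %d" % (cmd[0], result.returncode))
    return result.stdout
```

A job scheduler that kills a process for exceeding a memory quota typically does so with a signal, which is the case the first check is meant to surface.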