merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

anvi-profile hangs #448

Closed jmeppley closed 7 years ago

jmeppley commented 7 years ago

When working with a large dataset using the [new profiling branch](/merenlab/anvio/tree/new-profiling), anvi-profile gets through a few contigs quickly and then stops progressing. The system reports no CPU usage and no more output is produced.

Is there a way to figure out what contig the process is hung on? I'm trying to find the minimum data set for which this happens.
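
One general way to see where a stalled Python process is stuck, sketched here under assumptions rather than as anything anvi'o provides: the standard-library `faulthandler` module can dump every thread's traceback when the process receives a signal. The function names below are hypothetical, the handler has to be registered inside the process before it hangs, and the traceback shows the code location rather than the contig name. It is Unix-only because it relies on `SIGUSR1`.

```python
import faulthandler
import os
import signal
import tempfile


def install_stack_dump_handler(log_file):
    """Ask faulthandler to write all thread tracebacks to log_file on SIGUSR1.

    log_file must be a real file object (it needs a file descriptor).
    """
    faulthandler.register(signal.SIGUSR1, file=log_file, all_threads=True)


def dump_own_stack():
    """Demonstration only: signal the current process and return the dump."""
    with tempfile.TemporaryFile(mode="w+") as log_file:
        install_stack_dump_handler(log_file)
        try:
            # In real use the signal would come from another shell.
            os.kill(os.getpid(), signal.SIGUSR1)
        finally:
            # Unregister before the file is closed.
            faulthandler.unregister(signal.SIGUSR1)
        log_file.seek(0)
        return log_file.read()


if __name__ == "__main__":
    print(dump_own_stack())
```

With such a handler installed in a long-running script, sending `SIGUSR1` to its PID from another shell would write the tracebacks to the chosen file without stopping the process.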

I don't seem to be having the same problem with version 2.1.0 or with the master branch. My test runs haven't finished, but they are continuing to progress...

If you are curious, here are the output and other bits of related information for two examples. They show that the process ran for a few minutes and then stopped cold:

(anvio3) [jmeppley@prod2-0244 all_reads]$ ls -lrth anvio/profile-HOT229_1_0200m/
total 56K
-rw-rw-r-- 1 jmeppley jmeppley 2.1K Jan 27 01:22 RUNLOG.txt
-rw-r--r-- 1 jmeppley jmeppley  48K Jan 27 01:22 PROFILE.db

(anvio3) [jmeppley@prod2-0244 all_reads]$ tail -n 3 logs/anvio-profile-HOT229_1_0200m
num_splits .........................: 142,332
total_length .......................: 802,056,841
[27 Jan 17 01:22:20 Profiling using 1 threads] Processed 29 of 140889 contigs. Current memory usage: 5.31 GB

(anvio3) [jmeppley@prod2-0244 all_reads]$ ps aux | egrep "(USER|bin/anvi-prof)"
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
jmeppley 147006  0.0  1.0 3651884 2758564 pts/3 Sl   Jan27   3:02 /home/jmeppley/apps/anaconda3/envs/anvio3/bin/python /home/jmeppley/apps/anaconda3/envs/anvio3/bin/anvi-profile -i mapping/HOT229_1_0200m.reads.vs.contigs.bam -c anvio/contigs.db -S HOT229_1_0200m -o anvio/profile-HOT229_1_0200m --min-contig-length 2500
jmeppley 147031  0.0  1.0 3717164 2803880 pts/3 S    Jan27   0:05 /home/jmeppley/apps/anaconda3/envs/anvio3/bin/python /home/jmeppley/apps/anaconda3/envs/anvio3/bin/anvi-profile -i mapping/HOT229_1_0200m.reads.vs.contigs.bam -c anvio/contigs.db -S HOT229_1_0200m -o anvio/profile-HOT229_1_0200m --min-contig-length 2500

(anvio3) [jmeppley@prod2-0244 all_reads]$ date
Mon Jan 30 17:23:11 UTC 2017

(anvio3) [jmeppley@prod2-0244 all_reads]$ ls -lrth logs | grep profile | grep 229
-rw-rw-r--  1 jmeppley jmeppley  19K Jan 27 01:22 anvio-profile-HOT229_1_0200m

(anvio3) [jmeppley@prod2-0244 all_reads]$ ls -lrth anvio/profile-HOT233_1c_0200m/
total 37M
-rw-rw-r-- 1 jmeppley jmeppley 2.1K Jan 27 01:13 RUNLOG.txt
-rw-r--r-- 1 jmeppley jmeppley 1.8M Jan 27 01:22 PROFILE.db
-rw-rw-r-- 1 jmeppley jmeppley  35M Jan 27 01:22 AUXILIARY-DATA.h5

(anvio3) [jmeppley@prod2-0244 all_reads]$ tail -n 3 logs/anvio-profile-HOT233*
num_splits .........................: 142,332
total_length .......................: 802,056,841
[27 Jan 17 01:13:01 Profiling using 1 threads] Processed 2499 of 140889 contigs. Current memory usage: 8.43 GB                                                      

(anvio3) [jmeppley@prod2-0244 all_reads]$ ps aux | egrep "(USER|bin/anvi-prof)" | egrep "(USER|233)"
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
jmeppley 146593  0.0  1.6 5254160 4398488 pts/3 Sl   Jan27   3:44 /home/jmeppley/apps/anaconda3/envs/anvio3/bin/python /home/jmeppley/apps/anaconda3/envs/anvio3/bin/anvi-profile -i mapping/HOT233_1c_0200m.reads.vs.contigs.bam -c anvio/contigs.db -S HOT233_1c_0200m -o anvio/profile-HOT233_1c_0200m --min-contig-length 2500
jmeppley 147032  0.0  1.6 5254416 4376540 pts/3 S    Jan27   0:00 /home/jmeppley/apps/anaconda3/envs/anvio3/bin/python /home/jmeppley/apps/anaconda3/envs/anvio3/bin/anvi-profile -i mapping/HOT233_1c_0200m.reads.vs.contigs.bam -c anvio/contigs.db -S HOT233_1c_0200m -o anvio/profile-HOT233_1c_0200m --min-contig-length 2500

(anvio3) [jmeppley@prod2-0244 all_reads]$ date
Mon Jan 30 17:25:03 UTC 2017

(anvio3) [jmeppley@prod2-0244 all_reads]$ ls -lrth logs | grep profile | grep 233
-rw-rw-r--  1 jmeppley jmeppley 858K Jan 27 01:22 anvio-profile-HOT233_1c_0200m

ozcan commented 7 years ago

Hello John,

This might be related to a race condition we fixed last week: https://github.com/merenlab/anvio/commit/33d2f66ed555bcdfe6a86803f634b1178374c23b

We also made other changes to stabilize memory usage for big datasets. Can you try again after pulling the latest changes? Sorry for the inconvenience.

Thank you very much,
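
The linked commit is not reproduced here. As a generic illustration of how a parent process can avoid blocking forever on a worker's result queue (one common way multiprocessing code stalls with no CPU usage), here is a minimal sketch. All names (`profile_contig`, `worker`, `run`) are hypothetical and this is not anvi'o's implementation; it assumes a Unix system where the `fork` start method is available.

```python
import multiprocessing
import queue


def profile_contig(name):
    # Hypothetical stand-in for the real per-contig work.
    return name, len(name)


def worker(tasks, results):
    # Consume task names until the None sentinel arrives.
    for name in iter(tasks.get, None):
        results.put(profile_contig(name))


def run(contig_names, timeout=5.0):
    # Assumption: "fork" is available (Linux and other Unix systems).
    ctx = multiprocessing.get_context("fork")
    tasks = ctx.Queue()
    results = ctx.Queue()
    proc = ctx.Process(target=worker, args=(tasks, results))
    proc.start()

    for name in contig_names:
        tasks.put(name)
    tasks.put(None)  # sentinel: tells the worker to stop

    collected = []
    while len(collected) < len(contig_names):
        try:
            # Wait with a timeout instead of blocking indefinitely.
            collected.append(results.get(timeout=timeout))
        except queue.Empty:
            # Nothing arrived in time: fail loudly if the worker has died,
            # otherwise keep waiting.
            if not proc.is_alive():
                raise RuntimeError(
                    "worker exited after %d results" % len(collected)
                )
    proc.join()
    return collected


if __name__ == "__main__":
    print(run(["c1", "contig_22"]))
```

The design choice is that the parent never calls a blocking `get()` without a timeout, so a dead worker surfaces as an error rather than a silent stall.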

jmeppley commented 7 years ago

No apologies necessary. I will pull the latest and try again...

jmeppley commented 7 years ago

Of the 11 samples I was seeing the problem on, 9 have completed with the latest code. The other two are about halfway done, but still working.

I will close this issue when the other two complete (probably in another day).

meren commented 7 years ago

Thank you very much for testing, John! I'm very happy we are not the only ones doing it when we are changing such a critical part of the workflow!