kbaseattic / assembly

An extensible framework for genome assembly.
MIT License

implement MEGAHIT #244

Closed (levinas closed 9 years ago)

sebhtml commented 9 years ago

root@kbase-devel:/kbase/arast/assembly# git log --oneline|grep megahit
7bcc026 megahit: normalize memory limit using the total thread count
af59dd0 plugins: remove foo=bar option for megahit
9b39c30 plugins: add megahit plugin
e4600be plugins: add megahit configuration file
8c5061c Remove megahit version and release numbers
1c13a6e tools: add megahit package

levinas commented 9 years ago

The test completed successfully. I'm reopening the issue to make sure the logging is correct.

Here's the megahit.out file I see in the work directory (note that the --input-cmd parameter contains only one file):

Command: megahit --cpu-only --num-cpu-threads 4 -l 512 -m 131448646144 --input-cmd p1.fq -o megahit
MEGAHIT v0.2.0
[Thu Feb 12 16:25:56 2015] Start assembly. Number of CPU threads 4.
[Thu Feb 12 16:25:56 2015] Extracting solid (k+1)-mers for k = 21
[Thu Feb 12 16:26:01 2015] Building graph for k = 21
[Thu Feb 12 16:26:05 2015] Assembling contigs from SdBG for k = 21
[Thu Feb 12 16:26:11 2015] Extracting iterative edges from k = 21 to 31
[Thu Feb 12 16:26:12 2015] Building graph for k = 31
[Thu Feb 12 16:26:13 2015] Assembling contigs from SdBG for k = 31
[Thu Feb 12 16:26:14 2015] Extracting iterative edges from k = 31 to 41
[Thu Feb 12 16:26:14 2015] Building graph for k = 41
[Thu Feb 12 16:26:14 2015] Assembling contigs from SdBG for k = 41
[Thu Feb 12 16:26:14 2015] Extracting iterative edges from k = 41 to 51
[Thu Feb 12 16:26:15 2015] Building graph for k = 51
[Thu Feb 12 16:26:15 2015] Assembling contigs from SdBG for k = 51
[Thu Feb 12 16:26:15 2015] Extracting iterative edges from k = 51 to 61
[Thu Feb 12 16:26:15 2015] Merging to output final contigs.
[Thu Feb 12 16:26:15 2015] ALL DONE.

The actual command seems correct:

['/home/ubuntu/assembly/third_party/megahit/megahit', '--cpu-only', '--num-cpu-threads', '4', '-l', '512', '-m', '131448646144', '--input-cmd', u'cat /mnt/data/fangfang/119/raw/p2.fq /mnt/data/fangfang/119/raw/p1.fq', '-o', u'/mnt/data/fangfang/119/107/megahit_fff2ac6b-67d3-42d6-b160-d4ca6b04a211/megahit']

sebhtml commented 9 years ago

The argument in the list after "input-cmd" is basically "cat " + " ".join(files).
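
To make the construction concrete, here is a minimal sketch of how such an argument could be assembled. It is not the plugin's actual code: the function names `build_input_cmd` and `build_megahit_args` are hypothetical, and the use of `shlex.quote` is an added assumption (the plain `" ".join(files)` described above does not quote paths). The flags are the ones that appear in the logged command.

```python
import shlex


def build_input_cmd(files):
    """Return the shell snippet for --input-cmd: 'cat' followed by the read files.

    Hypothetical helper. Quoting each path is an assumption added here so that
    paths containing spaces survive; plain paths are left unchanged.
    """
    if not files:
        raise ValueError("at least one read file is required")
    return "cat " + " ".join(shlex.quote(f) for f in files)


def build_megahit_args(executable, files, out_dir, threads=4):
    """Assemble the argument list, using only flags seen in the logged command."""
    return [
        executable,
        "--cpu-only",
        "--num-cpu-threads", str(threads),
        "--input-cmd", build_input_cmd(files),
        "-o", out_dir,
    ]


if __name__ == "__main__":
    print(build_megahit_args("megahit", ["p2.fq", "p1.fq"], "megahit"))
```

With two files, the element after `--input-cmd` is a single string containing both paths, which matches the list shown in the previous comment.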

Do you want me to test with more than one file before closing the issue?

levinas commented 9 years ago

I think the command is probably passed correctly. Can you look into how to get the "Command: megahit..." line in the output file to reflect the "cat .." subcommand correctly?

sebhtml commented 9 years ago

This can be done by printing the command from within the megahit assembly service plugin.
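
A rough sketch of that idea follows, assuming the plugin launches MEGAHIT through `subprocess` and owns the output file. The names `format_command` and `run_and_log` are hypothetical and are not taken from the assembly service code. Each argument is shell-quoted so that a multi-word `--input-cmd` value stays visible as one argument in the logged line.

```python
import shlex
import subprocess


def format_command(args):
    """Render an argument list as one shell-style line, quoting where needed."""
    return " ".join(shlex.quote(str(a)) for a in args)


def run_and_log(args, log_path):
    """Write a 'Command: ...' header to log_path, then run args.

    Hypothetical helper: the child's stdout and stderr are appended to the
    same file after the header. Returns the process exit code.
    """
    with open(log_path, "w") as log:
        log.write("Command: " + format_command(args) + "\n")
        log.flush()  # make sure the header lands before the child's output
        return subprocess.call(args, stdout=log, stderr=subprocess.STDOUT)
```

For an argument list like the one above, `format_command` keeps the whole `cat ...` string inside quotes, so the logged line shows every input file.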

levinas commented 9 years ago

MEGAHIT is now part of dev recipes and testing. The issue will be closed when Seb completes the test on big files.

sebhtml commented 9 years ago

I tested with a pair of files.

seb@kbase-devel:~/bug-167/job-80$ arast get -j 80
File downloaded: 80_1.megahit_contigs.fa
File downloaded: 80_report.txt
File downloaded: 80_analysis.tar.gz
HTML extracted: 80_analysis/report.html

seb@kbase-devel:~/bug-167/job-80$ ls -lh
total 2.4M
-rw-rw-r-- 1 seb seb 2.3M Feb 12 23:08 80_1.megahit_contigs.fa
drwxrwxr-x 5 seb seb 4.0K Feb 12 23:08 80_analysis
-rw-rw-r-- 1 seb seb 6.3K Feb 12 23:08 80_report.txt

seb@kbase-devel:~/bug-167/job-80$ head 80_report.txt
All statistics are based on contigs of size >= 500 bp, unless otherwise noted (e.g., "# contigs (>= 0 bp)" and "Total length (>= 0 bp)" include all contigs).

Assembly                   megahit_contigs
# contigs (>= 0 bp)        2003
# contigs (>= 1000 bp)     77
Total length (>= 0 bp)     2300567
Total length (>= 1000 bp)  1807716
# contigs                  115
Largest contig             132002
Total length               1831638

seb@kbase-devel:~/bug-167/job-80$ arast stat -d|grep 80
| 80 | 4 | Complete | 1:20:47 | None | -p megahit |