Closed by scotty323 7 years ago
Hello @Tao,
Thanks for using AGOUTI.
AGOUTI does not yet support reading multiple BAM files simultaneously. I guess you have to first merge (cat) your BAMs into a single one.
Also, can I ask how large each of your BAMs is?
Simo
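For reference, here is a minimal sketch of the merge step suggested above. It only builds the command line for `samtools merge` (with its `-@` option for additional compression threads) and does not run it. The function name, file names, and thread count are hypothetical, and the sketch assumes all input BAMs are sorted the same way, which `samtools merge` expects.

```python
def build_merge_command(bam_paths, merged_path, extra_threads=0):
    """Return the argv list for merging several BAM files with samtools.

    bam_paths     -- input BAM files (hypothetical names in the example below)
    merged_path   -- output BAM file to create
    extra_threads -- value passed to `-@` (additional compression threads)
    """
    if len(bam_paths) < 2:
        raise ValueError("need at least two BAM files to merge")
    if extra_threads < 0:
        raise ValueError("extra_threads must be non-negative")
    # samtools merge takes the output file first, then the inputs.
    return ["samtools", "merge", "-@", str(extra_threads), merged_path, *bam_paths]


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run(cmd, check=True) to actually merge.
    cmd = build_merge_command(["lib1.bam", "lib2.bam"], "merged.bam", extra_threads=4)
    print(" ".join(cmd))
```

Note that the threads here only speed up the merge itself; they do not change how AGOUTI reads the merged file.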
I have 12 BAM files, about 3 GB each on average. How can I increase the thread count and memory when running the merged BAM file?
2017-04-11 09:41:28,500 - INFO - AGOUTI_DENOISE PROGRESS - [BEGIN] Denoising joining pairs
2017-04-11 09:41:49,176 - INFO - AGOUTI_DENOISE PROGRESS - Succeeded
2017-04-11 09:41:49,177 - INFO - AGOUTI_DENOISE PROGRESS - Denoise took in 0.34 min CPU time
2017-04-11 09:41:49,177 - INFO - AGOUTI_DENOISE PROGRESS - 613 contig pairs filtered for spanning across >1 gene models
2017-04-11 09:41:49,177 - INFO - AGOUTI_DENOISE PROGRESS - 39 contig pairs filtered for not being one of the four combinations
2017-04-11 09:41:49,177 - INFO - AGOUTI_DENOISE PROGRESS - 1526 contig pairs filtered for less support
2017-04-11 09:41:49,177 - INFO - AGOUTI_DENOISE PROGRESS - 9 contig pairs for scaffolding
2017-04-11 09:41:49,178 - INFO - AGOUTI_SCAFFOLDING PROGRESS - Building graph from joining reads pairs
2017-04-11 09:41:49,179 - INFO - AGOUTI_SCAFFOLDING PROGRESS - Build graph took 0.0000 min CPU time
2017-04-11 09:41:49,179 - INFO - AGOUTI_SCAFFOLDING PROGRESS - 16 vertices in the graph
2017-04-11 09:41:49,179 - INFO - AGOUTI_SCAFFOLDING PROGRESS - Simplifying graph
2017-04-11 09:41:49,179 - INFO - AGOUTI_SCAFFOLDING PROGRESS - 0 Edges removed due to insufficient supports
2017-04-11 09:41:49,179 - INFO - AGOUTI_SCAFFOLDING PROGRESS - Start graph walk
2017-04-11 09:41:49,179 - INFO - AGOUTI_SCAFFOLDING PROGRESS - number of visited nodes: 16
2017-04-11 09:41:49,180 - INFO - AGOUTI_SCAFFOLDING PROGRESS - Scaffolding took 0.0000 min CPU time
2017-04-11 09:41:49,180 - INFO - AGOUTI_SCAFFOLDING PROGRESS - Graph Reconciliation
2017-04-11 09:41:49,180 - INFO - AGOUTI_SCAFFOLDING PROGRESS - Reconciliation took 0.0000 min CPU time
2017-04-11 09:41:49,180 - INFO - AGOUTI_SCAFFOLDING PROGRESS - Report scaffolding paths
2017-04-11 09:41:49,181 - INFO - AGOUTI_SCAFFOLDING PROGRESS - Visualize graph in DOT
2017-04-11 09:41:49,204 - INFO - AGOUTI_UPDATE PROGRESS - [BEGIN] Updating gene models
2017-04-11 09:41:49,247 - INFO - AGOUTI_UPDATE PROGRESS - Finalizing sequences
2017-04-11 09:41:55,456 - INFO - AGOUTI_UPDATE PROGRESS - Outputting updated Gene Moddels
2017-04-11 09:41:56,316 - INFO - AGOUTI_UPDATE PROGRESS - Summarizing AGOUTI gene paths
2017-04-11 09:41:56,317 - INFO - AGOUTI_UPDATE PROGRESS - -----------Summary-----------
2017-04-11 09:41:56,317 - INFO - AGOUTI_UPDATE PROGRESS - number of contigs scaffoled: 15
2017-04-11 09:41:56,317 - INFO - AGOUTI_UPDATE PROGRESS - number of scaffolds: 7
2017-04-11 09:41:56,317 - INFO - AGOUTI_UPDATE PROGRESS - number of contigs in the final assembly: 3326
2017-04-11 09:41:56,318 - INFO - AGOUTI_UPDATE PROGRESS - Final assembly N50: 60718603
2017-04-11 09:41:56,318 - INFO - AGOUTI_UPDATE PROGRESS - Final number of genes: 26688
2017-04-11 09:41:56,318 - INFO - AGOUTI_UPDATE PROGRESS - Succeeded
2017-04-11 09:41:56,318 - INFO - PARSE_ARGS PROGRESS - Peak memory use: 1.00000 GB
AGOUTI can currently use only a single thread for reading. As for memory, can I ask what species this is? And are you running on your local computer or on a cluster?
The species is sacred lotus, and the assembled draft genome (with genetic map) is about 1 G.
It is a desktop server:
[lzc@localhost ~]$ free -lh
              total        used        free      shared  buff/cache   available
Mem:           251G        1.9G         93G        326M        156G        249G
Low:           251G        158G         93G
High:            0B          0B          0B
Swap:          4.0G        191M        3.8G
Other information about the server:
top - 21:37:19 up 1 day, 17:25, 2 users, load average: 0.00, 0.04, 0.05
Tasks: 568 total, 1 running, 489 sleeping, 0 stopped, 78 zombie
%Cpu(s): 0.1 us, 0.0 sy, 0.0 ni, 99.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 26392755+total, 97769840 free, 2010016 used, 16414768+buff/cache
KiB Swap: 4194300 total, 3998236 free, 196064 used. 26111033+avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
19514 lzc       20   0  158232   2672   1552 R   1.0  0.0   0:00.19 top
   78 root      20   0       0      0      0 S   0.3  0.0   0:14.63 rcuos/19
    1 root      20   0  196232   7792   2396 S   0.0  0.0   1:36.35 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   2:08.29 kthreadd
    3 root      20   0       0      0      0 S   0.0  0.0   4:39.81 ksoftirqd/0
    5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H
    6 root      20   0       0      0      0 S   0.0  0.0   0:30.80 kworker/u96:0
    8 root      rt   0       0      0      0 S   0.0  0.0   1:25.26 migration/0
    9 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_bh
Another question: how can I make sure the paired-end read connections do not come from transposable elements or other repeats?
So for each connection between contigs, AGOUTI only uses uniquely mapped paired-end reads?
I believe your desktop server is more than capable of running AGOUTI on a BAM file of 36 GB or more.
Could you please go ahead and give it a try? If it runs very slowly, let me see if I can come up with a quick patch to support reading BAMs in parallel.
Reads from repetitive parts of a genome are not expected to map uniquely. Even if they do, I think you could use mapping quality to filter them out. I haven't looked into this in detail yet, to be honest.
Currently, yes. Only uniquely mapped paired-end reads are allowed.
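As an illustration of the mapping-quality idea above, here is a minimal sketch that keeps a read pair only if both mates reach a minimum MAPQ. It parses plain SAM text lines (MAPQ is the fifth tab-separated column in the SAM format). The function names and the threshold of 20 are assumptions for illustration, not AGOUTI's own filter, and a sensible cutoff depends on how your aligner assigns MAPQ.

```python
def mapq_of(sam_line):
    """Return the MAPQ value (5th tab-separated column) of one SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM record has at least 11 mandatory fields")
    return int(fields[4])


def pair_passes(mate1_line, mate2_line, min_mapq=20):
    """Keep a pair only if both mates reach the (assumed) MAPQ threshold."""
    return mapq_of(mate1_line) >= min_mapq and mapq_of(mate2_line) >= min_mapq


if __name__ == "__main__":
    # Two made-up records: one confidently placed mate, one ambiguous mate.
    good = "r1\t99\tctg1\t100\t60\t50M\t=\t300\t250\t*\t*"
    ambiguous = "r1\t147\tctg2\t300\t0\t50M\t=\t100\t-250\t*\t*"
    print(pair_passes(good, good))       # both mates pass
    print(pair_passes(good, ambiguous))  # pair rejected because of one mate
```

The same filter can be applied upstream with `samtools view -q`, which drops alignments below a given MAPQ before the BAM is handed to AGOUTI.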
Ok. thanks!
Hello @scotty323,
I have implemented a beta version of AGOUTI that can take multiple BAM files. You can use the -t argument to specify how many BAM files you want to read at the same time. Reading each BAM invokes one process for samtools and one for the AGOUTI worker that reads the BAM. You can simply provide multiple BAM files after the -bam argument, separated by single spaces.
Could you please pull down the "multibam" branch and give it a try? Let me know how it goes.
Simo
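To illustrate the pattern described above (one `samtools` process per BAM, with at most -t files read at once), here is a minimal sketch. It is not the code from the "multibam" branch: the function names are hypothetical, the default reader merely counts alignment records, and it uses a thread pool instead of separate worker processes to keep the example short. Each thread still launches its own `samtools view` process.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def count_records(bam_path):
    """Stream one BAM through `samtools view` and count its alignment records."""
    proc = subprocess.Popen(
        ["samtools", "view", bam_path], stdout=subprocess.PIPE, text=True
    )
    n_records = sum(1 for _ in proc.stdout)
    if proc.wait() != 0:
        raise RuntimeError("samtools view failed on " + bam_path)
    return bam_path, n_records


def read_bams_in_parallel(bam_paths, n_workers, reader=count_records):
    """Apply `reader` to every BAM, with at most `n_workers` files in flight.

    `reader` must return a (path, result) tuple; results are collected
    into a dict keyed by path.
    """
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return dict(pool.map(reader, bam_paths))


if __name__ == "__main__":
    # Stand-in reader so the example runs without samtools or real BAM files.
    demo = read_bams_in_parallel(
        ["lib1.bam", "lib2.bam", "lib3.bam"], 2, reader=lambda p: (p, len(p))
    )
    print(demo)
```

With 12 BAM files, the number of workers caps how many files are open at once; the remaining files are picked up as workers become free.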
Closing this for now. Please reopen if the issue persists.
Hi Zhang,
I am able to use AGOUTI with one BAM file. How can I use multiple BAM files simultaneously?
Tao