vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

vg index multithreading #2371

Open manuelsmendoza opened 4 years ago

manuelsmendoza commented 4 years ago

Hi,

First, I built a pangenome from 14 genomes. I aligned the reads to our reference genome and extracted a .fasta file from the resulting .bam alignments. I then built the graph by combining these 14 genomes (.fasta) with the reference.

Now I'm trying to index the graph using vg index with the -t 36 flag for multithreading. It has been running for 165 hours using only one thread. Is there a problem with vg index multithreading?

Thanks in advance, ~MM.

ekg commented 4 years ago

What index command are you running?


manuelsmendoza commented 4 years ago

vg index -t 36 -x cgi.xg -g cgi.gcsa cgi.vg

ekg commented 4 years ago

Take a look at the wiki, specifically https://github.com/vgteam/vg/wiki/Index-Construction#indexing-a-large-graph.

Even if your graph isn't big, you might still want to apply vg prune to reduce the complexity of the kmer space.

I would split the xg and gcsa indexing into separate commands.

Also add -p to the command line to see what's going on.
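
As a rough sketch of the suggestions above (separate xg and gcsa steps, pruning before gcsa, and -p for progress), something like the following could be a starting point. The file names cgi.xg, cgi.gcsa and cgi.vg come from the command earlier in this thread; the pruned file name, the run_step helper and the DRY_RUN switch are hypothetical additions for illustration, and vg prune is shown with its default settings, which may not suit every graph. The wiki page linked above is the authoritative reference for the options to use.

```shell
# Minimal sketch, not an official recipe: split indexing into separate steps.
GRAPH="${GRAPH:-cgi.vg}"        # graph file from the command in this thread
THREADS="${THREADS:-36}"        # thread count from the original command
BASE="${GRAPH%.vg}"             # strip the .vg suffix, giving "cgi"
DRY_RUN="${DRY_RUN:-1}"         # hypothetical switch: set to 0 to actually run vg

# Hypothetical helper: print each command, and execute it only when DRY_RUN=0.
run_step() {
  echo "+ $1"
  if [ "$DRY_RUN" = "0" ]; then
    sh -c "$1"
  fi
}

# 1. Build the xg index on its own, with progress output (-p).
run_step "vg index -t $THREADS -p -x $BASE.xg $GRAPH"

# 2. Prune the graph to reduce kmer-space complexity (default settings assumed).
run_step "vg prune $GRAPH > $BASE.pruned.vg"

# 3. Build the gcsa index from the pruned graph, again with progress output.
run_step "vg index -t $THREADS -p -g $BASE.gcsa $BASE.pruned.vg"
```

With DRY_RUN left at its default, the script only prints the three commands, so they can be checked before committing to a long run. Running with DRY_RUN=0 executes them in order.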

Sorry that this has run for a week and now you'll probably need to restart it.