pmelsted / bifrost

Bifrost: Highly parallel construction and indexing of colored and compacted de Bruijn graphs
BSD 2-Clause "Simplified" License

Indexing 200K assemblies #63

Closed. davidmaimoun closed this issue 2 years ago.

davidmaimoun commented 2 years ago

Hi,

I have 200k assemblies. Can I index all of them with Bifrost?

I ran it on a few of them as a test: Bifrost build -t 4 -k 31 -i -d -s fastas.txt -c -o a_graph

However, the build does not work when I give the paths in a .txt file, and adding 200k files one by one with -s is not feasible.
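For context, Bifrost's usage text says that -s and -r accept either a single FASTA/FASTQ file or a .txt file listing the input files, one path per line. A minimal sketch of generating such a list for a whole directory of assemblies is below. The function name write_file_list, the directory layout and the accepted extensions (.fasta, .fa, .fna, optionally gzipped) are assumptions for illustration, not part of Bifrost.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write_file_list DIR OUT
# Writes the absolute path of every assembly found under DIR (assumed
# extensions: .fasta, .fa, .fna, optionally .gz) to OUT, one path per line.
write_file_list() {
    local dir="$1" out="$2" abs_dir
    # Resolve DIR to an absolute path so the list works from any working directory.
    abs_dir="$(cd "$dir" && pwd)" || return 1
    find "$abs_dir" -type f \( -name '*.fasta' -o -name '*.fa' -o -name '*.fna' \
        -o -name '*.fasta.gz' -o -name '*.fa.gz' -o -name '*.fna.gz' \) \
        | LC_ALL=C sort > "$out"
}

# Example (hypothetical paths):
# write_file_list assemblies/ fastas.txt
```

Absolute paths avoid surprises if Bifrost is launched from a different directory than the one the list was written in; sorting just makes the list reproducible between runs.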

Could you help me, please?

GuillaumeHolley commented 2 years ago

Hi @davidmaimoun,

Thanks for reaching out. 200k assemblies should indeed be feasible, provided you have enough RAM. What is the size of each assembly? Are they all from the same organism? By the way, if your inputs are assemblies and not reads, you should use -r fastas.txt rather than -s fastas.txt.
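Following that advice, the original command only needs -s swapped for -r (per Bifrost's usage text, -s discards k-mers that occur exactly once, whereas -r keeps all k-mers of the reference files). Since a build over thousands of files runs for a long time, a pre-flight check that every listed path exists and is non-empty can save a failed run. The sketch below assumes bash; check_file_list and run_build are hypothetical helper names, and the Bifrost options are copied from the thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check_file_list LIST
# Prints every path in LIST that is missing or empty; returns 1 if any were found.
check_file_list() {
    local list="$1" path missing=0
    while IFS= read -r path || [ -n "$path" ]; do
        [ -z "$path" ] && continue            # skip blank lines
        if [ ! -s "$path" ]; then
            printf 'missing or empty: %s\n' "$path" >&2
            missing=1
        fi
    done < "$list"
    return "$missing"
}

# Hypothetical wrapper: same options as the original command, but with
# -r (reference files) instead of -s (sequence files).
run_build() {
    local list="$1"
    check_file_list "$list" || return 1
    if command -v Bifrost >/dev/null 2>&1; then
        Bifrost build -t 4 -k 31 -i -d -r "$list" -c -o a_graph
    else
        echo "Bifrost not found on PATH" >&2
        return 127
    fi
}

# Example (hypothetical list file):
# run_build fastas.txt
```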

When you build with the list of assemblies as input, what error do you get? Could you paste the Bifrost log output from your terminal here?

Guillaume

GuillaumeHolley commented 2 years ago

Hi @davidmaimoun,

Have you had time to look into this?

Guillaume

davidmaimoun commented 2 years ago

Sorry Guillaume, I was away from work for a few days.

I launched it on 1,000 assemblies this morning and it worked very well, thank you! With the -r option I was able to pass the list of all my assemblies, and the program did the rest.

Thank you very much for your help.

GuillaumeHolley commented 2 years ago

Glad to hear it :)