lh3 / minigraph

Sequence-to-graph mapper and graph generator
https://lh3.github.io/minigraph
MIT License

Does minigraph support mixed-case sequence? #38

Closed: egoltsman closed this issue 3 years ago

egoltsman commented 3 years ago

Hello Heng, my minigraph run is crashing during construction of a two-genome graph. In fact, it's my server that's killing it, which suggests the job is exceeding the available memory. I've been able to build much larger graphs in the past (i.e., larger in terms of total sequence length), so I'm scratching my head here. The only unusual thing about the sequence is that it has mixed-case letters, so I wanted to double-check that the program supports this.
Thanks!

lh3 commented 3 years ago

Minigraph ignores letter case, so this should not be the cause. Minigraph may use a lot of memory for repeat-rich genomes.
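Ignoring letter case is typically done by mapping both upper- and lower-case bases to the same code when the sequence is read in. In many assemblies lower-case letters mark soft-masked repeats, so mixed case carries annotation rather than different bases. The sketch below illustrates that idea with a 256-entry lookup table; it is not minigraph's own code, and the names `NT4` and `encode` are hypothetical.

```python
# Hypothetical sketch: a 2-bit nucleotide encoding that treats upper- and
# lower-case bases the same. Any other character (e.g. N) maps to 4.
NT4 = [4] * 256
for i, base in enumerate("ACGT"):
    NT4[ord(base)] = i           # A/C/G/T -> 0/1/2/3
    NT4[ord(base.lower())] = i   # a/c/g/t -> the same codes


def encode(seq: str):
    """Return the 0-4 code of each base in seq, ignoring letter case."""
    return [NT4[ord(c)] if ord(c) < 256 else 4 for c in seq]


# Soft-masked (lower-case) bases encode exactly like upper-case ones.
print(encode("ACGTacgtN"))
```

Because the case information is discarded at encoding time, a soft-masked chromosome and its upper-cased copy produce identical seeds.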

egoltsman commented 3 years ago

This is, in fact, a large, repeat-rich plant genome. It's assembled into complete chromosomes, one for each haplotype, so I was going to build a separate graph for each chromosome and then merge the graphs. In order to merge rGFAs, is it enough to just give the segments unique IDs, or is there anything else to watch out for? For example, is it OK to keep the SN and SO tags redundant across the different chromosomes/scaffolds?

lh3 commented 3 years ago

It is preferred to build a graph in one go. You can use a smaller -U to reduce memory usage.
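One way to picture why a smaller occurrence bound helps on a repeat-rich genome: if seeds that occur more often than some threshold are ignored, the highly repetitive ones stop contributing positions that must be stored and chained. The sketch below assumes that -U behaves as such an occurrence bound; it uses plain k-mers instead of minimizers for brevity, the exact meaning of minigraph's -U should be taken from its manual, and `count_kmers`, `filter_by_occurrence` and `anchor_count` are hypothetical names.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count every k-mer in seq, ignoring letter case."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def filter_by_occurrence(counts, max_occ):
    """Keep only k-mers seen at most max_occ times (assumed cap, like -U)."""
    return {kmer: n for kmer, n in counts.items() if n <= max_occ}


def anchor_count(counts):
    """Rough proxy for memory: total positions the kept k-mers would index."""
    return sum(counts.values())


# A toy repeat-rich sequence: a tandem repeat followed by a unique tail.
seq = "ACGTA" * 20 + "TTGCAGGCAT"
counts = count_kmers(seq, 4)
for cap in (100, 25, 5):
    kept = filter_by_occurrence(counts, cap)
    print(cap, len(kept), anchor_count(kept))
```

Lowering the cap drops the k-mers from the tandem repeat first, so the number of indexed positions shrinks while the unique tail is still represented.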

egoltsman commented 3 years ago

Thanks! Do you have a recommendation for a reasonable value to try here? I tried -U 25,100 and again ran out of memory on a 1 TB machine.

lh3 commented 3 years ago

What is the size of the genome?

egoltsman commented 3 years ago

Ah, my mistake. It didn't run out of memory. It actually completed! Thanks for the tip!

lh3 commented 3 years ago

Great to know!