maickrau / GraphAligner

Software crashing while indexing MHC graph #6

Closed: cjain7 closed this issue 4 years ago

cjain7 commented 4 years ago

Hi Mikko

I am trying to use GraphAligner on an MHC graph (the MHC2 graph from our paper, "Accelerating sequence to graph alignment"). However, with both v1.0.9 and the bit-parallel code (commit: 23a0ddf), the software crashes with a "std::bad_alloc" error during the graph loading phase. Could you check and advise what the reason might be?

The graph (in .gfa format) can be downloaded from here: https://drive.google.com/file/d/1ssySLPGtH5ppmPZQcFLQwdOkqOAaQbsY/view?usp=sharing

Note that the graph contains "N" characters. In case that's a problem, simply replace them with the "A" character.
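If the "N" characters do need replacing, a minimal Python sketch is below. It assumes the usual tab-separated GFA layout in which segment lines start with `S` and carry the sequence in the third field; the function name `replace_n_in_gfa_line` is made up for illustration. Only the sequence field is touched, so node names or tags that happen to contain "N" are left alone.

```python
def replace_n_in_gfa_line(line: str, replacement: str = "A") -> str:
    """Replace N/n in the sequence field of a GFA segment (S) line.

    Lines of any other type are returned unchanged (minus the trailing newline).
    Assumes tab-separated GFA with the sequence in the third column.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) >= 3 and fields[0] == "S":
        seq = fields[2]
        # Preserve case: upper-case N and lower-case n are handled separately.
        seq = seq.replace("N", replacement.upper())
        seq = seq.replace("n", replacement.lower())
        fields[2] = seq
    return "\t".join(fields)
```

Applied line by line to the input file, this writes out a graph with the same topology and no "N" bases.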

Please pull any MHC read set from here. On my end, the code crashes at the indexing step regardless of the read set. I am running both versions with default parameters.

Thanks!

maickrau commented 4 years ago

Hi,

This seems to be because the graph has each base pair in its own separate node, even in linear non-branching parts. I tried loading just the graph without indexing and even that took about 11 GB of RAM.
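A quick way to check whether a graph has this shape is to count the segments and their mean length. The sketch below is only an illustration, again assuming tab-separated GFA with the sequence in the third field of each `S` line; `segment_stats` is a hypothetical helper, not part of GraphAligner or vg. A mean length close to 1 means nearly every base sits in its own node.

```python
def segment_stats(gfa_lines):
    """Return (segment count, total bases, mean segment length) for GFA lines."""
    count = 0
    total = 0
    for line in gfa_lines:
        if line.startswith("S\t"):
            fields = line.rstrip("\n").split("\t")
            count += 1
            total += len(fields[2])
    mean = total / count if count else 0.0
    return count, total, mean
```

For a file on disk this could be called as `segment_stats(open("MHC2.gfa"))`, which streams the file rather than loading it into memory.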

You can merge the linear parts into unitigs with vg: "vg view -Fv MHC2.gfa | vg mod -u - | vg view - > MHC2_unitig.gfa". This takes around 25 GB of RAM, but the resulting graph is just 17 MB. I uploaded it to: https://drive.google.com/open?id=1KGnrhAac0tonhzJRtf40rSUz64i21yna

With this graph it takes around 1 min and 250 MB of RAM to align M2.fastq with a single core and default parameters.
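For readers unfamiliar with unitigs, the idea behind the merge step can be sketched in a few lines of Python. This is a simplified illustration and not how vg implements it: it assumes a directed graph with forward-strand edges only (no reverse-complement traversal, no overlaps), and the names `merge_linear_chains`, `seqs` and `edges` are made up here. An edge u -> v is collapsed when it is the only edge leaving u and the only edge entering v. Edges between the merged nodes are omitted for brevity.

```python
from collections import defaultdict


def merge_linear_chains(seqs, edges):
    """Collapse non-branching paths into single nodes.

    seqs:  dict mapping node name (str) to sequence (str)
    edges: iterable of (from_node, to_node) pairs
    Returns a dict mapping merged node name to concatenated sequence.
    """
    succ = defaultdict(list)
    pred = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    def mergeable(u, v):
        # u -> v is the sole out-edge of u and the sole in-edge of v.
        return u != v and len(succ[u]) == 1 and len(pred[v]) == 1

    def walk(start, seen):
        # Follow mergeable edges forward from start, marking visited nodes.
        chain = [start]
        seen.add(start)
        cur = start
        while len(succ[cur]) == 1:
            nxt = succ[cur][0]
            if nxt in seen or not mergeable(cur, nxt):
                break
            chain.append(nxt)
            seen.add(nxt)
            cur = nxt
        return chain

    merged = {}
    seen = set()
    # Pass 1: start chains at nodes that do not continue a predecessor.
    for node in seqs:
        if len(pred[node]) == 1 and mergeable(pred[node][0], node):
            continue
        chain = walk(node, seen)
        merged["_".join(chain)] = "".join(seqs[n] for n in chain)
    # Pass 2: whatever is left lies on an isolated non-branching cycle.
    for node in seqs:
        if node not in seen:
            chain = walk(node, seen)
            merged["_".join(chain)] = "".join(seqs[n] for n in chain)
    return merged
```

On a graph with one base per node, long runs of single-base nodes collapse into one node each, which is why the unitig graph is so much smaller and cheaper to load.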

cjain7 commented 4 years ago

Thanks! This is useful to know.