mitoNGS / MToolBox

A bioinformatics pipeline to analyze mtDNA from NGS data
http://sourceforge.net/projects/mtoolbox/?source=navbar
GNU General Public License v3.0

GSNAP version update #43

Open TakaYamaguchi opened 7 years ago

TakaYamaguchi commented 7 years ago

Hello @clody23,

I was wondering whether you have tested newer versions of GSNAP. It seems that GSNAP 2015-12-31 (the version you recommend) has a glitch in multi-threaded mode that causes hangs for some samples when mapExome.py runs in our cluster environment. If newer versions of GSNAP work fine and don't affect the downstream analysis, I will definitely update.

I look forward to your response.

Thank you so much,

Taka

clody23 commented 7 years ago

Dear Taka,

With how many threads did you encounter this issue?

At the time we updated MToolBox to v.1.0 (around April 2016, although we announced it later, in June 2016), GSNAP version 2015-12-31 was, if I recall correctly, the most up-to-date one, and we used it extensively on different machines without finding any issues. However, I personally never tried to run the tool with more than 10 threads, and I have not tried it with subsequent GSNAP releases, so I cannot say. You can definitely try newer versions and let us know.

I will also try to find time to test the tool with newer versions of GSNAP and see how it goes.

Many thanks, Claudia

TakaYamaguchi commented 7 years ago

Dear Claudia,

Thank you for the quick response. I've tried 2, 4, and 8 cores with ~800 samples. Most of them run fine, but somewhere between a few percent and 10% of them hang at random (probably a compatibility issue between certain samples and nodes). Do you think newer versions of GSNAP might affect the downstream analysis, given that some flags seem to have been updated? I'm worried about whether the results would still be valid after updating GSNAP.

Thank you,

Taka
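
Random hangs like these can at least be detected and logged by running each mapping step under a time limit. The following is a minimal sketch and not part of MToolBox: `run_with_timeout` is a hypothetical helper, and `cmd` stands in for whatever command line mapExome.py actually builds for GSNAP, which is not shown here.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd (a list of arguments) under a wall-clock limit.

    Returns ("ok", returncode) if the process finishes in time,
    or ("timeout", None) if it exceeds timeout_s seconds.
    """
    try:
        done = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising,
        # so a hung job does not keep occupying the node.
        return ("timeout", None)
    return ("ok", done.returncode)
```

Samples that come back as `"timeout"` could then be collected and rerun, which would also make it easier to see whether the hangs follow particular samples or particular nodes.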

clody23 commented 7 years ago

Thanks for giving us more detailed information about your issue. Well, unless the GSNAP options we use in mapExome.py (which are very basic) have also changed, I don't see why results from newer versions should not be valid. As I said, I haven't tested them yet, so I cannot be 100% sure, but you can try on a few samples and see how it goes.

Best, C
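
One way to "try on a few samples" is to run the same samples with both GSNAP versions and compare the resulting calls. The sketch below assumes each run has been reduced to a tab-separated table whose first three columns are sample, position, and allele. That layout is a hypothetical one chosen for illustration, not MToolBox's actual output format, and `load_calls` and `compare_calls` are illustrative names.

```python
import csv


def load_calls(handle):
    """Read tab-separated rows into a set of (sample, position, allele) tuples.

    Assumed layout (hypothetical): the first three columns hold sample,
    position, and allele; lines starting with '#' are comments.
    """
    reader = csv.reader(handle, delimiter="\t")
    return {tuple(row[:3]) for row in reader if row and not row[0].startswith("#")}


def compare_calls(old_calls, new_calls):
    """Split the calls into those shared by both runs and those unique to one."""
    return {
        "shared": old_calls & new_calls,
        "only_old": old_calls - new_calls,
        "only_new": new_calls - old_calls,
    }
```

If `only_old` and `only_new` are both empty for the test samples, the GSNAP update has not changed the calls for those samples. Non-empty sets point to the specific positions worth inspecting.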

TakaYamaguchi commented 7 years ago

Hi Claudia,

Just following up on this topic. I've tried the latest version of GSNAP (ver 2017-10-12), and it works fine so far (no hangs!). However, I saw several different errors across our samples. Some samples failed in /MToolBox/summary.py with

    IndexError: list index out of range

which is similar to this issue: https://github.com/mitoNGS/MToolBox/issues/38

Other samples got

    Traceback (most recent call last):
      File "MToolBox/summary.py", line 79, in <module>
        output_file.write(str(k)+"\t"+str(dic_cov[k])+"\t"+str(dpt)+"\t"+str(dic_haplo[k])+"\t"+str(dic_homo[k])+"\t"+str(dic_low_hetero[k])+"\t"+str(dic_high_hetero[k])+"\t"+str(dic_var[k])+"\t"+str(dic_prio[k])+"\n")
    KeyError: 'samplename_mitochondria'

Anyway, I would suggest using the latest version of GSNAP, and it would be nice if you could look into the errors I got (I can create a separate ticket).

Thank you,

Taka
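
The `KeyError` above means that at least one of the per-sample dictionaries has no entry for that sample key. As a stopgap while the cause is investigated, the row could be built with `dict.get` so that a missing value is written as a placeholder instead of aborting the summary. This is a minimal sketch, not the maintainers' fix: `summary_row` and the `"NA"` placeholder are assumptions, and only the dictionary names and column order come from the traceback.

```python
def summary_row(k, dpt, dic_cov, other_dicts, missing="NA"):
    """Build one tab-separated summary line for sample key k.

    other_dicts is the ordered list of remaining per-sample dictionaries
    (dic_haplo, dic_homo, dic_low_hetero, dic_high_hetero, dic_var,
    dic_prio in the traceback). A key absent from any dictionary yields
    the `missing` placeholder rather than raising KeyError.
    """
    values = [k, dic_cov.get(k, missing), dpt]
    values += [d.get(k, missing) for d in other_dicts]
    return "\t".join(str(v) for v in values) + "\n"
```

This only hides the symptom. A sample that shows `NA` in the summary still needs checking upstream to find out why the corresponding step produced no entry for it.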