katholt / RedDog

Feature Request: turn off modules during merge runs #36

Closed prmcadam closed 9 years ago

prmcadam commented 9 years ago

As discussed earlier: would it be possible to have an option to turn off specific steps (parseSNPtable and FastTree) during merge runs? The desired output would be a collated allele table with no filtering, with the run folder merged with the previous run's results.

d-j-e commented 9 years ago

Given that R can load the table so much more efficiently, it may be time to re-engineer the parseSNPtable script to use rpy2, or just bite the bullet and write a C++ version.

In the meantime, this option can be added for large data sets; not sure when at the moment.

prmcadam commented 9 years ago

I don't think loading the file is the issue: a 2.1 GB CSV file is read in ~3 seconds. I think the limiting step in parseSNPtable is the check for invariant sites?
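To make the invariant-site check concrete, here is a minimal single-pass sketch. It is not the parseSNPtable code; it assumes a CSV allele table with a position column followed by one column per isolate, and it assumes `-` and `N` mark uncalled bases. The function name and the example data are made up for illustration.

```python
import csv
import io

# Assumed codes for uncalled bases; the real table may use others.
MISSING = {"-", "N"}

def filter_invariant(handle):
    """Return (header, rows) keeping only rows with more than one called allele."""
    reader = csv.reader(handle)
    header = next(reader)
    kept = []
    for row in reader:
        # Column 0 is the position; the rest are per-isolate calls.
        alleles = set(row[1:]) - MISSING
        if len(alleles) > 1:
            kept.append(row)
    return header, kept

# Tiny made-up table: only position 205 varies among called bases.
example = io.StringIO(
    "Pos,strainA,strainB,strainC\n"
    "101,A,A,A\n"
    "205,A,G,A\n"
    "317,C,-,C\n"
)
header, kept = filter_invariant(example)
print(kept)
```

Each row is inspected once and reduced to a set, so the cost grows linearly with the size of the table rather than with repeated scans over it.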

d-j-e commented 9 years ago

parseSNPtable is now fixed in RedDog v1b.2: it now processes a 2.1 GB file in 20 min rather than 22 hr.

Post release, I will add an option to turn off FastTree.

d-j-e commented 9 years ago

An option to turn off phylogenetic tree generation has now been added to v1beta.3 (to be released when testing is finished, which is going VERY slowly thanks to the barcoo queue).
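For readers wondering what such a switch amounts to, a rough sketch of gating pipeline steps on options follows. The step list, the function name, and the `skip_filtering` / `skip_tree` flags are all hypothetical and are not RedDog's actual configuration settings.

```python
# Hypothetical step names and flags; not RedDog's real configuration.
ALL_STEPS = ["collate_allele_table", "parseSNPtable", "FastTree"]

def steps_to_run(skip_filtering=False, skip_tree=False):
    """Return the ordered list of steps left after applying the skip flags."""
    skipped = set()
    if skip_filtering:
        skipped.add("parseSNPtable")
    if skip_tree:
        skipped.add("FastTree")
    return [step for step in ALL_STEPS if step not in skipped]

# A merge run that only needs the collated, unfiltered allele table.
print(steps_to_run(skip_filtering=True, skip_tree=True))
```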