Closed by prmcadam 9 years ago
Given that R can load the table so much more efficiently, it may be time to re-engineer the parseSNPtable script to use rpy2, or just bite the bullet and write a C++ version.
In the meantime, this can be added for large data sets; I'm not sure when at the moment.
I don't think loading the file is the issue: a 2.1Gb csv file is read in ~3 seconds. I think the limiting step in parseSNPtable is the check for invariant sites?
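To make the invariant-site point concrete, here is a minimal sketch of a single-pass check, not the actual parseSNPtable code. It assumes a hypothetical table layout (first column is the position, every other column is an allele call, and "-" marks a missing call); the function name `variable_sites` is made up for illustration.

```python
import csv
import io


def variable_sites(rows):
    """Yield only rows whose allele calls differ between columns.

    Assumed layout (hypothetical): row[0] is the position, row[1:] are
    allele calls, and "-" is a missing call that is ignored.
    """
    for row in rows:
        called = {allele for allele in row[1:] if allele != "-"}
        # A site is invariant when every called allele is identical.
        if len(called) > 1:
            yield row


# Tiny made-up table, purely for illustration.
table = "Pos,ref,iso1,iso2\n10,A,A,A\n25,C,T,C\n40,G,-,G\n"
reader = csv.reader(io.StringIO(table))
header = next(reader)
kept = list(variable_sites(reader))
```

Each row is checked once with a set, so the cost grows linearly with the size of the table. If the real check rescans the table for every site, that alone could explain a gap like the one described here.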
parseSNPtable is now fixed in RedDog v1b.2: it now processes a 2.1Gb file in 20 min, not 22 hr.
Post release, I will add an option to turn off FastTree.
The option to turn off phylogenetic tree generation has now been added to v1beta.3 (to be released when testing is finished, which is going VERY slowly thanks to the barcoo queue).
As discussed earlier: would it be possible to have an option to turn off specific steps (parseSNPtable and FastTree) during merge runs? The desired output would be a collated allele table with no filtering, and the run folder merged with the previous run's results.
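One way such an option could look, as a rough sketch only: the flag names `--skip-parse-snp-table` and `--skip-tree`, and the step labels below, are hypothetical and are not RedDog's actual configuration interface.

```python
import argparse


def build_parser():
    """Build a parser with hypothetical skip flags for merge runs."""
    parser = argparse.ArgumentParser(description="Hypothetical merge-run options")
    parser.add_argument("--skip-parse-snp-table", action="store_true",
                        help="do not filter the collated allele table")
    parser.add_argument("--skip-tree", action="store_true",
                        help="do not build a phylogenetic tree")
    return parser


def planned_steps(args):
    """Return the ordered list of (illustrative) steps a merge run would execute."""
    steps = ["collate_allele_table"]
    if not args.skip_parse_snp_table:
        steps.append("parseSNPtable")
    if not args.skip_tree:
        steps.append("FastTree")
    steps.append("merge_run_folder")
    return steps


# With both flags set, only collation and the folder merge remain.
args = build_parser().parse_args(["--skip-parse-snp-table", "--skip-tree"])
steps = planned_steps(args)
```

With both flags set, the run would stop at an unfiltered, collated allele table plus the merged run folder, which matches the output asked for above.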