I implemented multi-threading in the nanopore version of periscope. This is achieved by splitting the original BAM file into parts (the number of parts is determined by the user-specified thread count); these parts then run in parallel.
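A minimal sketch of this split-and-parallelise approach, assuming pysam and multiprocessing; `split_bam`, `process_chunk`, and `input.bam` are illustrative names, not periscope's actual functions:

```python
import multiprocessing
import pysam


def split_bam(bam_path, n_parts):
    """Write the reads of bam_path round-robin into n_parts chunk BAMs,
    each carrying the original header. Returns the chunk paths.
    (Illustrative -- periscope's actual splitting logic may differ.)"""
    chunk_paths = ["{}.part{}.bam".format(bam_path, i) for i in range(n_parts)]
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        outs = [pysam.AlignmentFile(p, "wb", template=bam) for p in chunk_paths]
        for i, read in enumerate(bam):
            outs[i % n_parts].write(read)
        for out in outs:
            out.close()
    return chunk_paths


def process_chunk(chunk_path):
    """Stand-in for the per-read sgRNA search run on one chunk."""
    n = 0
    with pysam.AlignmentFile(chunk_path, "rb") as bam:
        for read in bam:
            n += 1  # the real code classifies each read here
    return n


if __name__ == "__main__":
    threads = 4
    chunks = split_bam("input.bam", threads)  # hypothetical input file
    with multiprocessing.Pool(threads) as pool:
        print(sum(pool.map(process_chunk, chunks)))
```

Round-robin splitting keeps the chunks balanced regardless of coverage, and each chunk inherits the original header, so downstream steps can treat it like any other BAM.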
> Periscope seems to run slowly (a few days using 1 or 2 threads) when there are more than 100 million reads, and it requires relatively large memory (>64 GB). Do you have any suggestions for speeding up the analysis?
This user has noticed it is pretty slow on very large Illumina datasets too, so it would be good to migrate these changes over to the Illumina search functions!
[x] Merge existing vanguard with current master and resolve conflicts
[x] Fix vanguard for nanopore data
[x] Move vanguard optimisation to Illumina data
[x] Fix a bug where using more threads produced a few extra counts for certain ORFs in Illumina data. (Fixed by implementing multiprocessing on the process_reads function, though this does not cover processing pairs. A limitation of this solution is that the _periscope.bam file is no longer created at the end of periscope.py, because multiprocessing cannot pickle pysam objects [potential security/incompatibility issues with C]; a sketch of the workaround follows this list.)
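The pysam limitation comes down to pickling: multiprocessing sends arguments to workers by pickling them, and `pysam.AlignmentFile` wraps C-level htslib structures that cannot be pickled. A minimal sketch of the workaround, assuming each worker receives only a path and a region and opens the BAM itself; the signature of `process_reads`, the region splitting, and `input.bam` are illustrative, not periscope's actual code:

```python
import multiprocessing
import pysam


def process_reads(args):
    """Worker: receives only picklable arguments (a path and a region) and
    opens its own AlignmentFile, since pysam objects wrap C-level htslib
    structures and cannot be pickled across process boundaries."""
    bam_path, contig, start, end = args
    n = 0
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for read in bam.fetch(contig, start, end):  # requires a .bai index
            # fetch() yields every read *overlapping* the window, so a read
            # spanning a boundary appears in two workers; counting it only in
            # the window containing its leftmost position keeps counts exact.
            if not read.is_unmapped and start <= read.reference_start < end:
                n += 1  # the real function classifies reads by ORF here
    return contig, start, end, n


if __name__ == "__main__":
    threads = 4
    bam_path = "input.bam"  # hypothetical: coordinate-sorted and indexed
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        contig = bam.references[0]
        length = bam.get_reference_length(contig)
    step = -(-length // threads)  # ceiling division
    jobs = [(bam_path, contig, s, min(s + step, length))
            for s in range(0, length, step)]
    with multiprocessing.Pool(threads) as pool:
        for region in pool.map(process_reads, jobs):
            print(region)
```

Reads that overlap a window boundary being fetched by more than one worker is a plausible source of "a few more counts" as the thread count grows; the start-position filter above is one way to make the per-window counts partition the read set exactly.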