Indexing is very slow. Currently only one file is indexed at a time, which limits ARC to a single processor during indexing. Further tests are needed to determine whether indexing multiple files concurrently improves overall indexing speed or simply saturates disk I/O.
Ideas:
1) Create an adaptive strategy where parallel indexing processes are launched until the I/O overhead becomes significant (see the Python psutil package).
2) Launch a fixed number N of indexing processes, with N <= nprocs. Maybe make N configurable by the user.
3) Develop a new strategy for indexing the fastq files and/or recruiting reads (address #23, #43, and other issues in the way the reads are recruited).
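
A minimal sketch of idea 2, to make the discussion concrete. This is not ARC's actual code: `index_fastq`, `choose_worker_count`, and `index_all` are hypothetical names, and the per-file "index" here is just a list of byte offsets for each record, assuming strict 4-line FASTQ records. It uses only the standard library (`concurrent.futures`) and clamps the user-supplied N to the number of processors.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def choose_worker_count(requested=None, nprocs=None):
    """Clamp a user-requested worker count so that 1 <= N <= nprocs."""
    nprocs = nprocs or os.cpu_count() or 1
    if requested is None:
        return nprocs
    return max(1, min(requested, nprocs))


def index_fastq(path):
    """Hypothetical stand-in for the per-file indexing step.

    Records the byte offset of every record header, assuming each
    record occupies exactly four lines.
    """
    offsets = []
    with open(path, "rb") as handle:
        line_no = 0
        while True:
            pos = handle.tell()
            line = handle.readline()
            if not line:
                break
            if line_no % 4 == 0:
                offsets.append(pos)
            line_no += 1
    return path, offsets


def index_all(paths, requested=None, executor_cls=ProcessPoolExecutor):
    """Index several files concurrently with at most N workers."""
    paths = list(paths)
    if not paths:
        return {}
    # Never start more workers than there are files to index.
    n_workers = min(choose_worker_count(requested), len(paths))
    with executor_cls(max_workers=n_workers) as pool:
        return dict(pool.map(index_fastq, paths))


if __name__ == "__main__":
    import sys

    # Example: python index_sketch.py reads_1.fastq reads_2.fastq
    for path, offsets in index_all(sys.argv[1:], requested=2).items():
        print(path, len(offsets), "records")
```

Timing this with N = 1, 2, 4, ... on real data would be one way to answer the open question of whether disk I/O becomes the bottleneck. The `executor_cls` parameter is only there so that a thread pool can be swapped in for comparison or testing.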