ibest / ARC

Assembly by Reduced Complexity (ARC)
Apache License 2.0

Indexing is really slow. #48

Open samhunter opened 9 years ago

samhunter commented 9 years ago

Indexing is very slow. Currently only one file is indexed at any given time (limiting ARC to using only a single processor during indexing). Further tests need to be done to determine whether indexing multiple files at the same time will overwhelm disk I/O and/or result in overall improvements to indexing speed.

Ideas:
1) Create an adaptive strategy where parallel indexing processes are launched until the I/O overhead becomes significant (see python psutil).
2) Launch a fixed number of N indexing processes with N <= nprocs. Maybe make this configurable by the user.
3) Develop a new strategy for indexing the fastq files and/or recruiting reads (address #23, #43, and other issues in the way the reads are recruited).
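
A rough sketch of idea 2, to make the discussion concrete. This is not ARC's current code: `index_one` is a hypothetical stand-in for whatever indexes a single file, and `choose_worker_count`, `index_files`, `requested`, and `executor_cls` are names made up for illustration. It only uses the standard library (`concurrent.futures`), and caps the worker count at N <= nprocs and at the number of files.

```python
import os
# ThreadPoolExecutor is imported only so callers can swap it in (e.g. for
# I/O-bound indexing or for quick tests); the default is a process pool.
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor, as_completed


def choose_worker_count(n_files, requested=None, nprocs=None):
    """Return N with 1 <= N <= nprocs, and never more workers than files."""
    if nprocs is None:
        nprocs = os.cpu_count() or 1
    # Fall back to nprocs when the user did not request a specific N.
    n = nprocs if requested is None else requested
    return max(1, min(n, nprocs, n_files))


def index_files(paths, index_one, requested=None, nprocs=None,
                executor_cls=ProcessPoolExecutor):
    """Index every path with at most N concurrent workers.

    index_one is a hypothetical callable(path) -> result. With a process
    pool it must be picklable, i.e. a module-level function.
    """
    paths = list(paths)
    if not paths:
        return {}
    workers = choose_worker_count(len(paths), requested, nprocs)
    results = {}
    with executor_cls(max_workers=workers) as pool:
        # Map each future back to the path it is indexing.
        futures = {pool.submit(index_one, p): p for p in paths}
        for fut in as_completed(futures):
            # fut.result() re-raises any exception from the worker.
            results[futures[fut]] = fut.result()
    return results
```

Timing `index_files` with `requested=1, 2, 4, ...` on a real dataset would be one way to run the tests mentioned above and see where disk I/O starts to dominate. Idea 1 could then replace the fixed N with a count adjusted at runtime.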