I had a malformed genetic map file, and while diagnosing and fixing it, I noticed that shapeit4 currently reads the input files first and only then reads the genetic map, one after the other. Because it took a substantial amount of time for shapeit4 to reach the genetic map, it also took a long time for me to hit this error, notice it, and diagnose it. From a quick look at the code, these two operations are fully independent and could run concurrently on separate threads. That would shorten pre-phasing initialization by up to the duration of the shorter of the two reads.
The only major complication I see is keeping the logging output ordered. The two reads populate fully independent structures, so as far as I can tell there is no read or write contention to worry about.
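To illustrate the idea, here is a minimal sketch, assuming standard C++11 threading. It is not shapeit4's actual code. `GenotypeData`, `GeneticMap`, `read_input_files`, `read_genetic_map` and `load_all` are hypothetical stand-ins. The genetic map is read on a second thread via `std::async` while the input files are read on the calling thread. Each reader logs into its own private buffer, and the buffers are concatenated afterwards in the original sequential order, which addresses the log-ordering concern.

```cpp
#include <future>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for shapeit4's real data structures (not its API).
struct GenotypeData { std::vector<std::string> samples; };
struct GeneticMap { std::vector<double> cm_positions; };

// Hypothetical reader: logs to a private buffer instead of stdout,
// so output from the two threads can never interleave.
GenotypeData read_input_files(std::ostringstream& log) {
    log << "Reading genotype data...\n";
    GenotypeData g;
    g.samples = {"S1", "S2"};  // placeholder for real parsing
    log << "  * " << g.samples.size() << " samples\n";
    return g;
}

// Hypothetical reader for the genetic map, same logging pattern.
GeneticMap read_genetic_map(std::ostringstream& log) {
    log << "Reading genetic map...\n";
    GeneticMap m;
    m.cm_positions = {0.0, 0.5, 1.2};  // placeholder for real parsing
    log << "  * " << m.cm_positions.size() << " positions\n";
    return m;
}

struct Loaded {
    GenotypeData geno;
    GeneticMap map;
    std::string log;  // combined log, in the original sequential order
};

Loaded load_all() {
    std::ostringstream geno_log;
    std::ostringstream map_log;
    // Start the genetic map read on its own thread.
    auto map_future = std::async(std::launch::async,
                                 [&map_log] { return read_genetic_map(map_log); });
    // Meanwhile, read the input files on the calling thread.
    GenotypeData geno = read_input_files(geno_log);
    // Wait for the map; get() rethrows any exception thrown by the map reader.
    GeneticMap map = map_future.get();
    // Emit the buffered logs in the same order as the sequential version.
    return Loaded{std::move(geno), std::move(map), geno_log.str() + map_log.str()};
}
```

One caveat of this simple join-at-the-end design: a malformed map is only reported once `get()` is called, i.e. after the input files have been read. Surfacing map errors earlier would need an extra step, such as checking the future's status periodically or printing the map reader's error immediately from its own thread.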