To improve variant calling efficiency, the genome is split into chromosomes and scaffolds, and variant calling is then run on each region (chromosome or scaffold) separately. The scaffolds of the Q. robur genome are orders of magnitude smaller than the 12 chromosomes, so the scaffold jobs finish quickly. As a result, a parallel variant calling run will use only 12 CPU cores for most of the time.
One way to further improve efficiency (i.e. use more CPU cores concurrently) would be to split the genome into multiple regions of equal length. However, what does this entail for INDEL discovery? For example, what happens if a region boundary falls within an INDEL?
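To make the question concrete, here is a minimal Python sketch of the equal-length split and of the case I am worried about. It is only an illustration: the function names `split_into_regions` and `straddles_boundary`, the 1-based inclusive coordinates, and the toy contig lengths are my own assumptions, not part of any variant caller, and the lengths are not the real Q. robur values.

```python
from typing import Dict, List, Tuple

# (contig, start, end) with 1-based, inclusive coordinates (an assumption).
Region = Tuple[str, int, int]


def split_into_regions(contig_lengths: Dict[str, int], window: int) -> List[Region]:
    """Cut every contig into consecutive windows of at most `window` bases."""
    if window <= 0:
        raise ValueError("window must be a positive integer")
    regions: List[Region] = []
    for contig, length in contig_lengths.items():
        start = 1
        while start <= length:
            # The last window of a contig may be shorter than `window`.
            end = min(start + window - 1, length)
            regions.append((contig, start, end))
            start = end + 1
    return regions


def straddles_boundary(regions: List[Region], contig: str,
                       var_start: int, var_end: int) -> bool:
    """Return True if [var_start, var_end] is not contained in a single region."""
    for region_contig, start, end in regions:
        if region_contig == contig and start <= var_start and var_end <= end:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical toy lengths, chosen only to keep the output short.
    toy_lengths = {"chr1": 25, "scaffold1": 4}
    toy_regions = split_into_regions(toy_lengths, window=10)
    for contig, start, end in toy_regions:
        print(f"{contig}:{start}-{end}")
    # A deletion covering chr1:9-12 crosses the boundary between two windows.
    print(straddles_boundary(toy_regions, "chr1", 9, 12))
```

With a window of 10, a deletion spanning positions 9 to 12 of `chr1` is not contained in any single window, which is exactly the situation I am asking about: each job would see only part of the event.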