nikostourvas / acorn_poolseq_pipeline


VarScan call variants in different regions of the genome #16

Closed nikostourvas closed 12 months ago

nikostourvas commented 1 year ago

To improve variant calling efficiency, the genome is split into chromosomes and scaffolds, and variants are then called for each region (chromosome or scaffold) separately. The scaffolds of the Q. robur genome are orders of magnitude smaller than the 12 chromosomes. As a result, a parallel variant calling run effectively uses only 12 CPU cores for most of its duration.
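The load imbalance described above can be illustrated with a small scheduling sketch. It assumes, purely for illustration, that the runtime of each region is proportional to its length and that regions are handed out longest-first to whichever worker is least busy. The function name `makespan` and the region lengths are hypothetical and are not taken from the pipeline or from the real Q. robur assembly.

```python
import heapq


def makespan(region_lengths, n_workers):
    """Estimate total wall time for a longest-first schedule.

    Assumes runtime is proportional to region length (an illustrative
    simplification, not a measured property of VarScan).
    """
    if n_workers <= 0:
        raise ValueError("n_workers must be positive")
    # Min-heap of the current load on each worker.
    loads = [0] * n_workers
    heapq.heapify(loads)
    for length in sorted(region_lengths, reverse=True):
        # Give the next-longest region to the least-loaded worker.
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + length)
    return max(loads)


# Hypothetical lengths: 12 large "chromosomes" and 50 tiny "scaffolds".
regions = [100] * 12 + [1] * 50

for workers in (12, 32):
    print(workers, "workers ->", makespan(regions, workers))
```

With these made-up numbers, going from 12 to 32 workers barely changes the estimated wall time, because the run can never finish faster than the longest single region.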

One way to further improve efficiency (i.e. use more CPU cores concurrently) would be to split the genome into multiple regions of equal length. However, what does this entail for INDEL discovery? For example, what happens if a split falls within an INDEL?
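A minimal sketch of such a split is shown below. It is not the code used in this pipeline. It assumes contig lengths are already known (for example, read from a FASTA index) and adds an optional `padding` on each side of every chunk as one possible hedge against boundary effects: each chunk is processed over its padded interval, and calls would then be kept only if they start inside the unpadded core, so that no call is reported twice. The names `Chunk`, `make_chunks`, and `padding` are hypothetical, and whether padding is sufficient for a given caller would still need to be checked.

```python
from typing import Dict, Iterator, NamedTuple


class Chunk(NamedTuple):
    contig: str
    core_start: int  # 1-based, inclusive; calls are kept only inside the core
    core_end: int
    pad_start: int   # padded interval actually passed to the caller
    pad_end: int


def make_chunks(contig_lengths: Dict[str, int],
                chunk_size: int,
                padding: int = 0) -> Iterator[Chunk]:
    """Yield fixed-size chunks per contig, with optional padding."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if padding < 0:
        raise ValueError("padding must be non-negative")
    for contig, length in contig_lengths.items():
        for core_start in range(1, length + 1, chunk_size):
            # The last chunk of a contig may be shorter than chunk_size.
            core_end = min(core_start + chunk_size - 1, length)
            # Clip the padded interval to the contig boundaries.
            pad_start = max(1, core_start - padding)
            pad_end = min(length, core_end + padding)
            yield Chunk(contig, core_start, core_end, pad_start, pad_end)


# Hypothetical contig, for illustration only.
for c in make_chunks({"chr1": 250}, chunk_size=100, padding=10):
    # Region string in the contig:start-end form (1-based, inclusive).
    print(f"{c.contig}:{c.pad_start}-{c.pad_end}",
          f"(core {c.core_start}-{c.core_end})")
```

Chunks never span two contigs here, so small scaffolds each stay as a single short chunk, while the chromosomes are divided into many pieces that can be processed concurrently.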

nikostourvas commented 12 months ago

Solution implemented with the Chunk approach