kishwarshafin / pepper

PEPPER-Margin-DeepVariant
MIT License

Model training data #158

mproberts99 closed this issue 2 years ago

mproberts99 commented 2 years ago

Hi, thank you for all the work on this and for the guide provided. I am looking at the performance of PMDV (model --ont_r9_guppy5_sup) on GIAB samples and am unsure which samples were used in training, and therefore which I should leave out of the performance assessment. Were only chr1-19 from HG002 used, or were other GIAB samples used in training as well?

Thanks in advance!

kishwarshafin commented 2 years ago

Hi @mproberts99,

Sorry for the delay on this issue.

We trained our models on the HG002, HG004, HG005, HG006 and HG007 samples, using chr1-chr19. HG003 was completely held out, and chr20, chr21 and chr22 were completely held out from all of the samples.

If you want to do a performance assessment, then HG003 whole genome and HG003 chr20 will give you the best evaluation. That's also what we report in our documentation.
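For anyone scripting their own evaluation, the split described in this reply could be encoded roughly as follows. This is only a sketch of the rule as stated in this thread, not code from the PEPPER repository. The function name `split_status` and the constants are hypothetical, and anything the reply does not mention (other samples, chrX, chrY) is reported as "unknown" rather than guessed.

```python
# Sketch of the train / held-out split described in this thread.
# All names here are illustrative; none come from the PEPPER codebase.

TRAIN_SAMPLES = {"HG002", "HG004", "HG005", "HG006", "HG007"}
TRAIN_CHROMS = {f"chr{i}" for i in range(1, 20)}   # chr1 .. chr19
HELD_OUT_SAMPLES = {"HG003"}                       # held out on every chromosome
HELD_OUT_CHROMS = {"chr20", "chr21", "chr22"}      # held out in every sample


def split_status(sample: str, chrom: str) -> str:
    """Return 'held_out', 'train', or 'unknown' for a sample/chromosome pair."""
    if sample in HELD_OUT_SAMPLES or chrom in HELD_OUT_CHROMS:
        return "held_out"
    if sample in TRAIN_SAMPLES and chrom in TRAIN_CHROMS:
        return "train"
    # Not covered by the reply (e.g. sex chromosomes or other samples).
    return "unknown"


if __name__ == "__main__":
    for pair in [("HG003", "chr1"), ("HG002", "chr20"), ("HG002", "chr5")]:
        print(pair, split_status(*pair))
```

Regions reported as "train" should be left out of any accuracy comparison, and "unknown" regions are best confirmed with the maintainers before being used.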

mproberts99 commented 2 years ago

Thanks so much @kishwarshafin !