Closed JunhuiLi1017 closed 7 months ago
Hi Yanmei,
I have a quick question about training Random Forest (RF) models on my own dataset with the refined model method. I saw you said "it's ok to manually-check ~100 hap=3 sites with igv". Will variants of all the refined genotypes ('mosaic', 'het', 'refhom', and 'repeat') be included in the training set? Additionally, do you have any recommendations for the number of variants needed for each refined genotype?
--Junhui
Hi Junhui,
Sorry for the confusion. In my experience, only the "hap=3" category needs to be checked further with IGV, since these variants can most likely be further classified as either "mosaic" or "repeat". "hap=2" sites are most likely "het" and "hap>3" sites are most likely "repeat", so an IGV check is not necessary for them. You could use ~200-300 variants in total to train the refine model (these include hap2->het, hap3->repeat/mosaic, hap>3->repeat). Hope this solves your problem.
Best,
Yanmei
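For readers who want to script the selection described in the reply above, here is a minimal Python sketch. It is not part of the tool: the function names, the `(site_id, hap)` input format, the default of 250 total sites, and the choice to draw the non-hap=3 sites from one pooled list are all assumptions made for illustration. Only the hap-to-label mapping, the ~100 hap=3 sites for IGV review, and the ~200-300 total come from the thread.

```python
import random

def provisional_label(hap):
    """Map a haplotype count to (label, needs_igv), following the reply above.

    hap=2  -> "het"
    hap=3  -> undecided (mosaic or repeat), must be checked in IGV
    hap>3  -> "repeat"
    Other values are not covered in the thread, so they stay unlabeled (assumption).
    """
    if hap == 2:
        return "het", False
    if hap == 3:
        return None, True
    if hap > 3:
        return "repeat", False
    return None, False

def pick_training_sites(sites, total=250, n_hap3=100, seed=0):
    """Pick about `total` sites, including up to `n_hap3` hap=3 sites for IGV review.

    `sites` is a list of (site_id, hap) tuples (hypothetical input format).
    """
    rng = random.Random(seed)
    hap3 = [s for s in sites if s[1] == 3]
    others = [s for s in sites if s[1] == 2 or s[1] > 3]

    chosen_hap3 = rng.sample(hap3, min(n_hap3, len(hap3)))
    n_rest = max(total - len(chosen_hap3), 0)
    # Assumption: the remaining sites are drawn from hap=2 and hap>3 pooled together.
    chosen_rest = rng.sample(others, min(n_rest, len(others)))

    rows = []
    for site_id, hap in chosen_hap3 + chosen_rest:
        label, needs_igv = provisional_label(hap)
        rows.append({"site": site_id, "hap": hap,
                     "label": label, "needs_igv": needs_igv})
    return rows

# Toy input: 1000 made-up sites with hap cycling through 2, 3, 4, 5.
demo_sites = [(f"site{i}", (2, 3, 4, 5)[i % 4]) for i in range(1000)]
demo_rows = pick_training_sites(demo_sites)
```

Rows with `needs_igv` set to `True` come back with `label` set to `None`; the idea is to fill in "mosaic" or "repeat" by hand after looking at each site in IGV, then use the completed table as training labels.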
Hi Yanmei,
Thanks for the clarification. This is very helpful.
Best, Junhui