Closed DiDeoxy closed 3 years ago
Hi Max,
Version 0.7.0 is now released and the bundled forests are available. I would just use the germline forest germline.v0.7.0.forest, especially in the absence of a good truth set.
You can install the new version with forests as shown here.
Best, Dan
Awesome, congrats on getting to the 0.7.0 release!
I got a clean install to compile, will test it out on some BAMs now!
Cheers,
Max.
Hi, Dan,
I know there is no random forest for 0.7.0, so I am attempting to roll my own.
I have downloaded the GIAB VCF file (ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/release/) and the alignment for NA12878 (https://github.com/genome-in-a-bottle/giab_data_indexes), as well as hg19 (https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/), which both of the above are aligned/called against.
I have made the following config for train_random_forest.py:
Is this a reasonable setup for training the model, or should I get more samples? I copied your example training settings exactly, but I have no idea whether they are appropriate here. Finally, I will be running this training on a compute cluster and need to estimate resource utilization. I am currently targeting 6 cores, 24 GB of RAM, and 24 hours of wall time. Is that a reasonable estimate?
Oh, and I will be calling barley alignments with this forest. I can't find a good known-variants data set for barley. This shouldn't degrade results too much, should it?
Cheers,
Max H.