parklab / MosaicForecast

A mosaic detecting software based on phasing and random forest
MIT License

Unable to retrieve the predictions (i.e low confidence Mosaic) #39

Open ngs1810 opened 9 months ago

ngs1810 commented 9 months ago

Hello.

When I used MF (MosaicForecast) in 2021, the genotype prediction step produced the following categories:

cat $file.genotype.predictions.refined.300721.bed | cut -f35 | sort | uniq -c
  23578 het
    682 mosaic
      1 mosaic;cautious:AF<0.01;low-confidence:extra-high-coverage
     19 mosaic;cautious:only-1-altallele
     39 mosaic;cautious:only-1-altallele;low-confidence:extra-high-coverage
   1136 mosaic;low-confidence:extra-high-coverage
     50 mosaic;low-confidence:extra-high-coverage;low-confidence:likelyCNV
      1 prediction
      6 refhom
   1562 repeat

However, I re-downloaded the software in March 2022, and the same sample no longer gives the same mosaic predictions. Every mosaic call is now labeled plain "mosaic", without the information on whether it is low-confidence that was provided previously. I have been using the same commands since 2021. The total number of mosaic calls is the same in both outputs. I just need to filter the low-confidence mosaic calls out of my analysis.

cat 003P.genotype.predictions.refined.bed | cut -f35 | sort | uniq -c
  23579 het
   1927 mosaic
      1 prediction
      6 refhom
   1562 repeat

Command used:

singularity run -B /hpcfs /hpcfs/users/$USER/mosaicforecast_0.0.1.sif \
    Prediction.R \
    $DIR/${sample[$SLURM_ARRAY_TASK_ID]}.features.bed \
    $MFORECAST/models_trained/50xRFmodel_addRMSK_Refine.rds \
    Refined \
    $DIR/${sample[$SLURM_ARRAY_TASK_ID]}.genotype.predictions.refined.bed

I am not sure how to get the original classification back, although I could reconstruct it manually in R. Please let me know if there are additional settings that I am not aware of.

Thank you.
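
For reference, the six mosaic-prefixed rows in the 2021 tally (682 + 1 + 19 + 39 + 1136 + 50) add up to the 1927 plain "mosaic" calls in the 2022 tally. When the tags are present in the output, the filtering described above could be sketched roughly as follows. This is only an illustration: the function names `tally_labels` and `keep_confident_mosaic` are hypothetical, column 35 is taken from the `cut -f35` calls in this thread, and the header row is assumed to carry the literal word "prediction" in that column (suggested by the "1 prediction" row in the tallies). It cannot recover tags from an output file that does not contain them.

```shell
#!/bin/sh
# Column holding the prediction label, as used by `cut -f35` in this thread.
LABEL_COL=35

# Print "count<TAB>label" for every label in the file, skipping the header
# row (assumed to have the literal word "prediction" in the label column).
tally_labels() {
    awk -F'\t' -v col="$LABEL_COL" '
        $col != "prediction" { n[$col]++ }
        END { for (k in n) print n[k] "\t" k }
    ' "$1" | sort -k2
}

# Keep the header row plus rows labeled exactly "mosaic", i.e. drop every
# row carrying a "cautious:" or "low-confidence:" tag, and every non-mosaic row.
keep_confident_mosaic() {
    awk -F'\t' -v col="$LABEL_COL" '
        $col == "prediction" || $col == "mosaic"
    ' "$1"
}
```

A possible use would be `keep_confident_mosaic 003P.genotype.predictions.refined.bed > 003P.confident.bed`, followed by `tally_labels 003P.confident.bed` to confirm that only plain "mosaic" rows remain.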

douym commented 9 months ago

Hi @ngs1810 ,

Thanks for your message. I checked https://github.com/parklab/MosaicForecast/blob/master/Prediction.R and confirmed that the "low-confidence" labels are still there. Is it possible that your input happens not to contain any low-confidence mutations?

best wishes,

Y.
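
One way to follow up on the check described above is to look at the copy of Prediction.R that is actually being run, since the script inside a container image is not necessarily identical to the one on the master branch (that it differs here is an assumption, not something this thread confirms). The helper below is a hypothetical sketch that counts the lines of a script file mentioning a given tag. The location of Prediction.R inside the image is not stated in this thread, so it is left as a placeholder.

```shell
#!/bin/sh
# Count the lines in a script file that mention a tag such as "low-confidence".
#   $1: path to a local copy of Prediction.R
#   $2: tag text to search for (matched as a fixed string, not a regex)
count_tag_lines() {
    # grep -c prints 0 and exits non-zero when nothing matches, so mask the status.
    grep -c -F -- "$2" "$1" || true
}

# Hypothetical usage; <path-in-image> is a placeholder for wherever the image
# keeps the script:
#   singularity exec mosaicforecast_0.0.1.sif cat <path-in-image>/Prediction.R > Prediction.image.R
#   count_tag_lines Prediction.image.R low-confidence
```

A count of 0 for the copy in use, against a non-zero count for the master version, would suggest that the two scripts differ.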