zhangguangsi closed 2 years ago
Hi,
I am really glad you decided to look into RNAseqCNV and thank you for your insightful question.
21p carries only a few genes. Since our gene filtering steps excluded a number of them as well, the copy number prediction was not as reliable as we would have liked, so we excluded this arm from the analysis. That said, it would be possible to include it in the analysis (perhaps through an additional parameter?).
Regards, Jan
Hi, thanks for the prompt answer. If we edit the script so that 21p and chr 13, 14 and 15 are not filtered out, does the model_dip used in randomForest:::predict.randomForest(model_dip, ...) need to be trained again?
In addition, is the data used for the randomForest modeling the 40 samples mentioned on the home page ("The in-build standard contains gene expression data from 40 ALL samples without large-scale CNVs")? Thanks.
The edit should work - it is worth a try. If it crashes, let me know. That said, there might be too few genes left for the model to predict CNVs robustly, and the figure might be more challenging to interpret. In addition, the 21p arm was not included in the model training data. All in all, I would expect worse accuracy.
No, the model was trained on a larger dataset, which included hundreds of instances of large-scale CNVs. If you are interested in the method in more depth, we have recently published a paper on RNAseqCNV - https://www.nature.com/articles/s41375-022-01547-8
If you have any other questions, do not hesitate to write again.
Hi, I have read the code files shiny_utils.R and RNAseqCNV_wrappper.R, but I do not understand why 21p is filtered out in the get_arm_metr function, and why filter(arm != "p" | !chr %in% c(13, 14, 15, 21)) is applied before metr_dipl(). Thanks.
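For readers following the thread, here is a minimal sketch of what the quoted filter condition does. It is not taken from the RNAseqCNV source: the toy table and its values are made up, and base R subsetting is used instead of dplyr so the snippet runs without extra packages. The condition keeps a row when the arm is not "p" or when the chromosome is not one of 13, 14, 15 and 21, so only the short arms of those four chromosomes are dropped. Those four are acrocentric chromosomes, whose short arms carry few genes, which matches the reasoning given earlier in the thread for 21p.

```r
# Toy arm-level table. The column names `chr` and `arm` mirror the quoted
# filter expression; the rows are invented for illustration only.
arms <- data.frame(
  chr = c("1", "1", "13", "13", "21", "21"),
  arm = c("p", "q", "p", "q", "p", "q"),
  stringsAsFactors = FALSE
)

# Same logical condition as in the quoted filter() call:
# keep the row if it is not a p arm, OR if it is not on chr 13/14/15/21.
keep <- arms$arm != "p" | !(arms$chr %in% c(13, 14, 15, 21))

# Base R equivalent of dplyr::filter(arms, <condition>)
kept <- arms[keep, ]
print(kept)
```

With this toy input, 13p and 21p are removed, while 1p and all q arms remain. Removing or relaxing the condition would keep those short arms, with the caveats about accuracy mentioned above.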