Open JunhuiLi1017 opened 1 year ago
Hi @JunhuiLi1017 ,
Thanks for the report. That’s odd: ReadLevel_Features_extraction.py should not contain any stochastic procedures, and it does not give genotype predictions.
Could you compare the two output files of ReadLevel_Features_extraction.py and tell me which features differ? For example, for one specific variant, do the feature values differ between the two output files? Also, what do the lines that are present in only one file look like?
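A comparison like this can be sketched in a few lines of Python. This is only a rough sketch and assumes the two output files are tab-delimited with a header row and an `id` column; the delimiter and the function names `load_features` and `compare_feature_tables` are assumptions for illustration, not part of ReadLevel_Features_extraction.py.

```python
import csv
import sys


def load_features(path, key="id", delimiter="\t"):
    """Read a delimited feature table into {id: row_dict}.

    Assumes a header row containing the `key` column (an assumption
    about the output format, adjust if your files differ).
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        return {row[key]: row for row in reader}


def compare_feature_tables(first, second):
    """Return ids unique to each table and per-id column differences."""
    only_first = sorted(set(first) - set(second))
    only_second = sorted(set(second) - set(first))
    changed = {}
    for variant_id in sorted(set(first) & set(second)):
        row_a, row_b = first[variant_id], second[variant_id]
        # Compare every column that appears in either row.
        diffs = {
            column: (row_a.get(column), row_b.get(column))
            for column in sorted(set(row_a) | set(row_b))
            if row_a.get(column) != row_b.get(column)
        }
        if diffs:
            changed[variant_id] = diffs
    return only_first, only_second, changed


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_features.py output1 output2
    table_a = load_features(sys.argv[1])
    table_b = load_features(sys.argv[2])
    only_a, only_b, changed = compare_feature_tables(table_a, table_b)
    print(f"only in first file: {len(only_a)}")
    print(f"only in second file: {len(only_b)}")
    for variant_id, diffs in changed.items():
        print(variant_id, diffs)
```

The values are compared as strings, so any difference in the written numbers is reported, including differences in rounding.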
Thanks,
Yanmei
Hi @douym,
Thanks for your response.
Here are the details of the differences between the two outputs (output1 and output2) produced from the same input file and script.
For every sample, output1 and output2 differ. For example, I have an SNV list with 660 variants.
In output1, 275 variants have feature information.
In output2, 266 variants have feature information.
The number of variants at positions common to output1 and output2 is 224; of these, 35 differ in GC content and context. Here is an example of one mutation at the same position whose GC content and context differ:
id | GCcontent | context
-- | -- | --
-chr1-96186790-A-G | 0.45454546 | GCA
-chr1-96186790-A-G | 0.33333333 | CTT
Hi there,
There seems to be a bug in the script ReadLevel_Features_extraction.py.
I used the same SNV list and BAM file to extract SNV features, but each run outputs a different number of features, and also a different number of mosaic variants.
Thanks, Junhui
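
One quick way to check whether two runs really differ in content, and not only in row order, is to compare an order-insensitive digest of each output file. The sketch below assumes plain-text output with one record per line; the function names are hypothetical and it is not part of the MosaicForecast scripts.

```python
import hashlib


def order_insensitive_digest(lines):
    """SHA-256 over the sorted, newline-stripped lines.

    Two inputs with the same rows in a different order give the same digest.
    """
    digest = hashlib.sha256()
    for line in sorted(line.rstrip("\r\n") for line in lines):
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def file_digest(path):
    """Order-insensitive digest of a text file, read line by line."""
    with open(path, encoding="utf-8") as handle:
        return order_insensitive_digest(handle)
```

If `file_digest` returns the same value for both outputs, the runs differ only in row order; different digests mean the rows themselves differ.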