zhqingit / giremi

GIREMI is a method that can identify RNA editing sites using one RNA-seq data set without requiring genome sequence data.
42 stars 15 forks source link

Interpretation of GIREMI results #28

Open ma-diroma opened 7 years ago

ma-diroma commented 7 years ago

Hi,

I used GIREMI to analyze an RNA-seq sample, which was previously processed by GATK to obtain the list of heterozygous SNVs required by your tool. GIREMI generated 2 different output files: file.res and file.res.res, which differ in some fields

file.res content chr coor strand ifsnp gene refB upB downB majorB majorN totN majorR ifmi mi mip ar ifneg iflm lmp type A C G T

file.res.res content: chr coordinate strand ifSNP gene reference_base upstream_1base downstream_1base major_base major_count tot_count major_ratio MI pvalue_MI estimated_allelic_ratio ifNEG RNAE_t A C G T ifRNAE

Could you please tell me if file.res is just a temporary file? Moreover, I see that for some editing sites only 1 base (generally the reference base) was detected, e.g.

chr coordinate strand ifSNP gene reference_base upstream_1base downstream_1base major_base major_count tot_count major_ratio MI pvalue_MI estimated_allelic_ratio ifNEG RNAE_t A C G T ifRNAE chr1|899917|+ chr1 899917 + 0 KLHL17 A C G A 8 8 1 0 0.002023009 0.9 0 AC 8 0 0 0 1

Could you please tell me why this is identified as an edited site? Thank you, Maria Angela

zhqingit commented 7 years ago

Hi Angela,

The file.res.res is the final result file.

For these sites, what's the alternative frequencies from the GATK? Could you also try samtools to see if these sites are the heterozygous SNVs? Thank you!

Best, Qing

2016-11-22 9:27 GMT-07:00 Maria Angela Diroma notifications@github.com:

Hi,

I used GIREMI to analyze an RNA-seq sample, which was previously processed by GATK to obtain the list of heterozygous SNVs required by your tool. GIREMI generated 2 different output files: file.res and file.res.res, which differ in some fields

file.res content chr coor strand ifsnp gene refB upB downB majorB majorN totN majorR ifmi mi mip ar ifneg iflm lmp type A C G T

file.res.res content: chr coordinate strand ifSNP gene reference_base upstream_1base downstream_1base major_base major_count tot_count major_ratio MI pvalue_MI estimated_allelic_ratio ifNEG RNAE_t A C G T ifRNAE

Could you please tell me if file.res is just a temporary file? Moreover, I see that for some editing sites only 1 base (generally the reference base) was detected, e.g.

chr coordinate strand ifSNP gene reference_base upstream_1base downstream_1base major_base major_count tot_count major_ratio MI pvalue_MI estimated_allelic_ratio ifNEG RNAE_t A C G T ifRNAE chr1|899917|+ chr1 899917 + 0 KLHL17 A C G A 8 8 1 0 0.002023009 0.9 0 AC 8 0 0 0 1

Could you please tell me why this is identified as an edited site? Thank you, Maria Angela

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/zhqingit/giremi/issues/28, or mute the thread https://github.com/notifications/unsubscribe-auth/ALL2zQrjK18HoYBkr-Y5O6WymVXw-iOTks5rAxfjgaJpZM4K5pE5 .

ma-diroma commented 7 years ago

I had already selected heterozygous SNVs from GATK HaplotypeCaller VCF file, where there is a different count of the alternative base for that position. Allele depth AD=8 for A, which is the reference base (the same count reported in GIREMI output), AD=6 for the alternative base C (0 in GIREMI output)

chr1 899917 . A C 241.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.323;ClippingRankSum=-0.968;DP=14;ExcessHet=3.0103;FS=2.363;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.581;QD=17.27;ReadPosRankSum=-2.905;SOR=0.169 GT:AD:DP:GQ:PL 0/1:8,6:14:99:270,0,2178

Since I used default parameters (-m=5), I expected to see the variant, which is even detected as editing site, but it is pretty weird base count does not correspond to the variant caller results.

Best, Maria Angela

zhqingit commented 7 years ago

Hi Angela,

Is the bam file you inputed to GATK and GIREMI same? Could you try samtools to call the SNV at that site, because giremi adopted samtools's SNV calling method.

Best, Qing

2016-11-23 2:47 GMT-07:00 Maria Angela Diroma notifications@github.com:

I had already selected heterozygous SNVs from GATK HaplotypeCaller VCF file, where there is a different count of the alternative base for that position. Allele depth AD=8 for A, which is the reference base (the same count reported in GIREMI output), AD=6 for the alternative base C (0 in GIREMI output)

chr1 899917 . A C 241.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.323; ClippingRankSum=-0.968;DP=14;ExcessHet=3.0103;FS=2.363; MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.581;QD=17.27; ReadPosRankSum=-2.905;SOR=0.169 GT:AD:DP:GQ:PL 0/1:8,6:14:99:270,0,2178

Since I used default parameters (-m=5), I expected to see the variant, which is even detected as editing site, but it is pretty weird base count does not correspond to the variant caller results.

Best, Maria Angela

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/zhqingit/giremi/issues/28#issuecomment-262470617, or mute the thread https://github.com/notifications/unsubscribe-auth/ALL2zXFstFGpFYXgqgzAhBb74Sb6xPeCks5rBAvNgaJpZM4K5pE5 .

ma-diroma commented 7 years ago

Hi Qing,

Yes, I used the same bam file. Thus the different base count may be due to different variant callers. Thanks. I had another issue with a different alignment file, this time I used GSNAP, followed by GATK variant calling. I removed all homozygous sites from the SNVs list but I obtain only the intermediate out.res file, so I am missing the ifRNAe information. The only error was found in out.res.Rout:

Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 0, 1 Calls: cbind -> cbind -> data.frame Esecuzione interrotta

I checked if there were pvalue_MI>0 & pvalue_MI<=0.05 & ifSNP==0 in the intermediate file, it is ok. Could you please suggest what is this new issue? Please find attached my list of SNVs, out.res and out.res.Rout files.

Thanks for your help, Maria Angela

giremi_gsnap.err.txt giremi_gsnap.out.res.Rout.txt giremi_gsnap.out.res.txt giremi_gsnap_snvlist.txt