hputnam / Becker_E5


Run one RNASeq sample in stringtie against a "fixed" gff3 and compare it to Polina's "modified" gff3 to see if counts are being assigned to the same gene #7

Closed hputnam closed 3 months ago

hputnam commented 4 months ago

https://github.com/hputnam/Becker_E5/blob/master/RAnalysis/Scripts/RNA-seq/fix_gff_format_AH.Rmd

This script will create a gff3 file with "fixed" in the name.

daniellembecker commented 4 months ago

@hputnam different counts are being assigned to each gene and transcript ID, see step number 8 in my workflow where I ran stringtie on sample E9 against the "fixed" gff3 and the "modified" gff3.

I then looked at the comparison between the E9_R1_fixed_transcript_count_matrix.csv (pwd /data/putnamlab/dbecks/Becker_E5/Becker_RNASeq/data/counts/gff3.count.compare)

Screen Shot 2024-06-12 at 3 38 02 PM

and the E9_R1_modified_transcript_count_matrix.csv (pwd /data/putnamlab/dbecks/Becker_E5/Becker_RNASeq/data/counts/gff3.count.compare)

Screen Shot 2024-06-12 at 3 28 03 PM

and they are very different in the number of counts per transcript. The modified gff3 gives many more gene counts than the fixed gff3, so I am looking into this further.
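The comparison above could also be scripted instead of read off screenshots. This is a minimal sketch, assuming each file is a two-column CSV (ID, then the count for the single sample) with a header row; the function names `load_counts` and `compare_counts` are hypothetical and not part of the workflow.

```python
import csv

def load_counts(path):
    """Read a two-column count matrix CSV (id, count) into a dict.

    Assumes a header row and integer counts (an assumption about the files).
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row if present
        return {row[0]: int(row[1]) for row in reader if row}

def compare_counts(fixed, modified):
    """Summarise how two {id: count} dicts differ."""
    shared = fixed.keys() & modified.keys()
    return {
        "only_fixed": sorted(fixed.keys() - modified.keys()),
        "only_modified": sorted(modified.keys() - fixed.keys()),
        "differing": sorted(t for t in shared if fixed[t] != modified[t]),
        "total_fixed": sum(fixed.values()),
        "total_modified": sum(modified.values()),
    }
```

Passing the two dicts returned by `load_counts` for the fixed and modified matrices to `compare_counts` would show whether the difference comes from IDs that exist in only one file or from the same IDs receiving different counts.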

daniellembecker commented 4 months ago

I also ran this with the updated workflow on the original published .gff3 and thankfully that does match my modified.gff3 counts, so maybe something in the fixed.gff3 code is throwing it off?

Screen Shot 2024-06-12 at 4 13 15 PM
hputnam commented 4 months ago

@daniellembecker they are different approaches to solving the "problem" of non-matching names in the gff3, which causes problems for stringtie
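One way to see what a given gff3 still contains is to list `Parent` values that never appear as an `ID`. This is only a sketch: the thread does not say exactly which names fail to match, so treating it as an ID/Parent mismatch is an assumption, and `parse_attributes` and `unmatched_parents` are hypothetical helper names.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column ('ID=x;Parent=y') into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs

def unmatched_parents(gff_lines):
    """Return Parent values that never appear as an ID in the same file."""
    ids, parents = set(), set()
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 9:
            continue  # not a complete feature line
        attrs = parse_attributes(columns[8])
        if "ID" in attrs:
            ids.add(attrs["ID"])
        # A feature may list several parents, separated by commas
        parents.update(p for p in attrs.get("Parent", "").split(",") if p)
    return sorted(parents - ids)
```

Running `unmatched_parents(open(path))` on the original, fixed, and modified files would show which of them still has child features pointing at names that do not exist.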

hputnam commented 4 months ago

We need to confirm which one has the most evidence of being correct in the way it is modifying the gff.

hputnam commented 4 months ago

Can you clarify what was changed in the "updated workflow on the original published .gff3" statement above?

hputnam commented 4 months ago

My suggestion is to open the bam files in IGV, look at several genes to see where the counts stack up, and compare that to the counts in the matrix.
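As a rough numeric companion to the IGV check, reads overlapping a gene interval can be counted from SAM-format text (for example, the lines printed by `samtools view`). This is a sketch under assumptions: `count_overlapping` is a hypothetical helper, and it counts any mapped alignment whose reference span overlaps the interval, which is not how stringtie assigns reads to transcripts.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference

def reference_end(pos, cigar):
    """1-based inclusive reference end of an alignment starting at pos."""
    span = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos + span - 1

def count_overlapping(sam_lines, chrom, start, end):
    """Count mapped SAM records on chrom that overlap [start, end] (1-based)."""
    count = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue
        flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 0x4 or rname != chrom or cigar == "*":
            continue  # unmapped, other chromosome, or no alignment
        if pos <= end and reference_end(pos, cigar) >= start:
            count += 1
    return count
```

Because spliced reads span their introns here, the number is an upper-bound sanity check for a gene region, not a replacement for the count matrix.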

daniellembecker commented 4 months ago

> Can you clarify what was changed in the "updated workflow on the original published .gff3" statement above?

I updated the workflow in section 8 to include the original .gff3 file along with the modified and fixed gff3 files.

daniellembecker commented 3 months ago

A visual comparison of the original and modified gff3s against the counts in the matrix seems logical, and I do not see any inconsistencies between the original and modified gff3s at positions in various chromosome sequences.

Should we discuss further, or can I move forward with the updated analyses, etc.?

Screen Shot 2024-06-18 at 9 06 33 AM

hputnam commented 3 months ago

Yes, let's move forward with the modified version and the Perl script.

I also ran this script to check the modified and unmodified contents: https://github.com/hputnam/Poc_RAPID/blob/main/RAnalysis/scripts/compare_multiple_gffs.Rmd