Closed: hputnam closed this issue 3 months ago.
@hputnam Different counts are being assigned to each gene and transcript ID. See step 8 in my workflow, where I ran StringTie on sample E9 against both the "fixed" gff3 and the "modified" gff3.
I then compared E9_R1_fixed_transcript_count_matrix.csv and E9_R1_modified_transcript_count_matrix.csv (both in /data/putnamlab/dbecks/Becker_E5/Becker_RNASeq/data/counts/gff3.count.compare). They are very different in the number of counts per transcript: the modified gff3 gives many more counts than the fixed gff3, so I am looking into this further.
I also ran the updated workflow on the original published .gff3, and thankfully those counts match my modified.gff3 counts. So maybe something in the fixed.gff3 code is throwing it off?
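One quick way to quantify how different the two matrices are is to join them on transcript ID and list the IDs that are missing from one file or have different counts. This is a minimal Python sketch, not part of the workflow above; it assumes each CSV has the ID in its first column and one column per sample, and the sample column name `E9` and the toy values are hypothetical placeholders.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def read_counts(handle: TextIO, sample: str) -> Dict[str, float]:
    """Map each row's ID (first column) to its count in the `sample` column."""
    reader = csv.DictReader(handle)
    id_col = reader.fieldnames[0]  # assumed to be the transcript ID column
    return {row[id_col]: float(row[sample]) for row in reader}


def compare_counts(
    a: Dict[str, float], b: Dict[str, float]
) -> Tuple[List[str], List[str], Dict[str, Tuple[float, float]]]:
    """Return IDs only in a, IDs only in b, and shared IDs whose counts differ."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    differing = {k: (a[k], b[k]) for k in sorted(set(a) & set(b)) if a[k] != b[k]}
    return only_a, only_b, differing


# Toy data standing in for the fixed and modified matrices (values are made up).
fixed = read_counts(io.StringIO("transcript_id,E9\nt1,10\nt2,0\nt3,5\n"), "E9")
modified = read_counts(io.StringIO("transcript_id,E9\nt1,10\nt2,42\nt4,7\n"), "E9")

only_fixed, only_modified, differing = compare_counts(fixed, modified)
print("total fixed:", sum(fixed.values()), "total modified:", sum(modified.values()))
print("only in fixed:", only_fixed)
print("only in modified:", only_modified)
print("differing counts:", differing)
```

To run it on the real files, replace the `io.StringIO` objects with `open(...)` handles on the two matrices and use whatever sample column name they actually contain.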
@daniellembecker They are two different approaches to solving the "problem" of non-matching names in the gff3, which causes problems for StringTie.
We need to confirm which one has the most evidence of modifying the gff correctly.
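If the mismatch is between `ID` and `Parent` values (an assumption; the thread does not spell out which names disagree), a short script can list every `Parent` that points to an ID that does not exist in the file. The Python sketch below uses hypothetical function names and toy GFF3 lines, and it does not percent-decode attribute values as the GFF3 specification allows.

```python
from typing import Dict, Iterable, List, Tuple


def parse_attributes(field: str) -> Dict[str, str]:
    """Split a GFF3 column-9 string such as 'ID=g1;Parent=x' into a dict."""
    attrs: Dict[str, str] = {}
    for part in field.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def dangling_parents(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (feature_type, parent) pairs whose Parent matches no ID in the file."""
    ids = set()
    links: List[Tuple[str, str]] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines instead of failing in this sketch
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            ids.add(attrs["ID"])
        # A feature may list several parents separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                links.append((cols[2], parent))
    # IDs are collected over the whole file before filtering, so line order does not matter.
    return [(ftype, parent) for ftype, parent in links if parent not in ids]


# Toy GFF3: the second exon points at a transcript name that was never defined.
toy_gff = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t1\t1000\t.\t+\t.\tID=g1",
    "chr1\tsrc\tmRNA\t1\t1000\t.\t+\t.\tID=g1.t1;Parent=g1",
    "chr1\tsrc\texon\t1\t500\t.\t+\t.\tParent=g1.t1",
    "chr1\tsrc\texon\t600\t1000\t.\t+\t.\tParent=model_g1.t1",
]
print(dangling_parents(toy_gff))  # [('exon', 'model_g1.t1')]
```

Because file objects iterate line by line, the same function can be called on an open gff3 handle to compare how many dangling parents each version (original, fixed, modified) leaves behind.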
Can you clarify what was changed in the "updated workflow on the original published .gff3" statement above?
My suggestion is to use IGV to look at the BAM files for several genes, see where the reads stack up, and compare that to the counts in the matrix.
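The IGV check can be complemented by counting, for one gene, how many alignments overlap its coordinates. The minimal sketch below assumes alignments have already been reduced to hypothetical `(chrom, start, end)` tuples using 1-based inclusive coordinates, as in GFF3; with real BAM files a library such as pysam could supply them. StringTie-derived count matrices are typically estimated from coverage rather than raw read overlaps, so these numbers should only be expected to agree roughly with the matrix.

```python
from typing import Iterable, Tuple

Alignment = Tuple[str, int, int]  # (chrom, start, end), 1-based inclusive


def overlapping_reads(reads: Iterable[Alignment], chrom: str, start: int, end: int) -> int:
    """Count alignments on `chrom` that overlap the closed interval [start, end]."""
    return sum(
        1
        for r_chrom, r_start, r_end in reads
        if r_chrom == chrom and r_start <= end and r_end >= start
    )


# Toy alignments (hypothetical) and a hypothetical gene spanning chr1:1-1000.
toy_reads = [
    ("chr1", 10, 110),
    ("chr1", 950, 1050),   # partially overlaps the end of the gene
    ("chr1", 2000, 2100),  # lies past the gene
    ("chr2", 10, 110),     # different sequence
]
print(overlapping_reads(toy_reads, "chr1", 1, 1000))  # 2
```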
I updated the workflow in section 8 to also run the original .gff3 file alongside the modified and fixed versions.
A visual comparison of the original and modified gff3 results against the counts in the matrix looks logical, and I do not see any inconsistencies between the original and modified gff3s in feature positions across various chromosome sequences.
Should we discuss further, or can I move forward with the updated analyses?
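The position check described above can also be done programmatically by comparing each feature's seqid, type, start, end, and strand while ignoring column 9, so renamed IDs do not register as differences. This is a minimal Python sketch with hypothetical function names and toy data; it is an illustration, not the approach used in the thread's own scripts.

```python
from collections import Counter
from typing import Iterable, Tuple

FeatureKey = Tuple[str, str, int, int, str]  # seqid, type, start, end, strand


def feature_keys(lines: Iterable[str]) -> "Counter[FeatureKey]":
    """Count features by position, ignoring source, score, phase, and attributes."""
    keys: "Counter[FeatureKey]" = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines in this sketch
        keys[(cols[0], cols[2], int(cols[3]), int(cols[4]), cols[6])] += 1
    return keys


def coordinate_differences(original: Iterable[str], modified: Iterable[str]):
    """Return (features only in original, features only in modified) as Counters."""
    a, b = feature_keys(original), feature_keys(modified)
    return a - b, b - a  # Counter subtraction keeps only positive counts


# Toy files: only the IDs differ, so no positional differences are expected.
original = [
    "chr1\tsrc\tgene\t1\t1000\t.\t+\t.\tID=g1",
    "chr1\tsrc\texon\t1\t500\t.\t+\t.\tParent=g1.t1",
]
modified = [
    "chr1\tsrc\tgene\t1\t1000\t.\t+\t.\tID=gene_g1",
    "chr1\tsrc\texon\t1\t500\t.\t+\t.\tParent=gene_g1.t1",
]
lost, gained = coordinate_differences(original, modified)
print("only in original:", dict(lost))
print("only in modified:", dict(gained))
```

Using a `Counter` rather than a set means a feature duplicated in one file but not the other still shows up as a difference.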
Yes, let's move forward with the modified version and the Perl script.
I also ran this script to check the contents of the modified and unmodified files: https://github.com/hputnam/Poc_RAPID/blob/main/RAnalysis/scripts/compare_multiple_gffs.Rmd
https://github.com/hputnam/Becker_E5/blob/master/RAnalysis/Scripts/RNA-seq/fix_gff_format_AH.Rmd
This script creates a gff3 file with "fixed" in the name.