gpertea / stringtie

Transcript assembly and quantification for RNA-Seq
MIT License

Malformed t_data.ctab file #345

Closed adc0032 closed 2 years ago

adc0032 commented 2 years ago

Hi!

I'm getting about 6-9 repeated lines at the end of my t_data.ctab file, which cause a parsing error when importing StringTie data into tximport. The pattern consists of a truncated line followed by repeated lines:

16602   scaffold_99 -   393854  393935  FUN_009122-T1   1   82  FUN_009122  .   0.000000    0.000000
16603   scaffold_99 -   394013  394086  FUN_009123-T1   1   74  FUN_009123  .   0.000000    0.000000
16604   scaffold_99 -   394351  394434  FUN_009124-T1   1   84  FUN_009124  .   0.000000    0.000000
16605   scaffold_99 +   394555  394636  FUN_009125-T1   1   82  FUN_009125  .   0.000000    0.000000
16606   scaffold_99 +   395049  395121  FUN_009126-T1   1   73  FUN_009126  .   0.000000    0.000000
16607   scaffold_99 +   395628  395700  FUN_009127-T1   1   73  FUN_009127  .   0.000000    0.000000
16608   scaffold_99 -   397163  397236  FUN_009128-T1   1   74  FUN_009128  .   0.000000    0.000000
16609   scaffold_99 -   397306  397387  FUN_009129-T1   1   82  FUN_009129  .   0.000000    0.000000
16610   scaffold_991    +   1292    2175    FUN_016435-T1   4   642 FUN_016435  .   4.060748    2.124901
16611   scaffold_996    -   569 1908    FUN_016436-T1   6   432 FUN_016436  .   0.000000    0.000000
16612   scaffold_997    -   2772    4162    FUN_016437-T1   4   975 FUN_016437  .   1.754872    0.918286
_009126-T1  1   73  FUN_009126  .   0.000000    0.000000
16607   scaffold_99 +   395628  395700  FUN_009127-T1   1   73  FUN_009127  .   0.000000    0.000000
16608   scaffold_99 -   397163  397236  FUN_009128-T1   1   74  FUN_009128  .   0.000000    0.000000
16609   scaffold_99 -   397306  397387  FUN_009129-T1   1   82  FUN_009129  .   0.000000    0.000000
16610   scaffold_991    +   1292    2175    FUN_016435-T1   4   642 FUN_016435  .   4.421340    2.282904
16611   scaffold_996    -   569 1908    FUN_016436-T1   6   432 FUN_016436  .   1.458333    0.752992
16612   scaffold_997    -   2772    4162    FUN_016437-T1   4   975 FUN_016437  .   1.937949    1.000636

generated with the following command:

# Load Modules
module load stringtie/2.1.6

# StringTie quantification: -e restricts output to the reference transcripts given with -G (no novel transcripts), -B writes Ballgown table files (*.ctab), --rf assumes a stranded library (fr-firststrand)
stringtie -e --rf -B -G ${WD}/${RFP}.gtf -o ${WD}/${ID}/bg_${ID}/stringtie_${ID}.gtf ${WD}/${ID}/${SM}

Of course, filtering out the incomplete and duplicated lines is my next step (and processing with prep_DE.py produces the correct count file), but I was wondering why this is happening.
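
For reference, here is a minimal sketch of the kind of check that could flag the problem before import. It assumes the real t_data.ctab is tab-separated with a header line and a numeric t_id in the first column; `scan_ctab` is a hypothetical helper name, not part of StringTie or tximport. It only reports truncated lines and repeated t_id values. It does not choose which copy to keep, because the repeated rows above carry different cov/FPKM values (compare the two lines for t_id 16610).

```python
from collections import Counter


def scan_ctab(lines):
    """Report problems in ctab-style lines (hypothetical helper).

    Assumes tab-separated fields, a header on the first line, and a
    numeric t_id in the first column of every data row.
    Returns (bad_line_numbers, duplicated_ids).
    """
    n_cols = len(lines[0].rstrip("\n").split("\t"))
    bad = []          # 1-based line numbers of truncated/malformed rows
    ids = Counter()   # how many times each t_id appears in well-formed rows
    for lineno, line in enumerate(lines[1:], start=2):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != n_cols or not fields[0].isdigit():
            bad.append(lineno)
            continue
        ids[fields[0]] += 1
    dups = sorted((t for t, count in ids.items() if count > 1), key=int)
    return bad, dups
```

Reading the file with `open("t_data.ctab").readlines()` and passing the result to `scan_ctab` gives the line numbers to inspect and the t_id values that occur more than once.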