gpertea / stringtie

Transcript assembly and quantification for RNA-Seq
MIT License

prepDE.py error: could not locate transcript #337

Closed by huangwb8 2 years ago

huangwb8 commented 3 years ago

Hi, I use StringTie v2.1.6 to calculate TPM for genes and transcripts, and it works well. However, when I run prepDE.py or prepDE.py3 to prepare the read count matrices for DESeq2 or edgeR, an error appears.

My stringtie command looks like this:

stringtie ${id} -e -v \
    -A ${path_res}/${name}/gene_abund.tab \
    -C ${path_res}/${name}/cov_refs.gtf \
    -B -p 10 \
    -G ${path_gtf} \
    -o ${path_res}/${name}/merged.gtf --fr \
    > ${path_log}/${name}.log 2>&1

I set up preDE.config.txt as follows:

D1      ~/Project/RNA-Seq_A2B1-KO/output/stringtie/hisat2/D1/merged.gtf
D2      ~/Project/RNA-Seq_A2B1-KO/output/stringtie/hisat2/D2/merged.gtf
D3      ~/Project/RNA-Seq_A2B1-KO/output/stringtie/hisat2/D3/merged.gtf
D4      ~/Project/RNA-Seq_A2B1-KO/output/stringtie/hisat2/D4/merged.gtf
D5      ~/Project/RNA-Seq_A2B1-KO/output/stringtie/hisat2/D5/merged.gtf
D6      ~/Project/RNA-Seq_A2B1-KO/output/stringtie/hisat2/D6/merged.gtf
D7      ~/Project/RNA-Seq_A2B1-KO/output/stringtie/hisat2/D7/merged.gtf
D8      ~/Project/RNA-Seq_A2B1-KO/output/stringtie/hisat2/D8/merged.gtf
D9      ~/Project/RNA-Seq_A2B1-KO/output/stringtie/hisat2/D9/merged.gtf

I then run prepDE.py or prepDE.py3 like this:

python ${prepDE} \
    -i ${path_res}/preDE.config.txt \
    -g ${path_res}/gene_count_matrix.csv \
    -t ${path_res}/transcript_count_matrix.csv \
    -l 144 \
    > ${path_log}/stringtie_prepDE.log 2>&1

The following error appears:

Error: could not locate transcript ENST00000647043.1 entry for sample D2
Traceback (most recent call last):
  File "~/stringtie/stringtie-2.1.6/prepDE.py", line 283, in <module>
    geneDict[geneIDs[i]][s[0]]+=v[s[0]]
KeyError: 'D2'

I found similar problems reported in other issues, but the bug seems to have been fixed in newer versions of stringtie, so I don't know how to deal with it.

Could you give some suggestions? Thanks!
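The error message suggests that a transcript ID present in one sample's GTF has no matching entry in another sample's GTF. As a rough diagnostic (a sketch only, not a reimplementation of prepDE.py), the following compares the transcript IDs across the per-sample GTF files. The function names `transcript_ids` and `missing_by_sample` are hypothetical, and the sketch assumes standard tab-separated GTF with a `transcript_id "..."` attribute in the ninth column.

```python
import re
from pathlib import Path

# Matches the transcript_id attribute in column 9 of a GTF record.
TRANSCRIPT_ID_RE = re.compile(r'transcript_id "([^"]+)"')


def transcript_ids(gtf_path):
    """Return the set of transcript IDs found on 'transcript' lines of one GTF."""
    ids = set()
    with open(Path(gtf_path).expanduser()) as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip header comments
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 9 or fields[2] != "transcript":
                continue  # only look at transcript-level records
            match = TRANSCRIPT_ID_RE.search(fields[8])
            if match:
                ids.add(match.group(1))
    return ids


def missing_by_sample(sample_to_gtf):
    """Map each sample to the IDs seen in other samples but absent from it."""
    per_sample = {s: transcript_ids(p) for s, p in sample_to_gtf.items()}
    all_ids = set().union(*per_sample.values()) if per_sample else set()
    return {
        sample: sorted(all_ids - ids)
        for sample, ids in per_sample.items()
        if all_ids - ids
    }
```

Called with a dictionary such as `{"D1": ".../D1/merged.gtf", "D2": ".../D2/merged.gtf"}`, it returns only the samples that lack some transcript ID, which may help narrow down which file triggers the error.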

Okita0527 commented 2 years ago

I also encountered the same problem.

huangwb8 commented 2 years ago

@htw1124 Maybe prepDE.py3 is no longer maintained. Tools like featureCounts work well for calculating read counts. You could try one of them.

Tako-liu commented 2 years ago

Hi, I ran into the same problem and found this issue, but I didn't find a good solution here. After some experimenting, though, I got the script to run and produce results. I don't know whether this will work for you, but you can use it as a reference. First of all, this is my error report:

Error: could not locate transcript mRNA:Solyc04g024965.1.1 entry for sample 2A
Traceback (most recent call last):
  File "./prepDE.py", line 283, in <module>
    geneDict[geneIDs[i]][s[0]]+=v[s[0]]
KeyError: '2A'

I use StringTie version 2.2.0. I found the Python 3 script prepDE.py3 in the stringtie folder, so I ran prepDE.py3 with Python 3, and it completed successfully. I hope this helps.

gpertea commented 2 years ago

The issue was identified and fixed in the v2.2.1 release: https://github.com/gpertea/stringtie/releases/tag/v2.2.1

carlmed00 commented 1 year ago

Hello! I am using v2.2.1 but still encounter a similar error.

CH_RNA-ZF_BPATCS/CH/EC_output$ prepDE.py -i sequence_mapping.txt -g genematrix.csv -t transcriptmatrix.csv
Error: could not locate transcript gene-b0731 entry for sample Chl1-2
Traceback (most recent call last):
  File "/home/ryan/stringtie/prepDE.py", line 284, in <module>
    geneDict.setdefault(geneIDs[i],{}) #gene_id
KeyError: 'gene-b0731'

I rechecked and made sure that I had used -e for all my generated files, but I still get the same error.
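One way to double-check the -e setting per file is to read the command line that StringTie records at the top of its output GTF. The sketch below assumes the file starts with `#` comment lines, that the first one which is not the `StringTie version` line holds the command, and that -e was passed as a separate token (combined short flags are not handled). The function names `recorded_command` and `used_e_flag` are hypothetical.

```python
from pathlib import Path


def recorded_command(gtf_path):
    """Return the first header comment that is not the version line, or None."""
    with open(Path(gtf_path).expanduser()) as handle:
        for line in handle:
            if not line.startswith("#"):
                break  # header comments end at the first record
            text = line[1:].strip()
            if text and not text.startswith("StringTie version"):
                return text
    return None


def used_e_flag(gtf_path):
    """True/False if a recorded command was found, None if there is no header."""
    command = recorded_command(gtf_path)
    if command is None:
        return None
    # Whitespace tokenization: assumes -e appears as its own token.
    return "-e" in command.split()
```

Running `used_e_flag` on every path listed in the sample file and printing the ones that return `False` or `None` would show which GTFs are worth regenerating.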