gpertea / stringtie

Transcript assembly and quantification for RNA-Seq
MIT License

<class 'str'> with high reads after running prepDE.py3 #403

Open carlmed00 opened 1 year ago

carlmed00 commented 1 year ago

Upon checking the gene.csv file generated by prepDE.py3, I found that the bottom row is labeled "<class 'str'>" and has high read counts (screenshot attached). Does this have any implications? Also, trans.csv seems incomplete: some transcripts have no count at all (not even 0), just a blank.

Is there a problem with that, and will it affect my downstream analysis? I have attached the two files for reference: trans.csv, gene.csv
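
Both symptoms described above can be checked for before running downstream analysis. The following is only a minimal sketch, assuming a count-matrix CSV with an ID in the first column and one column per sample; `find_suspect_rows` is a hypothetical helper and not part of StringTie.

```python
import csv
import io


def find_suspect_rows(csv_text):
    """Scan a count-matrix CSV (given as text) for the two reported symptoms.

    Returns (bad_ids, blank_cells):
      bad_ids     - row IDs that look like a Python type repr, e.g. "<class 'str'>"
      blank_cells - (row_id, column_name) pairs whose count is empty
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    bad_ids, blank_cells = [], []
    for row in reader:
        if not row:
            continue  # skip completely empty lines
        row_id = row[0]
        if row_id.startswith("<class "):
            bad_ids.append(row_id)
        for name, value in zip(header[1:], row[1:]):
            if value.strip() == "":
                blank_cells.append((row_id, name))
    return bad_ids, blank_cells


# Made-up data for illustration only (not taken from the attached files).
example = "gene_id,s1,s2\nG1,10,\nG2,3,4\n<class 'str'>,99,120\n"
bad, blanks = find_suspect_rows(example)
print(bad)
print(blanks)
```

To run it on a real file, pass the file contents, for example `find_suspect_rows(open("gene.csv").read())`.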

mickeykawai commented 1 year ago

I ran into the same issues. They caused problems in downstream analysis when I used edgeR. I don't know whether they also cause problems in StringTie's own pipeline, such as Ballgown.

FYI, I fixed it myself as follows (see also my previous issue at https://github.com/gpertea/stringtie/issues/400).

$ diff /bio/package/stringtie/2.2.1/prepDE.py3 /bio/package/stringtie/2.2.1/prepDE.py3.orig 
172,173c172,173
< #    else: #we found the "bad" genes!
< #        break
---
>     else: #we found the "bad" genes!
>         break
290,292d289
<         for x,y in samples:
<             if t_dict[i][x] == "":
<                 t_dict[i][x] = 0
303,305d299
<         for x,y in samples:
<             if geneDict[i][x] == "":
<                 geneDict[i][x] = 0

prepDE.py3.txt prepDE.py3.orig.txt
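
Note that the first file in the `diff` command is the patched script, so lines marked `<` belong to the patched version and lines marked `>` to the original. The zero-filling part of the patch can be sketched in isolation as below. This is only an illustration, assuming a dict of per-sample counts keyed by ID and a list of (sample name, path) pairs as suggested by the loop `for x,y in samples`; `fill_blank_counts` is a hypothetical helper, not a function in prepDE.py3.

```python
def fill_blank_counts(counts, samples):
    """Replace empty-string counts with 0 for every sample, in place.

    counts  - dict mapping an ID to a dict of {sample_name: count}
    samples - list of (sample_name, path) pairs
    Assumption: a missing sample key is also treated as blank here;
    the patch in the diff only tests for empty strings.
    """
    for row in counts.values():
        for name, _path in samples:
            if row.get(name, "") == "":
                row[name] = 0
    return counts


# Made-up data for illustration only.
samples = [("s1", "/path/a"), ("s2", "/path/b")]
counts = {"T1": {"s1": 5, "s2": ""}, "T2": {"s1": "", "s2": 7}}
filled = fill_blank_counts(counts, samples)
print(filled)
```

With blanks replaced by 0, every row has a numeric value for every sample, which is what count-based tools generally expect as input.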

xiangy-hu commented 1 month ago

Yes, you are right, awesome!