AishMandya closed this issue 5 years ago.
Hi, I have the same question. Have you solved it?
No, not yet. It seems to stop working at the second iteration, no matter which file it is, so it may be a glitch in the code or in the way I used StringTie to generate the GTF files. Also, I don't fully understand how the code works, so this is definitely inconclusive.
Hi AishMandya, it actually drove me crazy! But when I use an older version, it works! Maybe you can try that; I hope it helps.
Hi everyone, sorry, I can't help: I have the exact same problem. I even re-ran everything, since at first I thought I didn't have the right genome GTF file. Unfortunately, it still does not work. If you find the answer, I am really interested!
Emmanuelle
Thanks, I'll try that!
(Email reply to lelesama's comment of Thu, 29 Aug 2019, 5:27 AM.)
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/gpertea/stringtie/issues/234?email_source=notifications&email_token=ANACMJHZ5PUNYKILGWECYETQG4GGTA5CNFSM4IPOZ6U2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD5MZ4WY#issuecomment-525966939, or mute the thread https://github.com/notifications/unsubscribe-auth/ANACMJCPKWQB2FMW4UHEFA3QG4GGTANCNFSM4IPOZ6UQ .
Hey, I haven't exactly found the answer for prepDE.py. What I did instead was use tximport to read the ctab files generated by StringTie for each sample, and use that output for DESeq and edgeR.
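tximport itself is an R package, so the following is not tximport; it is a minimal Python sketch of the conversion tximport documents for StringTie's t_data.ctab input: estimated reads = cov * transcript length / read length (read length 75 by default), summed here per gene_id. The column names are assumed from the standard ctab header, and `ctab_gene_counts` is a hypothetical helper name.

```python
import csv
from collections import defaultdict


def ctab_gene_counts(ctab_lines, read_len=75):
    """Estimate per-gene read counts from the lines of a StringTie t_data.ctab file.

    Assumes a tab-separated file whose header includes 'gene_id', 'cov' and
    'length'. Each transcript contributes cov * length / read_len reads;
    the contributions of all transcripts of a gene are summed.
    """
    counts = defaultdict(float)
    for row in csv.DictReader(ctab_lines, delimiter="\t"):
        counts[row["gene_id"]] += float(row["cov"]) * int(row["length"]) / read_len
    return dict(counts)


# Example usage (the path is hypothetical):
# with open("ballgown/sample1/t_data.ctab", newline="") as fh:
#     print(ctab_gene_counts(fh, read_len=75))
```

Pass the real read length of your library; with the default of 75, longer reads would inflate the estimates.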
(Email reply to e-lerat's comment of Thu, 12 Sep 2019, 4:58 PM.)
Did you use --merge during the StringTie step?
Hi, yes, I did use --merge. All --merge does is merge all the GTF files to give the commonly hit genes, with no TPM information; the statistics likewise relate to the common genes found, not to their TPM or log fold change. I ended up writing code that gets the TPM values, averages them across samples, and calculates the log fold change for the common genes output by --merge. I don't think the method is very reliable, because it's not recommended to do any kind of DE on TPM values. But StringTie outputs unannotated genes (or sequences) as well, which may be very useful.
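Aish's code isn't shown in the thread. Below is a minimal Python sketch of the approach as described: average TPM per group over genes quantified in every sample, then a log2 ratio of the group means. The input layout (gene_id mapped to per-sample TPM), the function name and the pseudocount of 1 are all assumptions. As noted above, this is descriptive only and no substitute for count-based DE with DESeq2 or edgeR.

```python
import math


def mean_tpm_log2fc(tpm, group_a, group_b, pseudocount=1.0):
    """Log2 ratio of mean TPM (group_b over group_a) for each gene.

    tpm maps gene_id -> {sample_name: TPM value}. Genes without a value in
    every sample of both groups are skipped, as a stand-in for "genes common
    to all samples". The pseudocount keeps log2 defined when a mean is 0.
    Descriptive only: no variance estimate, no p-values.
    """
    samples = list(group_a) + list(group_b)
    result = {}
    for gene, values in tpm.items():
        if not all(s in values for s in samples):
            continue  # not quantified in every sample: skip it
        mean_a = sum(values[s] for s in group_a) / len(group_a)
        mean_b = sum(values[s] for s in group_b) / len(group_b)
        result[gene] = math.log2((mean_b + pseudocount) / (mean_a + pseudocount))
    return result
```

The pseudocount shrinks fold changes for lowly expressed genes. This is one reason TPM ratios like these are only a rough screen.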
In the end, I did most of my analysis with Salmon, tximport and edgeR/DESeq2.
Thanks, Aish
(Email reply to SofiaZhangtj's question of Fri, 4 Oct 2019, 1:22 PM.)
Hi Aish, thank you very much for your kind answer. I think I finally found my problem: the prepDE.py script is supposedly based on version 1.2, and the software has since been updated many times, but the script has not. I switched my StringTie version from 2 to 1.3.3, and then the script worked.
Hi @SofiaZhangtj and @AishMandya,
I am using StringTie 1.3.3 with prepDE.py to generate files for DESeq2, and I keep getting stuck on this error:
Traceback (most recent call last):
  File "/cluster/home/cscipion/scripts/prepDE", line 257, in <module>
    geneDict[geneIDs[i]][s[0]]+=v[s[0]]
KeyError: 'B1pr_S13_L002'
I have successfully used the sample 'B1pr_S13_L002' in several other comparisons, but this set of samples rejects it for some reason.
Any thoughts? Maybe @gpertea can help.
Hi, I found that even the last 1.x release, 1.3.6, works for me. I think I ran into the same problem as you at the beginning; at that time I wasn't using the "-e" parameter in StringTie.
I'll investigate the possibility that some changes in StringTie v2 may have affected compatibility with prepDE.py, but in the past many "errors" reported by users of prepDE.py were mainly caused by incorrect usage of the script. To reiterate and clarify: prepDE.py can only be used on a set of StringTie GTF outputs if stringtie was run, for all of those outputs:
- with the -e option
- with the same file for the -G option.
Also, make sure that no other GTF files (like the reference annotation file) are present in those sub-directories. Only the StringTie output GTF files should be found there, because prepDE's default mode of operation is to scan all the sub-directories for .gtf files, all of which are expected to have been produced by StringTie following the requirements above (-e option, same -G file).
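The requirements above can be checked before running prepDE.py. This is a minimal Python sketch of such a pre-flight check, not part of StringTie. It assumes that each sub-directory is one sample and that StringTie recorded its command line as the first "#" comment line of each output GTF, which StringTie output GTFs normally start with (verify on yours). `check_prepde_inputs` is a hypothetical name.

```python
import os
import shlex


def check_prepde_inputs(root="."):
    """List problems that would break prepDE.py's default directory scan.

    Checks, per sample sub-directory of `root`: exactly one .gtf file, '-e'
    present in the recorded command line, and a '-G <file>' argument.
    Across samples, checks that the same -G file was used everywhere.
    Returns human-readable problem strings (an empty list if none found).
    """
    problems, ref_files = [], set()
    for name in sorted(os.listdir(root)):
        subdir = os.path.join(root, name)
        if not os.path.isdir(subdir):
            continue
        gtfs = [f for f in os.listdir(subdir) if f.endswith(".gtf")]
        if len(gtfs) != 1:
            problems.append(f"{name}: expected 1 .gtf file, found {len(gtfs)}")
            continue
        with open(os.path.join(subdir, gtfs[0])) as fh:
            first = fh.readline()
        # Assumed header: '# stringtie -e -B -G ref.gtf -o out.gtf in.bam'
        args = shlex.split(first.lstrip("#")) if first.startswith("#") else []
        if "-e" not in args:
            problems.append(f"{name}: no '-e' in the recorded command line")
        if "-G" in args and args.index("-G") + 1 < len(args):
            ref_files.add(args[args.index("-G") + 1])
        else:
            problems.append(f"{name}: no '-G <file>' in the recorded command line")
    if len(ref_files) > 1:
        problems.append(f"different -G files used: {sorted(ref_files)}")
    return problems
```

The -G comparison is purely textual, so the same annotation given once by a relative path and once by an absolute path is reported as two different files.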
@gpertea Hi Pertea, thank you for your reply. With StringTie v2, the output GTF files for identical data have different numbers of lines, whereas with the previous version they were the same. The t_data.ctab files, however, are the same as with the older version. I think that's why prepDE.py doesn't work with version 2.0 in my case.
[Screenshots: GTF line counts for two pairs of identical sequencing data, with the older version and with the new version]
Any suggestions will be helpful. Thank you.
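A direct way to see what such line-count differences mean for prepDE.py is to compare the transcript IDs of two outputs. In our understanding, outputs made with -e and the same -G file should list the same transcripts. A transcript present in one file but missing from another would match the "could not locate transcript ... entry for sample ..." messages reported later in this thread. This is a minimal Python sketch. It assumes standard GTF attributes (transcript_id "...") on "transcript" records, and the function names are hypothetical.

```python
import re

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcript_ids(gtf_lines):
    """Collect transcript_id values from the 'transcript' records of a GTF."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9 and fields[2] == "transcript":
            m = TRANSCRIPT_ID.search(fields[8])
            if m:
                ids.add(m.group(1))
    return ids


def compare_transcripts(gtf_a, gtf_b):
    """Return (only_in_a, only_in_b): sorted transcript IDs unique to each file."""
    with open(gtf_a) as fa, open(gtf_b) as fb:
        a, b = transcript_ids(fa), transcript_ids(fb)
    return sorted(a - b), sorted(b - a)
```

Two empty lists mean both outputs describe the same transcript set. In that case, the line-count difference comes from something else, such as exon records.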
I am using the -e and -B options, and there is only one .gtf file in the directory. What is really odd is that I have 23 samples (7 triplicates + 2 others). Samples 1-3 have been compared against 4-6, 7-9 and 16-18 with no issues; only when I compare 1-3 against 10-12 do I get the previously mentioned error. Since I am sure everything is set up correctly in this case, I am tempted to think that it's a StringTie issue and not a syntax problem. Any further suggestions? Thank you all!
The error is:
Traceback (most recent call last):
  File "/cluster/home/cscipion/scripts/prepDE", line 257, in <module>
I've added some consistency checking to the prepDE.py script when reading the input data; it should catch some common usage errors. Could you please download the latest prepDE.py script (https://raw.githubusercontent.com/gpertea/stringtie/master/prepDE.py), place it in your working directory, make sure it's executable, and then run it again with the same parameters you used before, but this time add the -v option and capture the output in a file, with a command like this:
./prepDE.py (your parameters here) -v 2>&1 | tee prepDE.log
(Use the link above to get this updated script, or download the attached prepDE.py.gz (https://github.com/gpertea/stringtie/files/3722825/prepDE.py.gz), copy it into your working directory, gunzip it and make it executable, then make sure you run it with ./prepDE.py.)
You can then show the prepDE.log here or email it to me.
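With many samples, the relevant lines can be extracted from prepDE.log programmatically. This is a minimal Python sketch. The message wording it matches ("processing sample ... from file ..." and "could not locate transcript ... entry for sample ...") is copied from the -v logs posted later in this thread and may differ between prepDE.py versions. `summarize_prepde_log` is a hypothetical name.

```python
import re

PROCESSING = re.compile(r"processing sample (\S+) from file (\S+)")
MISSING = re.compile(r"could not locate transcript (\S+) entry for sample (\S+)")


def summarize_prepde_log(text):
    """Return (processed, missing) parsed from the text of prepDE.log.

    processed: list of (sample_id, gtf_path) in the order they were read.
    missing:   list of (transcript_id, sample_id) that could not be found.
    """
    return PROCESSING.findall(text), MISSING.findall(text)


# Example usage:
# with open("prepDE.log") as fh:
#     processed, missing = summarize_prepde_log(fh.read())
```

The first "missing" pair names a transcript from an earlier sample that the later sample's GTF lacks. That GTF is the one to compare against the first sample's GTF.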
Thank you!
(Email reply to Geo Pertea's comment of Mon, Oct 14, 2019, 9:18 AM.)
Hi,
I am having the same issue (error at the second file). The log produced by the "new" script is:
processing sample S001_T from file ./S001_T/S001_T_ST.gtf
processing sample S002_T2 from file ./S002_T2/S002_T2_ST.gtf
Error: could not locate transcript S001_T.20797.1 entry for sample S002_T2
Traceback (most recent call last):
  File "prepDE.py", line 283, in <module>
    geneDict[geneIDs[i]][s[0]]+=v[s[0]]
KeyError: 'S002_T2'
I don't really understand because the previous line in the script geneDict[geneIDs[i]].setdefault(s[0],0) should have created a key for s[0]...
Thanks!
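The KeyError does not contradict the setdefault call. In Python, `x[k] += y[k]` can raise KeyError from the right-hand side even when `x[k]` already exists. This is a minimal sketch with hypothetical names. It assumes that `v` in prepDE.py is a per-transcript mapping from sample ID to count. That reading fits the traceback and the preceding "could not locate transcript ... entry for sample ..." message, but it is not confirmed by the script source quoted here.

```python
def add_counts(gene_dict, gene_id, sample, counts_by_sample):
    """Hypothetical stand-in for the two prepDE.py lines quoted above."""
    gene_dict.setdefault(gene_id, {})
    # Like geneDict[geneIDs[i]].setdefault(s[0], 0): creates the key in gene_dict only.
    gene_dict[gene_id].setdefault(sample, 0)
    # Like geneDict[geneIDs[i]][s[0]] += v[s[0]]: the right-hand side reads
    # counts_by_sample, which the setdefault above never touched.
    gene_dict[gene_id][sample] += counts_by_sample[sample]


gene_dict = {}
counts = {"S001_T": 12}  # hypothetical data: no entry for sample S002_T2
add_counts(gene_dict, "G1", "S001_T", counts)
try:
    add_counts(gene_dict, "G1", "S002_T2", counts)
except KeyError as err:
    print("KeyError:", err)  # prints: KeyError: 'S002_T2'
print(gene_dict)  # prints: {'G1': {'S001_T': 12, 'S002_T2': 0}}
```

The key for S002_T2 exists in `gene_dict` (value 0), yet the lookup in the counts mapping still fails. This points back to the missing transcript entry rather than to the setdefault line.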
(Quoting @gpertea's comment above.) prepDE.py can only be used on a set of stringtie GTF outputs if stringtie was run, for all those outputs:
- with the -e option
- with the same file for the -G option.
@gpertea Is it mandatory to use the -e option? I am working on detecting novel splice sites, so I would rather not use the -e option.
This (the OP's problem) seems to be the same issue as #232, so it should be fixed in the v2.0.4 release.
It is also the same as #238; I'll leave only that issue open for a while, pending user confirmation that the problem was fixed in v2.0.4.
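To check whether a set of GTF outputs predates v2.0.4, the version can be read from each GTF header. This is a minimal Python sketch. It assumes a header comment of the form "# StringTie version X.Y.Z", which StringTie GTF outputs normally carry (verify on yours). `stringtie_version` and `older_than` are hypothetical helpers. Later comments in this thread report the same error with v2.2.1, so a recent version alone does not rule the problem out.

```python
import re

VERSION_LINE = re.compile(r"StringTie version (\d+(?:\.\d+)*)")


def stringtie_version(gtf_lines):
    """Return the StringTie version from a GTF header as a tuple of ints, or None."""
    for line in gtf_lines:
        if not line.startswith("#"):
            break  # header comments are over
        m = VERSION_LINE.search(line)
        if m:
            return tuple(int(part) for part in m.group(1).split("."))
    return None


def older_than(gtf_lines, version=(2, 0, 4)):
    """True if the recorded StringTie version is older than `version`."""
    found = stringtie_version(gtf_lines)
    return found is not None and found < version


# Example usage (the path is hypothetical):
# with open("S001_T/S001_T_ST.gtf") as fh:
#     print(stringtie_version(fh))
```

Tuples compare element by element, so (2, 0, 3) < (2, 0, 4) and (2, 2, 1) > (2, 0, 4) without any string parsing pitfalls.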
@gpertea I am all good now with the new versions, on my side you can close the issue. Thanks a lot!
Hi @gpertea,
I am running v2.2.1 and I'm getting the same error.
Hi everyone,
Same error with StringTie v2.2.1.
Using the diagnostic version of prepDE.py that @gpertea posted, the output is:
prepDE.py -i samples.txt -v 2>&1 | tee prepDE.log
processing sample SRR8956796 from file /home/rvazquez/RNA_SEQ_ANALYSIS/ASSEMBLY/STRINGTIE/QUANTIFICATION/DENOVO_MODE/SRR8956796_eB_dir/SRR8956796_eB.gtf
processing sample SRR8956797 from file /home/rvazquez/RNA_SEQ_ANALYSIS/ASSEMBLY/STRINGTIE/QUANTIFICATION/DENOVO_MODE/SRR8956797_eB_dir/SRR8956797_eB.gtf
Error: could not locate transcript MSTRG.31643.1 entry for sample SRR8956797
Traceback (most recent call last):
  File "/home/rvazquez/RNA_SEQ_ANALYSIS/stringtie/prepDE.py", line 284, in <module>
    geneDict.setdefault(geneIDs[i],{}) #gene_id
KeyError: 'MSTRG.31643.1'
Although this issue is closed, nobody has mentioned that the StringTie v2.2.1 problem is solved by using prepDE.py3:
prepDE.py3 -i samples.txt -v 2>&1
...
..writing transcript_count_matrix.csv
..writing gene_count_matrix.csv
All done.
I also encounter the same problem with every StringTie version I tested. When I use the prepDE.py3 script, it gives me a very strange gene count matrix, in which samples 2 to x show massive zero inflation while sample 1 looks normal. Also, the last line does not look as expected:
If anybody has any hints on how to solve this, please let me know.
Edit: The error disappeared when I ran stringtie without the -x option. Not sure why this option caused the error, but now everything works
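The fraction of zero counts per sample column puts a number on zero inflation like the one described above. This is a minimal Python sketch. It assumes the gene_count_matrix.csv layout written by prepDE.py (a header with "gene_id" followed by one column per sample). `zero_fraction_per_sample` is a hypothetical name. A sample whose zero fraction is far above the first sample's is a candidate for re-checking its StringTie run, including the options used (such as -x here).

```python
import csv


def zero_fraction_per_sample(csv_lines):
    """Fraction of genes with a zero count, per sample column of a count matrix.

    Assumes a header row 'gene_id,<sample1>,<sample2>,...' followed by one
    row of counts per gene.
    """
    reader = csv.reader(csv_lines)
    samples = next(reader)[1:]  # skip the gene_id column
    zeros = [0] * len(samples)
    n_genes = 0
    for row in reader:
        n_genes += 1
        for j, value in enumerate(row[1:]):
            if float(value) == 0:
                zeros[j] += 1
    return {s: (zeros[j] / n_genes if n_genes else 0.0) for j, s in enumerate(samples)}


# Example usage:
# with open("gene_count_matrix.csv", newline="") as fh:
#     print(zero_fraction_per_sample(fh))
```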
@tsznxx @nongbaoting @gpertea Hi, I have changed the sample input to a list file with one "label space file path" line per sample, but the error persists. Each GTF file is inside a subdirectory of the ballgown directory generated by stringtie -B -e.
$ prepDE.py -i sample_1st.txt
output:
0 A1_S1
1 A2_S2
Traceback (most recent call last):
  File "prepDE.py", line 257, in <module>
    geneDict[geneIDs[i]][s[0]]+=v[s[0]]
KeyError: 'A2_S2'
This looks similar to issue #232.
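The "label space file path" sample list described above can be generated from the ballgown directory instead of being written by hand. This is a minimal Python sketch. It assumes (hypothetically) that every sub-directory is named after its sample and holds exactly one .gtf file. `write_sample_list` is an illustrative name, not part of StringTie. Sub-directories without exactly one GTF are returned rather than listed, so they can be inspected.

```python
import os


def write_sample_list(ballgown_dir, out_path):
    """Write one '<sample_id> <path to GTF>' line per sample sub-directory.

    Returns the names of sub-directories skipped because they did not
    contain exactly one .gtf file.
    """
    skipped = []
    with open(out_path, "w") as out:
        for name in sorted(os.listdir(ballgown_dir)):
            subdir = os.path.join(ballgown_dir, name)
            if not os.path.isdir(subdir):
                continue  # e.g. the sample list itself
            gtfs = [f for f in os.listdir(subdir) if f.endswith(".gtf")]
            if len(gtfs) != 1:
                skipped.append(name)
                continue
            gtf_path = os.path.abspath(os.path.join(subdir, gtfs[0]))
            out.write(f"{name} {gtf_path}\n")
    return skipped
```

Because the format is space-separated, paths that contain spaces would probably be split incorrectly, so it is safest to avoid them.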