Open standage opened 7 years ago
Thanks for trying out Shannon.
Quorum seems unable to detect the Phred quality score encoding in some FASTQ files (especially when run after trimming). I will release an update that deals with it automatically.
For now, for your file, I suggest editing run_quorum.py as follows. Replace the line
run_cmd(quorum_path + jobs_string + " --prefix " + base_file + " " + reads_files[0])
with
run_cmd(quorum_path + jobs_string + " -q 33 --prefix " + base_file + " " + reads_files[0])
and the line
run_cmd(quorum_path + jobs_string + " --prefix " + base_file + " " + new_reads1_file + " " + new_reads2_file)
with
run_cmd(quorum_path + jobs_string + " -q 33 --prefix " + base_file + " " + new_reads1_file + " " + new_reads2_file)
So please replace the file run_quorum.py in the shannon directory with this one.
Please let me know if you are able to run Shannon with this edit.
Best
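Before hard-coding an offset such as -q 33, it can help to check what the quality lines of the input actually contain. The following is only a minimal sketch, not part of Shannon or Quorum: the helper names (open_fastq, min_quality_char, guess_phred_offset) are hypothetical, and it assumes plain 4-line FASTQ records. It relies on the common rule of thumb that quality characters with ASCII codes below 59 only occur in Phred+33 data.

```python
import gzip


def open_fastq(path):
    """Open a FASTQ file as text, handling .gz files (hypothetical helper)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def min_quality_char(path, max_reads=10000):
    """Return the smallest quality character code in the first max_reads records.

    Assumes 4-line FASTQ records (no wrapped sequence or quality lines).
    """
    lowest = None
    with open_fastq(path) as handle:
        for i, line in enumerate(handle):
            if i // 4 >= max_reads:
                break
            if i % 4 == 3:  # fourth line of each record holds the qualities
                qual = line.rstrip("\r\n")
                if qual:
                    code = min(ord(c) for c in qual)
                    lowest = code if lowest is None else min(lowest, code)
    return lowest


def guess_phred_offset(min_code):
    """Guess the offset: codes below 59 are only valid in Phred+33 data.

    A minimum of 59 or more is ambiguous; 64 is returned as a guess only.
    """
    if min_code is None:
        raise ValueError("no quality lines found")
    return 33 if min_code < 59 else 64
```

Under this heuristic, a minimum quality character of 39 (') would be classified as Phred+33, which is consistent with the -q 33 workaround suggested in the thread.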
On Tue, Jan 17, 2017 at 11:36 AM, Daniel Standage notifications@github.com wrote:
I'm not sure what to make of this error. Please advise.
Found an unusual minimum quality char of 39 ('). Stopping now. Use option -m to override
Traceback (most recent call last):
File "/scratch/standage/Shannon/run_quorum.py", line 106, in <module>
with open(base_file + ".fa", 'r') as quorum_output:
IOError: [Errno 2] No such file or directory: '/scratch/standage/shannon_2017-01-17/quorum_output.fa'
Traceback (most recent call last):
File "rc_s.py", line 35, in <module>
main()
File "rc_s.py", line 30, in main
reverse_complement_serial(infile, outfile)
File "rc_s.py", line 10, in reverse_complement_serial
for line in open(infile):
IOError: [Errno 2] No such file or directory: '/scratch/standage/shannon_2017-01-17/corrected_reads_1.fa'
Shannon: RNA Seq de novo Assembly
Version: 0.0.2
Checking the various dependencies
Using jellyfish in /usr/local/bin/jellyfish
Using GPMETIS in /usr/bin/gpmetis
OPTIONS:
File extension detected as fastq.
Tue Jan 17 14:18:55 2017: Starting Shannon run..
Tue Jan 17 14:18:55 2017: Running Quorum for read error correction with quality scores..
Traceback (most recent call last):
File "shannon.py", line 415, in <module>
(N,L) = rc_gnu.rc_gnu(reads_files[0],temp_read_file_1,rc_read_file_1,nJobs)
File "/scratch/standage/Shannon/rc_gnu.py", line 27, in rc_gnu
return find_L(infile)
File "/scratch/standage/Shannon/rc_gnu.py", line 17, in find_L
for line in open(readfile):
IOError: [Errno 2] No such file or directory: '/scratch/standage/shannon_2017-01-17/corrected_reads_1.fa'
-- You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/sreeramkannan/Shannon/issues/17, or mute the thread https://github.com/notifications/unsubscribe-auth/ANm239hF305ohMf8OJROwuqJiAlsH-Teks5rTRhEgaJpZM4LmEkk .
Thanks Sreeram, things seem to be running now.
The quorum output included only 12 reads, and the quorum log includes 10s of megabytes of messages saying "Skipped : No high quality mer". :-(
This is related to the quality score encoding, which we hard-coded as Phred+33. I am not sure if something is going wrong there.
Could you share 10k or so reads so I can figure out what is going on?
I appreciate the time you are taking to test Shannon, and your feedback is valuable in developing Shannon.
all.trim.50k.1.fq.gz https://github.com/sreeramkannan/Shannon/files/720811/all.trim.50k.1.fq.gz
all.trim.50k.2.fq.gz https://github.com/sreeramkannan/Shannon/files/720812/all.trim.50k.2.fq.gz
It is my pleasure. My research and my software have benefitted greatly from others who were generous with their time in reporting issues, and I'm happy to pay it forward. Of course, there are selfish reasons as well: the better your software is, the more useful it is to me! :)
I think the error was due to read names not being unique. Once I rename the reads in the input file, quorum and Shannon run fine.
Thanks and sorry for the delay! Sreeram
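One way to check whether read names are unique across the input files is sketched below. This is only an illustration, not code from Shannon: the function names (read_names, duplicated_names) are hypothetical, and it assumes 4-line FASTQ records where the read ID is the first whitespace-delimited token of the header line.

```python
import gzip
from collections import Counter


def read_names(path):
    """Yield the read ID of each record (first token of the header, without '@').

    Assumes 4-line FASTQ records; .gz files are opened with gzip.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0:  # header line of each record
                fields = line[1:].split()
                yield fields[0] if fields else ""


def duplicated_names(*paths):
    """Return {read ID: count} for every ID seen more than once across paths."""
    counts = Counter()
    for path in paths:
        counts.update(read_names(path))
    return {name: n for name, n in counts.items() if n > 1}
```

Calling duplicated_names on both files of a pair returns an empty dict when every ID is unique; any entries it does return are the names that would need to be changed.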
I ran into this problem recently with another data set. By default, fastq-dump from SRA toolkit gives the same read names for left and right pairs of a read, but you can use the --defline-seq and --defline-qual options to append /1 and /2 or something like that to distinguish left and right pairs and make sure each sequence has a unique ID.
I'll give it a try!
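If the FASTQ files have already been dumped, the same effect can be approximated after the fact by rewriting the header lines. The sketch below is an assumption-laden illustration rather than anything fastq-dump or Shannon provides: add_mate_suffix is a hypothetical helper, it assumes uncompressed 4-line FASTQ records, and it keeps only the first token of each header (any trailing description is dropped).

```python
def add_mate_suffix(in_path, out_path, mate):
    """Copy a FASTQ file, appending /1 or /2 to every read name.

    Assumes 4-line records. Only the first whitespace-delimited token of
    each header is kept; the rest of the header is discarded.
    """
    if mate not in (1, 2):
        raise ValueError("mate must be 1 or 2")
    suffix = "/%d" % mate
    with open(in_path, "r") as src, open(out_path, "w") as dst:
        for i, line in enumerate(src):
            if i % 4 == 0:  # header line of each record
                fields = line.rstrip("\r\n").split()
                if not fields:
                    raise ValueError("empty header at line %d" % (i + 1))
                name = fields[0]
                if not name.endswith(suffix):
                    name += suffix
                line = name + "\n"
            dst.write(line)
```

Running it once with mate=1 on the left reads and once with mate=2 on the right reads gives each sequence in the pair a distinct ID.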
Ah, ok. Thanks for the suggestion. I will also make sure there is an update that deals with it internally.
Hi guys,
I also have just hit this error:
Found an unusual minimum quality char of 46 (.). Stopping now. Use option -m to override
Traceback (most recent call last):
File "/data/Bioinfo/bioinfo-proj-jmontenegro/Programs/Shannon/run_quorum.py", line 106, in <module>
with open(base_file + ".fa", 'r') as quorum_output:
IOError: [Errno 2] No such file or directory: '/data/Bioinfo/bioinfo-proj-jmontenegro/RNA-seq/ASSEMBLY/CAGRF15949/Results/Assembly/Shannon/quorum_output.fa'
Traceback (most recent call last):
File "/data/Bioinfo/bioinfo-proj-jmontenegro/Programs/Shannon/rc_s.py", line 35, in <module>
main()
File "/data/Bioinfo/bioinfo-proj-jmontenegro/Programs/Shannon/rc_s.py", line 30, in main
reverse_complement_serial(infile, outfile)
File "/data/Bioinfo/bioinfo-proj-jmontenegro/Programs/Shannon/rc_s.py", line 10, in reverse_complement_serial
for line in open(infile):
IOError: [Errno 2] No such file or directory: '/data/Bioinfo/bioinfo-proj-jmontenegro/RNA-seq/ASSEMBLY/CAGRF15949/Results/Assembly/Shannon/corrected_reads_1.fa'
Traceback (most recent call last):
File "/data/Bioinfo/bioinfo-proj-jmontenegro/Programs/Shannon/shannon.py", line 415, in <module>
(N,L) = rc_gnu.rc_gnu(reads_files[0],temp_read_file_1,rc_read_file_1,nJobs,python_path,shannon_dir)
File "/data/Bioinfo/bioinfo-proj-jmontenegro/Programs/Shannon/rc_gnu.py", line 27, in rc_gnu
return find_L(infile)
File "/data/Bioinfo/bioinfo-proj-jmontenegro/Programs/Shannon/rc_gnu.py", line 17, in find_L
for line in open(readfile):
IOError: [Errno 2] No such file or directory: '/data/Bioinfo/bioinfo-proj-jmontenegro/RNA-seq/ASSEMBLY/CAGRF15949/Results/Assembly/Shannon/corrected_reads_1.fa'
Changing the names of the reads to ensure these are unique (_1 and _2) did not solve the problem. Is there any workaround for this issue?
Cheers,