sreeramkannan / Shannon

RNA-Seq
Quorum chokes on quality scores #17

Open standage opened 7 years ago

standage commented 7 years ago

I'm not sure what to make of this error. Please advise.

Found an unusual minimum quality char of 39 ('). Stopping now. Use option -m to override
Traceback (most recent call last):
  File "/scratch/standage/Shannon/run_quorum.py", line 106, in <module>
    with open(base_file + ".fa", 'r') as quorum_output: 
IOError: [Errno 2] No such file or directory: '/scratch/standage/shannon_2017-01-17/quorum_output.fa'
Traceback (most recent call last):
  File "rc_s.py", line 35, in <module>
    main()
  File "rc_s.py", line 30, in main
    reverse_complement_serial(infile, outfile)
  File "rc_s.py", line 10, in reverse_complement_serial
    for line in open(infile):
IOError: [Errno 2] No such file or directory: '/scratch/standage/shannon_2017-01-17/corrected_reads_1.fa'
--------------------------------------------
Shannon: RNA Seq de novo Assembly
Version: 0.0.2
--------------------------------------------
Checking the various dependencies
--------------------------------------------
Using jellyfish in /usr/local/bin/jellyfish
Using GPMETIS in /usr/bin/gpmetis
OPTIONS: File extension detected as fastq.
--------------------------------------------
Tue Jan 17 14:18:55 2017: Starting Shannon run..
Tue Jan 17 14:18:55 2017: Running Quorum for read error correction with quality scores..
Traceback (most recent call last):
  File "shannon.py", line 415, in <module>
    (N,L) = rc_gnu.rc_gnu(reads_files[0],temp_read_file_1,rc_read_file_1,nJobs)
  File "/scratch/standage/Shannon/rc_gnu.py", line 27, in rc_gnu
    return find_L(infile)
  File "/scratch/standage/Shannon/rc_gnu.py", line 17, in find_L
    for line in open(readfile):
IOError: [Errno 2] No such file or directory: '/scratch/standage/shannon_2017-01-17/corrected_reads_1.fa'

sreeramkannan commented 7 years ago

Thanks for trying out Shannon.

Quorum seems to suffer from an inability to detect the PHRED scores in some FASTQ files (especially when run after trimming). I will release an update that deals with it automatically.
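
To check which encoding a FASTQ file most likely uses before running Quorum, the quality lines can be scanned directly. The following is a minimal sketch, not part of Shannon or Quorum: the function names `quality_char_range` and `guess_phred_offset` are hypothetical, the input is assumed to use strict 4-line FASTQ records, and the thresholds are the usual rule of thumb (characters below `;` only occur in Phred+33 data, characters above `J` usually indicate a +64 encoding).

```python
import gzip


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def quality_char_range(path, max_reads=10000):
    """Return (min_ord, max_ord) of the quality characters in the first
    max_reads records. Assumes strict 4-line FASTQ records."""
    lo, hi = 255, 0
    with open_fastq(path) as handle:
        for i, line in enumerate(handle):
            if i // 4 >= max_reads:
                break
            if i % 4 == 3:  # fourth line of each record holds the qualities
                codes = [ord(c) for c in line.rstrip("\r\n")]
                if codes:
                    lo = min(lo, min(codes))
                    hi = max(hi, max(codes))
    return lo, hi


def guess_phred_offset(lo, hi):
    """Heuristic guess of the quality offset; returns None when ambiguous."""
    if lo < 59:   # below ';' cannot be a +64 encoding
        return 33
    if hi > 74:   # above 'J' is unusual for Illumina Phred+33 data
        return 64
    return None
```

The minimum characters reported in this thread, 39 (`'`) and 46 (`.`), both fall below 59, so this heuristic points to Phred+33 for those files.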

For now, for your file, I suggest replacing this line in run_quorum.py:

    run_cmd(quorum_path + jobs_string + " --prefix " + base_file + " " + reads_files[0])

with:

    run_cmd(quorum_path + jobs_string + " -q 33 --prefix " + base_file + " " + reads_files[0])

and this line:

    run_cmd(quorum_path + jobs_string + " --prefix " + base_file + " " + new_reads1_file + " " + new_reads2_file)

with:

    run_cmd(quorum_path + jobs_string + " -q 33 --prefix " + base_file + " " + new_reads1_file + " " + new_reads2_file)
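
Since the same edit is needed for both the single-end and the paired-end call, the command string could be built in one place. This is only a sketch of that idea: `build_quorum_cmd` and its `quality_offset` parameter are hypothetical and do not exist in run_quorum.py, while `quorum_path`, `jobs_string`, `base_file` and the `-q 33` flag are taken from the lines quoted above.

```python
def build_quorum_cmd(quorum_path, jobs_string, base_file, read_paths,
                     quality_offset=None):
    """Assemble the Quorum command line used by run_cmd.

    read_paths holds one file (single-end) or two files (paired-end).
    quality_offset is a hypothetical parameter: when given, it is passed
    to Quorum as '-q <offset>' instead of relying on auto-detection.
    """
    cmd = quorum_path + jobs_string
    if quality_offset is not None:
        cmd += " -q " + str(quality_offset)
    cmd += " --prefix " + base_file + " " + " ".join(read_paths)
    return cmd
```

With `quality_offset=33` this produces the same string as the edited lines above, and with `quality_offset=None` it produces the original ones.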

So please replace the file run_quorum.py in the shannon directory with this one.

Please let me know if you are able to run Shannon with this edit.

Best

standage commented 7 years ago

Thanks Sreeram, things seem to be running now.

standage commented 7 years ago

The quorum output included only 12 reads, and the quorum log includes tens of megabytes of messages saying "Skipped : No high quality mer". :-(

sreeramkannan commented 7 years ago

This is related to the quality score encoding, which we hard-coded as Phred+33. I am not sure whether something is going wrong there.

Would it be possible to share 10k or so reads so I can figure out what is going on?

I appreciate the time you are taking to test Shannon, and your feedback is valuable in developing Shannon.

standage commented 7 years ago

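all.trim.50k.1.fq.gz (https://github.com/sreeramkannan/Shannon/files/720811/all.trim.50k.1.fq.gz)
all.trim.50k.2.fq.gz (https://github.com/sreeramkannan/Shannon/files/720812/all.trim.50k.2.fq.gz)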

It is my pleasure. My research and my software have benefited greatly from others who were generous with their time in reporting issues, and I'm happy to pay it forward. Of course, there are selfish reasons as well: the better your software is, the more useful it is to me! :)

sreeramkannan commented 7 years ago

I think the error was due to read names not being unique. Once I rename the reads in the input file, quorum and Shannon run fine.
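
A quick way to confirm this diagnosis on other data is to count read names across the input files. The sketch below is not part of Shannon: `duplicated_read_names` is a hypothetical helper, it assumes strict 4-line FASTQ records, and it assumes the read name is the header text up to the first whitespace, which may not match exactly how Quorum identifies reads.

```python
from collections import Counter


def duplicated_read_names(fastq_lines):
    """Return {name: count} for read names that occur more than once.

    fastq_lines is any iterable of FASTQ lines (an open file, or several
    files chained together). The name is taken as the header line without
    the leading '@', up to the first whitespace (an assumption).
    """
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0:  # header line of each 4-line record
            fields = line[1:].split()
            counts[fields[0] if fields else ""] += 1
    return {name: n for name, n in counts.items() if n > 1}
```

To check a left/right pair together, the two open files can be passed as `itertools.chain(left_handle, right_handle)`; an empty result means every name is unique.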

Thanks and sorry for the delay! Sreeram

standage commented 7 years ago

I ran into this problem recently with another data set. By default, fastq-dump from SRA toolkit gives the same read name to the left and right reads of a pair, but you can use the --defline-seq and --defline-qual options to append /1 and /2 or something like that to distinguish the left and right reads and make sure each sequence has a unique ID.

I'll give it a try!
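
For files that have already been dumped, the same effect can be approximated after the fact. This is a sketch of an alternative to the fastq-dump options, not a replacement for them: `add_mate_suffix` is a hypothetical helper, it assumes strict 4-line FASTQ records, and it assumes the read name ends at the first space in the header.

```python
def add_mate_suffix(fastq_lines, mate):
    """Yield FASTQ lines with '/<mate>' appended to each read name.

    mate is 1 for the left file and 2 for the right file. The '+'
    separator line is reduced to a bare '+' so that it cannot disagree
    with the renamed header.
    """
    suffix = "/" + str(mate)
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0:
            # Split the header into the read name and any trailing description.
            name, sep, rest = line.rstrip("\r\n").partition(" ")
            if not name.endswith(suffix):
                name += suffix
            line = name + sep + rest + "\n"
        elif i % 4 == 2:
            line = "+\n"
        yield line
```

Running it with `mate=1` over the left file and `mate=2` over the right file, and writing the yielded lines to new files, gives every read in the pair a distinct ID.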

sreeramkannan commented 7 years ago

Ah, ok. Thanks for the suggestion. I will also make sure there is an update that deals with it internally.

jdmontenegro commented 7 years ago

Hi guys,

I also have just hit this error:

Found an unusual minimum quality char of 46 (.). Stopping now. Use option -m to override
Traceback (most recent call last):
  File "/data/Bioinfo/bioinfo-proj-jmontenegro/Programs/Shannon/run_quorum.py", line 106, in <module>
    with open(base_file + ".fa", 'r') as quorum_output:
IOError: [Errno 2] No such file or directory: '/data/Bioinfo/bioinfo-proj-jmontenegro/RNA-seq/ASSEMBLY/CAGRF15949/Results/Assembly/Shannon/quorum_output.fa'
Traceback (most recent call last):
  File "/data/Bioinfo/bioinfo-proj-jmontenegro/Programs/Shannon/rc_s.py", line 35, in <module>
    main()
  File "/data/Bioinfo/bioinfo-proj-jmontenegro/Programs/Shannon/rc_s.py", line 30, in main
    reverse_complement_serial(infile, outfile)
  File "/data/Bioinfo/bioinfo-proj-jmontenegro/Programs/Shannon/rc_s.py", line 10, in reverse_complement_serial
    for line in open(infile):
IOError: [Errno 2] No such file or directory: '/data/Bioinfo/bioinfo-proj-jmontenegro/RNA-seq/ASSEMBLY/CAGRF15949/Results/Assembly/Shannon/corrected_reads_1.fa'
Traceback (most recent call last):
  File "/data/Bioinfo/bioinfo-proj-jmontenegro/Programs/Shannon/shannon.py", line 415, in <module>
    (N,L) = rc_gnu.rc_gnu(reads_files[0],temp_read_file_1,rc_read_file_1,nJobs,python_path,shannon_dir)
  File "/data/Bioinfo/bioinfo-proj-jmontenegro/Programs/Shannon/rc_gnu.py", line 27, in rc_gnu
    return find_L(infile)
  File "/data/Bioinfo/bioinfo-proj-jmontenegro/Programs/Shannon/rc_gnu.py", line 17, in find_L
    for line in open(readfile):
IOError: [Errno 2] No such file or directory: '/data/Bioinfo/bioinfo-proj-jmontenegro/RNA-seq/ASSEMBLY/CAGRF15949/Results/Assembly/Shannon/corrected_reads_1.fa'

Changing the names of the reads to ensure these are unique (_1 and _2) did not solve the problem. Is there any workaround for this issue?

Cheers,