ksahlin / isONcorrect

Error correction of ONT transcript reads
GNU General Public License v3.0

TypeError occurred when running isONcorrect #2

Closed: defendant602 closed this issue 4 years ago

defendant602 commented 4 years ago

Hi, thanks for developing this great software for read correction. I ran the isONcorrect pipeline on my own ONT cDNA data, but I got the following error:

subprocess.CalledProcessError: Command '['/usr/bin/time', '/export/pipeline/RNASeq/Software/isONcorrect/isONcorrect-master/isONcorrect', '--fastq', './clustering/fastq_files/1823.fastq', '--outfolder', './correction/1823', '--exact_instance_limit', '50', '--set_w_dynamically', '--k', '9', '--w', '10', '--xmin', '14', '--xmax', '80', '--T', '0.1']' returned non-zero exit status 1.

It ran well on other clusters, but it failed on 1823.fastq. When I run this command alone, the error info is:

Too abundant: TATATATAT ACACATATA 13 12
Too abundant: ATATATACA TATATATAT 13 12
Too abundant: ATATACACA TATATATAT 13 12
Too abundant: CACTCCAGC AAAAAAAAA 13 12
Too abundant: ACTCCAGCC AAAAAAAAA 13 12
Average abundance for non-unique minimizer-combs: 3.2074067588863597
Number of singleton minimizer combinations filtered out: 90418
Traceback (most recent call last):
  File "/export/pipeline/RNASeq/Software/isONcorrect/isONcorrect-master/isONcorrect", line 1098, in <module>
    main(args)
  File "/export/pipeline/RNASeq/Software/isONcorrect/isONcorrect-master/isONcorrect", line 991, in main
    corrected_seq, other_reads_corrected_regions = correct_read(seq, reads, intervals_to_correct, k_size, work_dir, v_depth_ratio_threshold, max_seqs_to_spoa, args.disable_numpy, args.verbose)
  File "/export/pipeline/RNASeq/Software/isONcorrect/isONcorrect-master/isONcorrect", line 774, in correct_read
    best_corr, other_corrections = get_best_corrections(instance, reads, k_size, work_dir, v_depth_ratio_threshold, max_seqs_to_spoa, disable_numpy) # store all corrected regions within all reads in large container and keep track when correcting new read to not re-compute these regions
  File "/export/pipeline/RNASeq/Software/isONcorrect/isONcorrect-master/isONcorrect", line 467, in get_best_corrections
    read_alignment, ref_alignment = help_functions.cigar_to_seq(cigar_string, seq, spoa_ref)
  File "/export/pipeline/RNASeq/Software/isONcorrect/isONcorrect-master/modules/help_functions.py", line 43, in cigar_to_seq
    result = re.split(r'[=DXSMI]+', cigar)
  File "/export/software/Base/python/Python/Python-3.6.3/lib/python3.6/re.py", line 212, in split
    return _compile(pattern, flags).split(string, maxsplit)
TypeError: expected string or bytes-like object
1.99user 4.93system 0:01.79elapsed 386%CPU (0avgtext+0avgdata 87308maxresident)k
0inputs+392outputs (44major+62262minor)pagefaults 0swaps

In the line read_alignment, ref_alignment = help_functions.cigar_to_seq(cigar_string, seq, spoa_ref), cigar_string is actually None rather than a string.
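For what it's worth, a minimal sketch that reproduces this failure mode; the guard function below is hypothetical, just for illustration, not the actual isONcorrect code:

    import re

    # re.split() requires a str or bytes input, so passing the None that
    # cigar_to_seq received reproduces the TypeError in the traceback.
    cigar = None
    try:
        re.split(r'[=DXSMI]+', cigar)
    except TypeError as e:
        print(e)  # expected string or bytes-like object

    # A defensive guard (hypothetical, for illustration only):
    def split_cigar(cigar):
        if cigar is None:
            return None  # let the caller handle the failed alignment
        return re.split(r'[=DXSMI]+', cigar)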

Could you please take a moment to check this error out? Thanks very much!

ksahlin commented 4 years ago

Would it be possible to share the ./clustering/fastq_files/1823.fastq file with me (e.g. by sending it to me by email)? In that case, I can take a look at it right away.

defendant602 commented 4 years ago

Thanks for the quick reply! It's absolutely OK to share the fastq file with you; may I have your email address?

ksahlin commented 4 years ago

ksahlin [at] kth [dot] se. The address should show if you click on my profile.

ksahlin commented 4 years ago

If it's too large, we could find another medium to share files. Just let me know.

defendant602 commented 4 years ago

1823.zip

The 1823.fastq file is actually very small, so I don't think it's necessary to send it by email. I have uploaded it as an attachment to this comment.

Thanks!

ksahlin commented 4 years ago

I may have fixed the bug now in the new version, v0.0.5. The new version is available on pip and here on GitHub in the latest master commit 563d0a2.

The reason I say "maybe" is that I didn't observe the identical runtime error that you reported for this instance. However, I did get a runtime error in the same region of the code, which is now fixed. So this is somewhat strange.

In any case, give the new version a try and see if it works for you too. In addition, the fix could slightly improve accuracy, so you might want to rerun it on all your data (although the improvement should be very minor in that case).
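For example, upgrading via pip (assuming the package is published under the tool's name, isONcorrect):

    pip install --upgrade isONcorrect
    pip show isONcorrect   # confirm the installed version is 0.0.5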

Thanks for reporting and let me know if it solves the issue!

For logging purposes, my error was:

 python -m pyinstrument isONcorrect --fastq /Users/kxs624/tmp/ISONCORRECT/user_bug1/fastq/1823.fastq --outfolder /Users/kxs624/tmp/ISONCORRECT/user_bug1/out/ --verbose

Traceback (most recent call last):
  File "/Users/kxs624/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/kxs624/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/kxs624/anaconda3/lib/python3.6/site-packages/pyinstrument/__main__.py", line 156, in <module>
    main()
  File "/Users/kxs624/anaconda3/lib/python3.6/site-packages/pyinstrument/__main__.py", line 87, in main
    exec_(code, globs, None)
  File "isONcorrect", line 1098, in <module>
    main(args)
  File "isONcorrect", line 991, in main
    corrected_seq, other_reads_corrected_regions = correct_read(seq, reads, intervals_to_correct, k_size, work_dir, v_depth_ratio_threshold, max_seqs_to_spoa, args.disable_numpy, args.verbose)
  File "isONcorrect", line 774, in correct_read
    best_corr, other_corrections = get_best_corrections(instance, reads, k_size, work_dir, v_depth_ratio_threshold, max_seqs_to_spoa, disable_numpy) # store all corrected regions within all reads in large container and keep track when correcting new read to not re-compute these regions     
  File "isONcorrect", line 567, in get_best_corrections
    return curr_read_corr[k_size:-k_size], other_corrections_final
UnboundLocalError: local variable 'curr_read_corr' referenced before assignment
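For context, an UnboundLocalError like this typically means the variable was only assigned on some code paths before the return. A minimal illustration of the pattern (not the actual get_best_corrections code):

    def best_correction_sketch(candidates):
        # If 'candidates' is empty, the loop body never executes, so
        # curr_read_corr is never bound and the return statement raises
        # UnboundLocalError, as in the traceback above.
        for cand in candidates:
            curr_read_corr = cand
        return curr_read_corr

    best_correction_sketch([])  # raises UnboundLocalError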

defendant602 commented 4 years ago

Yes, it solved my problem and it runs well! Thanks again!