ksahlin / isONclust

De novo clustering of long transcript reads into genes
GNU General Public License v3.0

get_sorted_fastq_for_cluster.py, line 124 // p_no_error_in_kmers = 1.0 - exp_errors_in_kmers/ float((len(seq) - k +1)) #3

Closed uqvirg closed 5 years ago

uqvirg commented 5 years ago

Hi, I ran isONclust on the test data with "isONclust --fastq test/sample_alz_2k.fastq --outfolder output" and it ran fine. I then tried to run isONclust on a sugarcane FASTQ file containing 301,020 reads of insert:

isONclust --fastq reads_of_insert.fastq --outfolder output1

and I got this error: "p_no_error_in_kmers = 1.0 - exp_errors_in_kmers/ float((len(seq) - k +1)) ZeroDivisionError: float division by zero".

Do you know what the problem could be? Thank you, Virgg

------------------------ Full LOG ----------------------------
isONclust --fastq /30days/uqvperlo/smrtanalysis/output/merge_data/reads_of_insert.fastq --outfolder output1
started sorting seqs
0 reads processed.
10000 reads processed.
20000 reads processed.
30000 reads processed.
40000 reads processed.
50000 reads processed.
60000 reads processed.
70000 reads processed.
80000 reads processed.
90000 reads processed.
Traceback (most recent call last):
  File "/home/uqvperlo/.conda/envs/isonclust/bin/isONclust", line 178, in <module>
    main(args)
  File "/home/uqvperlo/.conda/envs/isonclust/bin/isONclust", line 67, in main
    sorted_reads_fastq_file = get_sorted_fastq_for_cluster.main(args)
  File "/home/uqvperlo/.conda/envs/isonclust/lib/python3.7/site-packages/modules/get_sorted_fastq_for_cluster.py", line 124, in main
    p_no_error_in_kmers = 1.0 - exp_errors_in_kmers/ float((len(seq) - k +1))
ZeroDivisionError: float division by zero
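For reference, the arithmetic in the failing line can be reproduced on its own. This is only a sketch, not isONclust code: the helper names `kmer_count` and `p_no_error_in_kmers` are hypothetical, and it assumes that `k` in the traceback is a k-mer size of 15. Under that assumption the denominator `len(seq) - k + 1` is zero for a read of exactly `k - 1` bases.

```python
def kmer_count(seq_len, k):
    """Number of length-k windows in a sequence of seq_len bases."""
    return seq_len - k + 1


def p_no_error_in_kmers(exp_errors_in_kmers, seq_len, k):
    # Same arithmetic as the line shown in the traceback (hypothetical wrapper).
    return 1.0 - exp_errors_in_kmers / float(kmer_count(seq_len, k))


k = 15  # assumed k-mer size
print(kmer_count(14, k))  # a 14-base read has zero 15-mers

try:
    p_no_error_in_kmers(0.0, 14, k)
except ZeroDivisionError as err:
    print("ZeroDivisionError:", err)
```

Reads shorter than `k - 1` bases would not raise here, but they would give a negative window count, so they are just as unusable for this calculation.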

ksahlin commented 5 years ago

Oh, this looks like an easy fix. It seems that one of your sequences is shorter than the default value of the --w parameter, which is 50. I will fix this bug when I'm back at work in about a week.

In the meantime, you can either lower the parameter --w to one less than the shortest sequence in your input file (but not lower than 15 which is the size of --k), or filter your input fastq to only contain sequences longer than 50 bases.
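The filtering workaround could look roughly like the sketch below. It is not part of isONclust: `read_fastq` and `filter_fastq` are hypothetical helpers, and the code assumes a plain, uncompressed FASTQ with exactly four lines per record (no wrapped sequences). It also reports the shortest read length, which may help when choosing a value for --w instead.

```python
import itertools


def read_fastq(handle):
    """Yield (header, seq, plus, qual) from a 4-line-per-record FASTQ."""
    while True:
        record = [line.rstrip("\n") for line in itertools.islice(handle, 4)]
        if len(record) < 4:  # end of file or truncated final record
            return
        yield tuple(record)


def filter_fastq(in_path, out_path, min_len=51):
    """Keep reads with at least min_len bases.

    Returns (kept, dropped, shortest), where shortest is the length of the
    shortest read seen in the input (None if the input is empty).
    """
    kept = dropped = 0
    shortest = None
    with open(in_path) as src, open(out_path, "w") as dst:
        for header, seq, plus, qual in read_fastq(src):
            if shortest is None or len(seq) < shortest:
                shortest = len(seq)
            if len(seq) >= min_len:
                dst.write(f"{header}\n{seq}\n{plus}\n{qual}\n")
                kept += 1
            else:
                dropped += 1
    return kept, dropped, shortest
```

For example, `filter_fastq("reads_of_insert.fastq", "reads_filtered.fastq")` would write only reads longer than 50 bases to the (hypothetical) output file `reads_filtered.fastq`, which can then be passed to --fastq.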

uqvirg commented 5 years ago

Thank you, it works well. (I currently get more than twice as many isoforms as with IsoCon; I'm working on it.) Thank you again, Virgg

ksahlin commented 5 years ago

Glad you find it useful! I'm not sure how you plan to use isONclust, but note that it only clusters the reads for you. No error correction is performed to collapse reads that come from the same transcript. Maybe you have a downstream step that takes this into account.

Best, K

ksahlin commented 5 years ago

This error should be fixed in v0.0.4.