Closed NHoang98 closed 3 weeks ago
The error says this read accession is not present in the fastq file. So somehow the clustered read is not found in the fastq file. The 3.filtering naming looks suspicious :) Perhaps you filtered out reads from the A1.fq after clustering - then you will get this error.
Thanks for the fast reply,
Well, the directory 3.filtering was actually for filtered reads after the pychopper step (we discard reads <100 bp with the Filtlong package). The .fq output from Filtlong is the input for isONclust. Do you think the way Filtlong produces output might lead to this consequence?
Hmm, I don't think so. If the read file given as input to the clustering is the same as the read file given as input to write_fastq, and the final_clusters.tsv is produced using the same read file as for the clustering, then I don't see how this error can happen.
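One way to test the condition described above is to compare the accessions in the cluster file against those in the fastq file. The following is a minimal sketch, not part of isONclust: it assumes final_clusters.tsv has two tab-separated columns (cluster id, read accession) and that the accession is the first whitespace-delimited token of the fastq header. The function names are hypothetical, and the `key` argument can be swapped if the tool derives accessions differently.

```python
import io

def fastq_accessions(handle, key=lambda header: header.split()[0]):
    """Collect accessions from a 4-line-per-record FASTQ.

    `key` maps the header text (without the leading '@') to an accession.
    The default (first whitespace-delimited token) is an assumption.
    """
    accs = set()
    for i, line in enumerate(handle):
        if i % 4 == 0:  # header line of each record
            accs.add(key(line.rstrip("\n")[1:]))
    return accs

def clustered_accessions(handle):
    """Collect accessions from a two-column, tab-separated cluster file."""
    accs = set()
    for line in handle:
        line = line.rstrip("\n")
        if line:
            # Assumed layout: cluster id, then read accession.
            accs.add(line.split("\t", 1)[1])
    return accs

def missing_reads(clusters_handle, fastq_handle):
    """Return accessions listed in the cluster file but absent from the FASTQ."""
    return clustered_accessions(clusters_handle) - fastq_accessions(fastq_handle)

# Toy data, made up for illustration only.
toy_fastq = "@read1 extra\nACGT\n+\nIIII\n@read2\nACGT\n+\nIIII\n"
toy_clusters = "0\tread1\n0\tread3\n"
missing = missing_reads(io.StringIO(toy_clusters), io.StringIO(toy_fastq))
print(sorted(missing))
```

With real files, open final_clusters.tsv and the .fq file with `open(...)` and pass the handles in. A non-empty result would mean the two files disagree on read names.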
Hi @ksahlin,
Sorry for bringing this back again after a long time. At the end of the day, we still find that isONclust is the best choice for our experiment. This time we tried the sorted.fastq in the isONclust output folder instead of the input .fq, and the error is still there. So I think there might be a problem with my .fq headers? I've run the sample_alz_2k.fastq file multiple times and it works normally. But since your test file looks like it comes from a PacBio CCS run, I'm not sure whether my fastq headers are in the right format.
This is an example of one of the headers, copied from the sorted.fastq file:
@b1c4fdcc-1f64-48d9-9e27-b6d9ed152198_st:Z:2024-07-19T12:03:20.721+00:00 RG:Z:ee6fa1023bdc36c23924a004f40565b31c16f1c6_dna_r9.4.1_e8_sup@v3.6_SQK-PCB111-24_barcode01_39652.62084870732
FYI: when I compare the final_cluster_origins.tsv between my run and the test run, it looks like my header was accidentally split into 2 columns. The pictures are attached right below:
Hope to hear great news from you soon!
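The two-column observation above is consistent with the header containing a space. The short sketch below only illustrates that point with the header quoted in this thread; whether isONclust actually splits headers on whitespace is an assumption, not something confirmed here.

```python
# Header copied from the sorted.fastq example in this thread.
header = (
    "@b1c4fdcc-1f64-48d9-9e27-b6d9ed152198_st:Z:2024-07-19T12:03:20.721+00:00 "
    "RG:Z:ee6fa1023bdc36c23924a004f40565b31c16f1c6_dna_r9.4.1_e8_sup@v3.6_"
    "SQK-PCB111-24_barcode01_39652.62084870732"
)

# Splitting on any whitespace breaks the header into two fields ...
fields_whitespace = header[1:].split()
# ... while splitting on tabs only keeps it as a single field.
fields_tab = header[1:].split("\t")

print(len(fields_whitespace), len(fields_tab))
```

If a downstream step treats whitespace as a column separator, a header like this would show up as two columns, and a lookup by the full header would no longer match.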
How about modifying the accessions of all reads before clustering, e.g. by removing everything after the first underscore with line[1:].split('_')[0] to only get the b1c4fdcc-1f64-48d9-9e27-b6d9ed152198 part? Maybe this helps.
Also, we're about to release isONclust3 any time now (likely within a week or two). Let me know if you're interested in trying this tool out and we can arrange access before the release.
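The renaming suggested above, line[1:].split('_')[0], could be applied to a whole file roughly as follows. This is a minimal sketch that assumes a plain 4-line-per-record fastq and that the text before the first underscore is unique per read; the function names are hypothetical.

```python
import io

def shorten_accession(header_line):
    """Return '@' plus the part of the accession before the first underscore."""
    return "@" + header_line[1:].split("_")[0]

def rename_fastq(in_handle, out_handle):
    """Copy a 4-line-per-record FASTQ, shortening every header line."""
    for i, line in enumerate(in_handle):
        line = line.rstrip("\n")
        if i % 4 == 0:  # header line of each record
            line = shorten_accession(line)
        out_handle.write(line + "\n")

# One record using the header quoted in this thread; the sequence and
# quality strings are made up for illustration.
example = (
    "@b1c4fdcc-1f64-48d9-9e27-b6d9ed152198_st:Z:2024-07-19T12:03:20.721+00:00 "
    "RG:Z:ee6fa1023bdc36c23924a004f40565b31c16f1c6_dna_r9.4.1_e8_sup@v3.6_"
    "SQK-PCB111-24_barcode01_39652.62084870732\n"
    "ACGT\n+\nIIII\n"
)
renamed = io.StringIO()
rename_fastq(io.StringIO(example), renamed)
print(renamed.getvalue().splitlines()[0])
```

For real data, pass file handles from `open(...)` instead of the in-memory strings, and cluster the renamed file so that clustering and write_fastq see the same accessions.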
I had the same idea: we renamed and subsampled (around 10k reads) with seqkit. Things run smoothly, just like butter!
About the new version, glad to hear about the release! Our data needs about 2-3 weeks to be analyzed by the original version, and I think that timing fits the release date of the new version. At that point, we will be happy to try it on our data!
Okay, we'll let you know when it is released.
2-3 weeks for getting results sounds quite bad. I dare to bet that you'll be able to see more than 10x speedup with isONclust3.
CC @aljpetri
Hi, I have now set the code repository for isONclust3 to public. The code can be found at: https://github.com/aljpetri/isONclust3 Please let us know how testing the tool works out for you.
Glad to hear about the release! Could you please update the usage instructions in the new version's repository? We will try it on our data and give feedback as soon as we can! Also, I'll close this issue since it has been solved!
Hello, Firstly, thank you for the package! We are currently trying the tool on our transcriptomic data set. The tool works perfectly fine in clustering mode, but we encountered an error when extracting the clustered fastq (write_fastq).
In detail: the data was clustered using:
isONclust --t 20 --ont --fastq Documents/Gac/RNAseq/3.filtering/Aril/A1.fq --outfolder Documents/Gac/RNAseq/4.mapping/isONclust/Aril/cluster/A1
isONclust write_fastq --N 1 --clusters Documents/Gac/RNAseq/4.mapping/isONclust/Aril/cluster/A1/final_clusters.tsv --fastq Documents/Gac/RNAseq/3.filtering/Aril/A1.fq --outfolder Documents/Gac/RNAseq/4.mapping/isONclust/Aril/cluster/A1/fastq_files
And then the error was returned shortly after the 2nd command was run:
Traceback (most recent call last):
  File "/home/cmmr/anaconda3/bin/isONclust", line 217, in <module>
    write_fastq(args)
  File "/home/cmmr/anaconda3/bin/isONclust", line 164, in write_fastq
    seq, qual = reads[acc]