Open chequochuu opened 5 years ago
Hi,
I am not able to reproduce this bug.
By "latest kmc code", do you mean that you compiled commit 85ad76956d890aa24fc8525eee5653078ed86ace?
Could you rerun it with the -v switch and send me your output?
Could you try to rerun it with -t1 and check if it still does not work?
Yes, I used that commit. I still get the error.
Info: Small k optimization on!
******* configuration for small k mode: *******
No. of input files : 1
Output file name : res
Input format : FASTQ
k-mer length : 3
Max. k-mer length : 256
Min. count threshold : 1
Max. count threshold : 1000000000
Max. counter value : 255
Both strands : true
Input buffer size : 33554432
No. of readers : 1
No. of splitters : 1
Max. mem. size : 5000MB
Max. mem. for PMM (FASTQ) : 3294MB
Part. mem. for PMM (FASTQ) : 33MB
Max. mem. for PMM (reads) : 1MB
Part. mem. for PMM (reads) : 0MB
Max. mem. for PMM (b. reader): 402MB
Part. mem. for PMM (b. reader): 134MB
Stage 1: 100%
1st stage: 0.000247s
2nd stage: 6.3e-05s
Total : 0.00031s
Tmp size : 0MB
Stats:
No. of k-mers below min. threshold : 0
No. of k-mers above max. threshold : 0
No. of unique k-mers : 0
No. of unique counted k-mers : 0
Total no. of k-mers : 0
Total no. of reads : 1
Total no. of super-k-mers : 0
It seems that it doesn't work when a barcode is included in the read name. When I remove the barcode:
@0|Chromosome|4051100|4051286/2
AAACCCAACCAC
+
FFFFFFFFFFFF
It works like a charm!
Hmmm, it is still weird that it worked on my machine. Maybe I prepared an input file different from yours. Could you send me your file r1_test.fq?
This is my entire r1_test.fq:
@0|Chromosome|4051100|4051286/2 BX:Z:CGACACGGTTTGGGCC
AAACCCAACCAC
+
FFFFFFFFFFFF
Hi, I meant that you should send me the file itself, not its content, because GitHub may remove something when you copy and paste. It seems unlikely, but currently I cannot imagine another reason why it works on my machine.
You may also copy what you have pasted here to a new file and check if KMC still produces wrong results on your machine.
I have found out that the character between the ID and the barcode is \t instead of a space. Sorry, my bad.
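For anyone who wants to check whether their own file is affected, here is a minimal Python sketch that reports header lines containing a tab. It assumes plain 4-line FASTQ records (no wrapped sequences), and the helper name `find_tabbed_headers` is made up for illustration; it is not part of KMC.

```python
from typing import Iterable, List


def find_tabbed_headers(lines: Iterable[str]) -> List[int]:
    """Return 1-based line numbers of FASTQ header lines that contain a tab.

    Assumption: plain 4-line records (header, sequence, '+', qualities),
    so headers sit on lines 1, 5, 9, ...
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        is_header = (number - 1) % 4 == 0
        if is_header and "\t" in line:
            hits.append(number)
    return hits


# The record from this thread, with the tab written out explicitly.
record = [
    "@0|Chromosome|4051100|4051286/2\tBX:Z:CGACACGGTTTGGGCC",
    "AAACCCAACCAC",
    "+",
    "FFFFFFFFFFFF",
]
print(find_tabbed_headers(record))
```

To scan a real file, pass an open file handle instead of the list, for example `find_tabbed_headers(open("r1_test.fq"))`.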
Ok, thanks for the info. It seems it is the same bug as #42, so I will keep it open to remember to add '\t' support. Anyway, thanks for reporting that issue and thanks for using KMC.
Bump! I ran into the same issue as of today. Would be cool to have it fixed, especially given that many linked-read pipelines produce tabbed headers by default.
The new versions of Nanopore's Dorado and related tools also produce tabbed headers in their fastq files, so I would also appreciate a fix :)
I also ran into this issue with fastq files generated by Dorado v0.7.0 which have tabs in the headers. For now I used seqkit replace to change tabs into spaces as a workaround but it would be nice if kmc could handle tabbed fastq headers.
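If seqkit is not at hand, the same workaround can be sketched in a few lines of Python. This is only an illustration of the tab-to-space idea, not the seqkit command and not a fix inside KMC. It assumes plain 4-line FASTQ records, and the names `tabs_to_spaces_in_headers` and `rewrite_fastq` are hypothetical.

```python
from typing import Iterable, Iterator


def tabs_to_spaces_in_headers(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ lines with every tab in a header line replaced by a space.

    Assumption: plain 4-line records, so every 4th line starting from the
    first one is a header. All other lines are passed through unchanged.
    """
    for index, line in enumerate(lines):
        if index % 4 == 0:
            yield line.replace("\t", " ")
        else:
            yield line


def rewrite_fastq(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path, converting tabs in headers to spaces."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(tabs_to_spaces_in_headers(src))


# In-memory demonstration using the record from this thread.
record = [
    "@0|Chromosome|4051100|4051286/2\tBX:Z:CGACACGGTTTGGGCC\n",
    "AAACCCAACCAC\n",
    "+\n",
    "FFFFFFFFFFFF\n",
]
fixed = list(tabs_to_spaces_in_headers(record))
print(fixed[0], end="")
```

The converted file can then be given to KMC as usual. Note that tools which parse the tab-separated tags downstream may expect the original headers, so it is probably safest to keep the unmodified file around.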
I am using the latest KMC code, but I can't count k-mers on a FASTQ file. It works on FASTA.