Open mscharmann opened 10 months ago
Hi, thank you for using KMC and for reporting this issue. I guess something is wrong with the handling of long sequences in kmc_tools. I will try to take a look. It would be really helpful if you could share some of the input files that cause this.
Hello, first of all, thank you for giving us KMC and kmc_tools, which I use frequently. I am now trying to retrieve the contigs of a genome assembly that contain kmers from a database, using kmc_tools filter (ver. 3.2.1, 2022-01-04). The input to kmc_tools filter is therefore in FASTA format. The file contains multiple FASTA records (hundreds to thousands), and each sequence is on a single line, not "wrapped" / multi-line. Some sequences are longer than 10 or even 100 megabases, and the entire FASTA file is >1 Gb in size. Neither the input file parameter -fa nor the undocumented -fm behaves as the help message suggests... I always get an
"Error: Wrong input file!"
Edit: this seems to be specific to the very long sequences in both FASTA and FASTQ format; the command succeeds when the sequences therein are only tens of kb long. Faking my genome contigs into FASTQ format does not help.
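Since inputs with sequences of only tens of kb work, one possible stopgap is to cut the long contigs into shorter overlapping pieces before filtering. Below is a minimal, untested-against-kmc_tools sketch of that idea. The function names, the `__start_end` naming scheme, and the default chunk length of 50,000 bases are my own assumptions, not anything provided by KMC. Consecutive pieces overlap by k-1 bases so that every kmer of the original contig still occurs in exactly one piece.

```python
def read_fasta(handle):
    """Yield (name, sequence) pairs from an open FASTA file handle.

    Works for single-line and wrapped records alike.
    """
    name, parts = None, []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(parts)
            # Keep only the record ID (text up to the first whitespace).
            fields = line[1:].split()
            name, parts = (fields[0] if fields else ""), []
        elif line and name is not None:
            parts.append(line)
    if name is not None:
        yield name, "".join(parts)


def split_record(name, seq, k, chunk_len=50_000):
    """Yield (chunk_name, chunk_seq) pieces of at most chunk_len bases.

    Consecutive pieces overlap by k-1 bases, so no kmer spanning a cut
    point is lost. The chunk name encodes the 0-based, half-open
    coordinates on the original contig (hypothetical naming scheme).
    """
    if chunk_len < k:
        raise ValueError("chunk_len must be at least k")
    if not seq:
        return
    step = chunk_len - (k - 1)
    start = 0
    while True:
        end = min(start + chunk_len, len(seq))
        yield f"{name}__{start}_{end}", seq[start:end]
        if end == len(seq):
            break
        start += step


def write_chunks(in_path, out_path, k, chunk_len=50_000):
    """Write a chunked copy of a FASTA file, one sequence per line."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        for name, seq in read_fasta(fin):
            for chunk_name, chunk_seq in split_record(name, seq, k, chunk_len):
                fout.write(f">{chunk_name}\n{chunk_seq}\n")
```

With this approach the filter would return pieces rather than whole contigs, so the contig names would have to be recovered from the piece names afterwards (here, the part before `__`). Any minimum-kmer-count threshold would also apply per piece instead of per contig, so results may not be identical to filtering the unsplit contigs.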
Many thanks and best regards, Mathias