refresh-bio / KMC

Fast and frugal disk based k-mer counter

Small suggestion #217

Open KBT59 opened 12 months ago

KBT59 commented 12 months ago

KMC runs but reports all zeros when run on FASTQ files whose headers contain whitespace, like:

@alph-63_S41_R1_001:2/1 cD:i:1 cE:f:0.000000 cM:i:0

If I first strip the headers so they contain no whitespace (which takes more time than the k-mer counting itself), the problem goes away. Can the program be made to tolerate headers with whitespace?
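A minimal sketch of the header-stripping workaround described above, written in Python. It assumes plain four-line FASTQ records (header, sequence, `+`, quality, with no wrapped sequence lines) and keeps only the part of each header before the first whitespace. `strip_fastq_headers` is a hypothetical helper for illustration, not part of KMC.

```python
from typing import Iterable, Iterator


def strip_fastq_headers(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ lines with each header truncated at its first whitespace.

    Assumption: plain four-line records, so every line whose index is a
    multiple of 4 is treated as a header line.
    """
    for i, line in enumerate(lines):
        if i % 4 == 0:
            # '@alph-63_S41_R1_001:2/1 cD:i:1 ...' -> '@alph-63_S41_R1_001:2/1'
            fields = line.split(maxsplit=1)
            yield (fields[0] if fields else "") + "\n"
        else:
            # Sequence, '+' separator and quality lines pass through unchanged.
            yield line
```

For example, `sys.stdout.writelines(strip_fastq_headers(sys.stdin))` filters a file piped through standard input. This only illustrates the workaround; it does not change how KMC itself parses headers.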

marekkokot commented 12 months ago

Hi, hmm this is surprising. Could you please upload the smallest possible example input file causing this behavior and the full command line you have used? I thought I fixed something like this some time ago... but maybe there is still a problem.

KBT59 commented 12 months ago

Here is where it works (no whitespace in ToSendOut.fastq):

bin/kmc -k32 ToSendOut.fastq 32kmer1 .

**
Stage 1: 100%
Stage 2: 100%
1st stage: 0.958199s
2nd stage: 0.764645s
Total    : 1.72284s
Tmp size : 0MB

Stats:
   No. of k-mers below min. threshold :           12
   No. of k-mers above max. threshold :            0
   No. of unique k-mers               :           18
   No. of unique counted k-mers       :            6
   Total no. of k-mers                :           57
   Total no. of reads                 :            1
   Total no. of super-k-mers          :            2

#####################################################################

Here is where it does not work (ToSendOut2.fastq has whitespace in the header):

bin/kmc -k32 ToSendOut2.fastq 32kmer2 .

**
Stage 1: 100%
Stage 2: 100%
1st stage: 0.814573s
2nd stage: 0.736827s
Total    : 1.5514s
Tmp size : 0MB

Stats:
   No. of k-mers below min. threshold :            0
   No. of k-mers above max. threshold :            0
   No. of unique k-mers               :            0
   No. of unique counted k-mers       :            0
   Total no. of k-mers                :            0
   Total no. of reads                 :            1
   Total no. of super-k-mers          :            0

Thanks, Brad Thomas

From: marekkokot
Sent: Friday, July 7, 2023 3:10 PM
To: refresh-bio/KMC
Cc: Brad Thomas; Author
Subject: [EXTERNAL] Re: [refresh-bio/KMC] Small suggestion (Issue #217)


marekkokot commented 12 months ago

Hi,

I cannot see the attachments. I created the input myself, like this:

@alph-63_S41_R1_001:2/1 cD:i:1 cE:f:0.000000 cM:i:0
GACTACTACATCTATGCTATCATCTGATGCTGAGCTGATGCATGTCATGCTA
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII

but it seems to give ok results:

./kmc -k32 test.fq o .
**
Stage 1: 100%
Stage 2: 100%
1st stage: 0.65277s
2nd stage: 1.55719s
Total    : 2.20996s
Tmp size : 0MB

Stats:
   No. of k-mers below min. threshold :           21
   No. of k-mers above max. threshold :            0
   No. of unique k-mers               :           21
   No. of unique counted k-mers       :            0
   Total no. of k-mers                :           21
   Total no. of reads                 :            1
   Total no. of super-k-mers          :            3
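These numbers can be checked by hand: a read of length L yields L − k + 1 k-mers, so the 52-base read above gives 52 − 32 + 1 = 21 k-mers. Each occurs once, and KMC's default minimum count (`-ci`, default 2) excludes k-mers seen fewer than twice, which is why none are counted. The Python sketch below reproduces the Stats block for a single read, assuming canonical counting (KMC's default unless `-b` is given) and ACGT-only input. `canonical` and `kmer_stats` are hypothetical helpers, not KMC code.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def kmer_stats(seq: str, k: int, min_count: int = 2) -> dict:
    """Summarise canonical k-mer counts for one read, mirroring KMC's Stats fields.

    Assumptions: seq contains only A/C/G/T (no N handling), k <= len(seq),
    and no maximum-count cutoff is applied.
    """
    counts = Counter(canonical(seq[i:i + k]) for i in range(len(seq) - k + 1))
    below = sum(1 for c in counts.values() if c < min_count)
    return {
        "total": sum(counts.values()),   # equals len(seq) - k + 1
        "unique": len(counts),           # distinct canonical k-mers
        "below_min": below,              # seen fewer than min_count times
        "counted": len(counts) - below,  # kept in the output database
    }
```

For the read above, `kmer_stats("GACTACTACATCTATGCTATCATCTGATGCTGAGCTGATGCATGTCATGCTA", 32)` gives 21 total, 21 unique, 21 below the threshold and 0 counted, matching the output shown.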

I have been using K-Mer Counter (KMC) ver. 3.2.2 (2023-03-10). Are you using the same version? Could you please re-upload your example input? I think the attachments were lost because you replied by e-mail; please respond directly on GitHub.

Best,
Marek