refresh-bio / KMC

Fast and frugal disk based k-mer counter

Couldn't run KMC after troubleshooting with issue #176 #187

Closed naahraissa closed 2 years ago

naahraissa commented 2 years ago

Hi guys,

Thanks for developing KMC,

I tried to count k-mers from PacBio CCS raw reads (ccs.fasta) and also from the assembly. Each time I run the command, it fails. I tried to follow the troubleshooting steps from issue #176 and spent the whole day on it, but I couldn't get the software to run, even when following the example. I installed KMC using conda install -c bioconda kmc.

Please guys help,

Thanks,

Raissa

    Usage:
     kmc [options] <input_file_name> <output_file_name> <working_directory>
     kmc [options] <@input_file_names> <output_file_name> <working_directory>
    Parameters:
      input_file_name - single file in specified (-f switch) format (gziped or not)
      @input_file_names - file name with list of input files in specified (-f switch) format (gziped or not)
    Options:
      -v - verbose mode (shows all parameter settings); default: false
      -k - k-mer length (k from 1 to 256; default: 25)
      -m - max amount of RAM in GB (from 1 to 1024); default: 12
      -sm - use strict memory mode (memory limit from -m switch will not be exceeded)
      -hc - count homopolymer compressed k-mers (approximate and experimental)
      -p - signature length (5, 6, 7, 8, 9, 10, 11); default: 9
      -f<a/q/m/bam/kmc> - input in FASTA format (-fa), FASTQ format (-fq), multi FASTA (-fm) or BAM (-fbam) or KMC(-fkmc); default: FASTQ
      -ci<value> - exclude k-mers occurring less than <value> times (default: 2)
      -cs - maximal value of a counter (default: 255)
      -cx<value> - exclude k-mers occurring more of than <value> times (default: 1e9)
      -b - turn off transformation of k-mers into canonical form
      -r - turn on RAM-only mode
      -n - number of bins
      -t - total number of threads (default: no. of CPU cores)
      -sf - number of FASTQ reading threads
      -sp - number of splitting threads
      -sr - number of threads for 2nd stage
      -j - file name with execution summary in JSON format
      -w - without output
      -o<kmc/kff> - output in KMC of KFF format; default: KMC
      -hp - hide percentage progress (default: false)
      -e - only estimate histogram of k-mers occurrences instead of exact k-mer counting
      --opt-out-size - optimize output database size (may increase running time)
    Example:
      kmc -k27 -m24 NA19238.fastq NA.res /data/kmc_tmp_dir/
      kmc -k27 -m24 @files.lst NA.res /data/kmc_tmp_dir/

kmc -k21 -ci1 -t40 -fm @files.lst /data/kmc_tmp_dir/

marekkokot commented 2 years ago

Hi, I will try to check it tomorrow. Could you please share your dataset or the smallest part of it causing the issue?

naahraissa commented 2 years ago

Thanks Marekkokot for your prompt intervention,

It seems the file is too large to attach (8 GB). Even the primary contig file I assembled using hifiasm (1 GB) couldn't be attached in .gz format. Could the large file size be the reason why KMC couldn't count? I really need your help,

Thanks

marekkokot commented 2 years ago

Hi,

the file size should not be an issue. We have used KMC to count k-mers for input datasets of more than 1 TB. Is this dataset available online? If so, maybe you could just post a link here. Have you succeeded in running KMC on some other dataset, for example a hand-prepared one containing only one fasta/fastq sequence? Alternatively, you could cut the first couple of sequences from your input dataset and try to run KMC on that. If the issue persists on the smaller file, you could send me that file. What is your hardware environment (mainly the amount of RAM, but also the CPU and how much free disk space you have)? Also, which OS are you running (Windows, Linux, Mac)?
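The suggestion to cut a few leading sequences out of a large input can be sketched as below. This is only an illustration: `fasta_head` is a hypothetical helper (not part of KMC), the file names in the comments are placeholders, and it assumes an uncompressed FASTA file in which every record starts with a `>` header line.

```shell
# fasta_head FILE N: print the first N records of an uncompressed FASTA file.
# Hypothetical helper, not part of KMC. A record begins at a '>' header line.
fasta_head() {
    # Count header lines; stop as soon as record N+1 begins.
    awk -v n="$2" '/^>/ { seen++ } seen > n { exit } { print }' "$1"
}

# Example use (placeholder file names):
#   fasta_head reads.fasta 2 > small.fasta
```

For FASTQ input with unwrapped four-line records, `head -n 8 reads.fastq` would give the first two reads in the same way.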

naahraissa commented 2 years ago

Hi Mer,

Sorry for getting back to you late. We had a problem with our server over the last couple of days. It was really sad for me because I couldn't work during that time.

I had to split the file to make it easier to upload. I have uploaded a 10 KB file.

Please Mer, your intervention would be of the utmost help,

Thanks,

Raissa


marekkokot commented 2 years ago

Hello,

issues with hardware do happen sometimes, so don't worry. I can't see your 10 KB file. Could you also answer my other questions (operating system, amount of RAM, CPU, etc.)? Also, could you please show me how you are using kmc, I mean the command line?

naahraissa commented 2 years ago

Thanks for understanding. I reattached as a zip file

I used the command with the following options: kmc -k21 -ci1 -t40 -fm @files.txt kmc_tmp. Here @files.txt is a text file I created containing the path to the CCS FASTA file. I tried both the converted FASTA and the raw BAM file, but nothing works.

Thanks Mer

Attachments: segmentzzzajbjs.zip, segmentzzzajbjs.gz

marekkokot commented 2 years ago

Oh I know the reason. You need to specify <input> <output> <tmp_dir>. In your command, you specify only two of them. KMC should print some nicer error message. So for example your command should be: kmc -k21 -ci1 -t40 -fm @files.txt out kmc_tmp

Remember that kmc_tmp dir must exist before running kmc or you may just run: kmc -k21 -ci1 -t40 -fm @files.txt out . to make the current working directory a temp dir.
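A minimal sketch of the corrected invocation, wrapped so that a missing positional argument is caught before KMC is started. `run_kmc` is a hypothetical wrapper name; the options are simply the ones used in this thread, and creating the temporary directory with `mkdir -p` is an added convenience rather than something KMC does itself.

```shell
# run_kmc INPUT OUTPUT TMP_DIR
# Hypothetical wrapper, not part of KMC: checks that all three positional
# arguments are present and that the temporary directory exists.
run_kmc() {
    if [ "$#" -ne 3 ]; then
        echo "usage: run_kmc <input> <output> <tmp_dir>" >&2
        return 2
    fi
    # KMC expects the temporary directory to exist already.
    mkdir -p "$3" || return 1
    # Options copied from the command discussed in this thread.
    kmc -k21 -ci1 -t40 -fm "$1" "$2" "$3"
}

# Example use:
#   run_kmc @files.txt out kmc_tmp
```

Calling `run_kmc @files.txt kmc_tmp` (two arguments, as in the original command) would stop with the usage message instead of printing the full KMC help text.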

Also the file you have prepared is not a valid fasta file, because it does not start with >. Let me know if it helps.
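The remark that a FASTA file has to start with `>` can be checked before running KMC. The sketch below is an illustration only: `first_char` and `looks_like_fasta` are hypothetical helper names, and gzip-compressed input is recognized purely by a `.gz` file extension.

```shell
# first_char FILE: print the first byte of FILE, decompressing *.gz on the fly.
# Hypothetical helper, not part of KMC.
first_char() {
    case "$1" in
        *.gz) gzip -cd "$1" | head -c 1 ;;
        *)    head -c 1 "$1" ;;
    esac
}

# looks_like_fasta FILE: succeed only if FILE starts with a '>' header.
looks_like_fasta() {
    [ "$(first_char "$1")" = ">" ]
}

# Example use (placeholder file name):
#   looks_like_fasta segment.fasta || echo "segment.fasta does not start with '>'"
```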

naahraissa commented 2 years ago

Okay, thanks, I greatly appreciate it, Mare. Thanks again.

I will give you feedback. I got some Illumina reads yesterday; I will try with them and let you know,

Thank you for your suggestions,

Raissa

naahraissa commented 2 years ago

Hi Mare, I really appreciate your prompt help, as always. You are really an instrumental software developer.

I tried the software again with your command. It didn't work with a FASTQ file; it still fails.

I am still not getting the trick.

I even downloaded the release "Small fixes related to counting k-mers from KMC database" (https://github.com/refresh-bio/KMC/releases/download/v3.2.1/KMC3.2.1.linux.tar.gz) as an alternative, to unpack and run.

I tried to unpack it with tar -xvzf. I could see the contents of the folder listed on my command line, but nothing appeared in the output folder. I have never encountered this with other software. I am new to bioinformatics, and it seems I am not getting the trick.

I uploaded trimmed FASTQ files; please help me figure this out. If it doesn't work, I can send you a raw file.
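One possible explanation, offered only as a guess, is that `tar -xvzf` extracts into the current working directory unless a target is given with `-C`. A minimal sketch of unpacking into a chosen folder follows; `unpack_to` is a hypothetical helper name and the paths in the comment are placeholders.

```shell
# unpack_to ARCHIVE DEST: extract a .tar.gz archive into DEST.
# Hypothetical helper. tar's -C option needs DEST to exist, so create it first.
unpack_to() {
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}

# Example use (placeholder paths):
#   unpack_to KMC3.2.1.linux.tar.gz "$HOME/kmc"
#   ls "$HOME/kmc"
```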

Trimmomatic-0.39.tar.gz

naahraissa commented 2 years ago

Sorry, I attached the wrong file. Let me create a tar.gz file and reattach it.

marekkokot commented 2 years ago

Hi,

I don't think I'm following :(. Before, you were saying you are using multi-fasta files; now you are talking about fastq (for fastq you should use the -fq switch instead of -fm; -fq is the default, so it may be skipped). You sent me an archive, Trimmomatic-0.39.tar.gz, which contains some Java code. I'm not sure what you want me to do with it :)

I don't know why you are unpacking or downloading kmc. As I understood before, KMC is printing its usage, so it works; there is (probably) no need to reinstall it.
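The choice between -fq and -fm can be made mechanically from the first character of the input, since FASTQ records start with `@` and FASTA records with `>`. The sketch below assumes uncompressed input; `kmc_format_switch` is a hypothetical helper, and returning `-fm` for any FASTA-looking file is a simplification (KMC also has `-fa` for FASTA input).

```shell
# kmc_format_switch FILE: print a KMC input-format switch guessed from the
# first byte of an uncompressed FILE. Hypothetical helper, not part of KMC.
kmc_format_switch() {
    case "$(head -c 1 "$1")" in
        '>') echo "-fm" ;;   # FASTA-style header; -fa is the other FASTA option
        '@') echo "-fq" ;;   # FASTQ-style header (KMC's default format)
        *)   echo "unrecognized first character in $1" >&2
             return 1 ;;
    esac
}

# Example use (placeholder file names):
#   kmc -k21 -ci1 "$(kmc_format_switch reads.fastq)" reads.fastq out .
```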

Also, each time you are reporting something please give me also a full command line, and it would be great if you could wrap it as described here (https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-code). It is not crucial but helpful when reading.

naahraissa commented 2 years ago

I am sorry, I attached the wrong file.

naahraissa commented 1 year ago

Thanks for understanding. I reattached as a zip file

I used the command with the following options: kmc -k21 -ci1 -t40 -fm @files.txt kmc_tmp. Here @files.txt is a text file I created containing the path to the CCS FASTA file. I tried both the converted FASTA and the raw BAM file, but nothing works.

Thanks Mer


marekkokot commented 1 year ago

Hi, where is the reattached zip? It's been some time, so it would be great if you could do the following again:

  1. Give me your input data
  2. Give me the exact command lines
  3. Give me info about your environment (operating system, CPU, amount of RAM)
  4. Describe what exactly is not working (what the result is and what you expect it to be)
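The environment details requested in the list above could be collected with a few standard commands. This is a sketch that assumes a Linux or other Unix-like shell; `report_environment` is a hypothetical helper name, and `nproc` and `free` are skipped when they are not installed.

```shell
# report_environment: print OS, CPU count, memory and free disk space.
# Hypothetical helper; nproc and free are Linux tools and may be absent elsewhere.
report_environment() {
    echo "OS: $(uname -sr)"
    if command -v nproc >/dev/null 2>&1; then
        echo "CPU cores: $(nproc)"
    fi
    if command -v free >/dev/null 2>&1; then
        echo "Memory (GB):"
        free -g || true
    fi
    echo "Disk space in current directory:"
    df -h .
}

# Example use: paste the output into the issue together with the full command line.
#   report_environment
```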