Open robertwhbaldwin opened 3 years ago
Hi,
The coverage of your input (--c 18) is lower than Kmer2SNP needs; it requires at least 30x coverage data. The log line "INFO:root:heterozygous kmer coverage range 1 14" means that Kmer2SNP uses all k-mers whose coverage ranges from 1 to 14 in your experiment. K-mers at very low coverage, such as 1 or 2, are more likely to be error k-mers.
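To illustrate the point about low-coverage k-mers, here is a minimal sketch of selecting k-mers by an inclusive coverage range. The function name `select_by_coverage` and the toy counts are made up for illustration; this is not Kmer2SNP's actual code.

```python
from typing import Dict


def select_by_coverage(counts: Dict[str, int], low: int, high: int) -> Dict[str, int]:
    """Keep only k-mers whose count lies in the inclusive range [low, high]."""
    return {kmer: n for kmer, n in counts.items() if low <= n <= high}


# Toy k-mer counts, invented for illustration only.
toy_counts = {"ACGT": 1, "CGTA": 2, "GTAC": 7, "TACG": 14, "ACGG": 18}

# A range of 1..14 keeps the count-1 and count-2 k-mers, which are likely errors.
wide = select_by_coverage(toy_counts, 1, 14)

# Raising the lower bound (5 is an arbitrary example value) drops them.
narrow = select_by_coverage(toy_counts, 5, 14)

print(len(wide), len(narrow))
```

With real data, the count-1 and count-2 bins are usually the largest ones, so a lower bound of 1 can greatly inflate the number of k-mers that have to be processed.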
Thanks for the response Anu,
I understand that coverage is low, but I'd still like to see the results. Does the low coverage have something to do with why the run is running out of memory? If I were to get a higher-coverage sample, for example, what resources should I expect to need for the analysis? Thanks - Robert
Could you please try "python3 kmer2snp.py single --k 31 --c1 5 --c2 14 --r 0.025 --fastaq G1020-17-RHF-0975_S383_L004_R1_001.fastq,G1020-17-RHF-0975_S383_L004_R2_001.fastq" ?
When using your command what should I use for --c? Because it says that --c is required. If --c is homozygous coverage shouldn't it be > 14 ?
python3 kmer2snp.py single --k 31 --c1 5 --c2 14 --r 0.025 --fastaq G1020-17-RHF-0975_S383_L004_R1_001.fastq,G1020-17-RHF-0975_S383_L004_R2_001.fastq
usage: kmer2snp.py single [-h] --k K --c C --fastaq FASTAQ [--output_dir OUTPUT_DIR] [--b B] [--t1 T1] [--c1 C1] [--c2 C2] [--r R]
kmer2snp.py single: error: the following arguments are required: --c
You can still use --c 18.
I ran it with several different values of --c (7 ..20) and every time the process used up all my memory and got killed. Is this normal? Suppose I do have some 30 X samples, will I need more memory to run them or is this problem related to the low coverage of the samples being used here? Thanks - Robert
I should also point out that I had to replace all the instances of time.clock() with time.time() because time.clock() is not supported in my Python version.
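For context, time.clock() was deprecated in Python 3.3 and removed in Python 3.8. The sketch below shows one way to keep timing code working on both old and new interpreters; the name `timer` and the placeholder workload are illustrative and not taken from kmer2snp.py.

```python
import time

# Use time.clock() where it still exists (Python < 3.8); otherwise fall back
# to time.perf_counter(), which is available from Python 3.3 onwards.
timer = getattr(time, "clock", time.perf_counter)

start = timer()
total = sum(range(100_000))  # placeholder workload
elapsed = timer() - start

print(f"elapsed: {elapsed:.6f} s")
```

time.time() also works when the value is only used for coarse "cost N seconds" log messages. time.perf_counter() is the usual choice for elapsed time, and time.process_time() for CPU time.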
I think you need more memory to run them. I tested on a 12G (~300x) input with genome size ~40M and heterozygous rate ~0.001, and ~32G of memory was used. Your dataset has a higher heterozygous rate, a larger genome size, and lower coverage, so you will need more memory even for a 30x sample.
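The "~300x" figure above follows from dividing the total sequenced bases by the genome size. A small sketch of that arithmetic, assuming "12G" means about 12 gigabases and "40M" means a 40 Mb genome; the function names are illustrative only:

```python
def sequencing_depth(total_bases: float, genome_size: float) -> float:
    """Average depth of coverage: total sequenced bases divided by genome size."""
    return total_bases / genome_size


def bases_needed(genome_size: float, target_depth: float) -> float:
    """Total bases required to reach a target average depth."""
    return genome_size * target_depth


# Assumed reading of the test case above: ~12 Gb of reads, ~40 Mb genome.
depth = sequencing_depth(12e9, 40e6)
print(depth)  # 300.0

# Bases needed for the 30x minimum on a genome of the same size.
print(bases_needed(40e6, 30))  # 1200000000.0
```

The same two functions can be applied to any dataset once its genome size is known or estimated.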
Hi
I ran this command:
(base) robert@robert-ThinkStation-P340:~/tools/Kmer2SNP$ python3 kmer2snp.py single --k 31 --c 18 --fastaq G1020-17-RHF-0975_S383_L004_R1_001.fastq,G1020-17-RHF-0975_S383_L004_R2_001.fastq
and it was working fine, but the memory just kept creeping up until it reached 62 G and the process got killed. Is this RAM usage normal? Each fastq file is ~15 G. Do I need a different machine or is there some kind of problem? Thanks.
Here's the output:
Namespace(k='31', c='18', fastaq='G1020-17-RHF-0975_S383_L004_R1_001.fastq,G1020-17-RHF-0975_S383_L004_R2_001.fastq', output_dir=None, b=None, t1=None, c1=None, c2=None, r=None, which='single')
/home/robert/tools/Kmer2SNP
sh /home/robert/tools/Kmer2SNP/script/run_dsk.sh 31 18 G1020-17-RHF-0975_S383_L004_R1_001.fastq,G1020-17-RHF-0975_S383_L004_R2_001.fastq
[DSK: nb solid kmers: 763163783 ] 102 % elapsed: 2 min 35 sec remaining: 0 min 0 sec cpu: 765.9 % mem: [1297, 5458, 5458] MB
config
kmer_size : 31 mini_size : 10 solidity_kind : sum abundance_min : 2 abundance_max : 2147483647 available_space : 340822 estimated_sequence_number : 84779362 estimated_sequence_volume : 11545 estimated_kmers_number : 9495288544 estimated_kmers_volume : 72443 max_disk_space : 338822 max_memory : 5000 nb_passes : 1 nb_partitions : 90 nb_bits_per_kmer : 64 nb_cores : 10 minimizer_type : lexicographic (kmc2 heuristic) repartition_type : unordered nb_cores_per_partition : 1 nb_partitions_in_parallel : 10 nb_cached_items_per_core_per_part : 131072 nb_banks : 2 dsk
bank
bank_uri : G1020-17-RHF-0975_S383_L004_R1_001.fastq,G1020-17-RHF-0975_S383_L004_R2_001.fastq bank_size : 30669060634 bank_total_nt : 12267090154 sequences
seq_number : 85840826 seq_size_min : 35 seq_size_max : 151 seq_size_mean : 142.9 seq_size_deviation : 21.6 kmers
kmers_nb_valid : 9689452950 kmers_nb_invalid : 2412424 stats
temp_files
nb_superkmers : 892050891 avg_superk_length : 10.86 minimizer_density : 2.12 totalsize(MB) : 9771 tmp_filebiggest(MB) : 262 tmp_filesmallest(MB) : 97 tmp_filemean(MB) : 108.6 histogram
cutoff : 3 nb_ge_cutoff : 723211762 ratio_weak_volume : 0.05 first_peak : 8 kmers
solidity_kind : sum thresholds : 2 2 kmers_nb_distinct : 1113659627 kmers_nb_solid : 763163783 kmers_nb_weak : 350495844 kmers_percent_weak : 31.5 partitions
nb_partitions : 90 nb_items : 763163783 part_biggest : 15410056 part_smallest : 5656820 part_mean : 8479597.6 kind
vector : 90 fillsolid_time : 46.655 1.read : 11.294 2.sort : 21.455 3.dump : 13.906 time : 154.558 fill_partitions : 95.007 fill_solid_kmers : 59.551
[parsing ] 100 % elapsed: 1 min 38 sec remaining: 0 min 0 sec
stats
kmer_size : 31 nb_kmers : 763163783
sh /home/robert/tools/Kmer2SNP/script/run_findgse.sh 31 18 G1020-17-RHF-0975_S383_L004_R1_001.fastq,G1020-17-RHF-0975_S383_L004_R2_001.fastq
Warning message: In dir.create(file.path(path), showWarnings = T) : '.' already exists
Estimated heteroyzgous rate is 0.02532581
heterozygous kmer coverage range 1 14
INFO:root:heterozygous kmer coverage range 1 14
INFO:root:picked heterozygous Kmer number: 608795343
INFO:root:finish reading (kmer cov) file, cost 2773.74 seconds
Killed
And here's a screenshot of the folder it wrote to:
THANKS !!
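As a rough sanity check on the memory usage, the log reports 608795343 picked heterozygous k-mers just before the process is killed. The sketch below estimates what it would cost to hold that many 31-mers in a plain Python dict mapping string to integer count. This is purely an assumption for illustration: how Kmer2SNP actually stores k-mers is not shown in this thread, and `estimate_dict_bytes` is a hypothetical helper.

```python
import sys


def estimate_dict_bytes(n_kmers: int, k: int = 31, sample: int = 10_000) -> int:
    """Extrapolate the size of a dict {k-mer string: int count} from a small sample.

    Assumption: k-mers are stored as ordinary Python str keys with int values.
    """
    # Distinct placeholder keys of length k (digits stand in for ACGT).
    keys = [format(i, f"0{k}d") for i in range(sample)]
    # Values start at 1000 to avoid CPython's shared small-integer objects.
    table = {key: i + 1000 for i, key in enumerate(keys)}

    object_bytes = sum(sys.getsizeof(key) + sys.getsizeof(val) for key, val in table.items())
    per_entry = (sys.getsizeof(table) + object_bytes) / sample
    return int(per_entry * n_kmers)


picked = 608_795_343  # "picked heterozygous Kmer number" from the log above
estimate = estimate_dict_bytes(picked)
print(f"~{estimate / 1e9:.0f} GB for {picked} k-mers in a plain dict")
```

On a 64-bit CPython a 31-character string plus an integer already takes more than 100 bytes before any dict overhead, so under this assumption several hundred million k-mers would not fit in 62 G. Raising the lower coverage bound or using a machine with more RAM would both reduce the pressure.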