machalita closed this issue 1 year ago.
Hi!
Which version are you using, and could you share the log (stderr)? Also, I guess the sample is either not from gut/fecal material or is a pool of many similar individual samples? If so, I have a local commit that fixes a particular high-peak-RSS problem. Maybe I should push it.
Hi! Thanks for your prompt reply! It's one stool sample per FASTQ. We are building a reference database from HiFi data on a Revio, which is why we run only one stool sample per cell: that way we get longer contigs for the reference. Your question actually made me wonder whether there is host DNA in the raw data and whether that could be the reason. This is the version: hifiasm_meta 0.3-r073 (hifiasm code base 0.13-r308)
Here is the output of one of the runs:
```
[M::main] Start: Thu Oct 19 17:12:17 2023
[M::hamt_assemble] Skipped read selection.
[prof::yak_count] step 1 total 694.07 s, step2 362.13 s, step3 1011.65 s.
[M::ha_analyze_count] lowest: count[8] = 1424304
[M::ha_analyze_count] highest: count[30] = 97584106
[M::ha_hist_line] 2: ** 10127388
[M::ha_hist_line] 3: * 3363429
[M::ha_hist_line] 4: 2296285
[M::ha_hist_line] 5: 1850155
[M::ha_hist_line] 6: 1604104
[M::ha_hist_line] 7: 1458033
[M::ha_hist_line] 8: 1424304
[M::ha_hist_line] 9: 1525195
[M::ha_hist_line] 10: 1821511
[M::ha_hist_line] 11: 2355628
[M::ha_hist_line] 12: * 3033452
[M::ha_hist_line] 13: 4174765
[M::ha_hist_line] 14: ** 5678811
[M::ha_hist_line] 15: **** 7757332
[M::ha_hist_line] 16: * 10410188
[M::ha_hist_line] 17: ** 13935321
[M::ha_hist_line] 18: ***** 18413846
[M::ha_hist_line] 19: **** 23886233
[M::ha_hist_line] 20: * 30519807
[M::ha_hist_line] 21: ***** 37989331
[M::ha_hist_line] 22: **** 46354786
[M::ha_hist_line] 23: * 55401470
[M::ha_hist_line] 24: ** 64340262
[M::ha_hist_line] 25: *** 73237353
[M::ha_hist_line] 26: * 81300214
[M::ha_hist_line] 27: ** 88282840
[M::ha_hist_line] 28: **** 93602952
[M::ha_hist_line] 29: ***** 96804178
[M::ha_hist_line] 30: **** 97584106
[M::ha_hist_line] 31: ** 96094424
[M::ha_hist_line] 32: ***** 92682652
[M::ha_hist_line] 33: ** 87529679
[M::ha_hist_line] 34: * 80911527
[M::ha_hist_line] 35: ***** 73298126
[M::ha_hist_line] 36: * 65212438
[M::ha_hist_line] 37: ** 56773590
[M::ha_hist_line] 38: ** 48313475
[M::ha_hist_line] 39: *** 40291220
[M::ha_hist_line] 40: ** 33059098
[M::ha_hist_line] 41: * 26646298
[M::ha_hist_line] 42: **** 20966994
[M::ha_hist_line] 43: 16214006
[M::ha_hist_line] 44: *** 12480532
[M::ha_hist_line] 45: ** 9364133
[M::ha_hist_line] 46: * 7015649
[M::ha_hist_line] 47: * 5198922
[M::ha_hist_line] 48: ** 3869733
[M::ha_hist_line] 49: * 2835076
[M::ha_hist_line] 50: 2050567
[M::ha_hist_line] 51: 1553342
[M::ha_hist_line] 52: 1192912
[M::ha_hist_line] 53: 949363
[M::ha_hist_line] 54: 802113
[M::ha_hist_line] 55: 726049
[M::ha_hist_line] 56: 687219
[M::ha_hist_line] 57: 653101
[M::ha_hist_line] 58: 646124
[M::ha_hist_line] 59: 662057
[M::ha_hist_line] 60: 665511
[M::ha_hist_line] 61: 666643
[M::ha_hist_line] 62: 669793
[M::ha_hist_line] 63: 675624
[M::ha_hist_line] 64: 665216
[M::ha_hist_line] 65: 646753
[M::ha_hist_line] 66: 628077
[M::ha_hist_line] 67: 604381
[M::ha_hist_line] 68: 584115
[M::ha_hist_line] 69: 554289
[M::ha_hist_line] 70: 528338
[M::ha_hist_line] 71: 494115
[M::ha_hist_line] rest: ** 17581454
[M::ha_analyze_count] left: none
[M::ha_analyze_count] right: none
[M::hamt_ft_gen] peak_hom: 30; peak_het: -1
[M::hamt_ft_gen::1029.413*24.57@62.390GB] ==> filtered out 974339 k-mers occurring 750 or more times
[M::hamt_assemble] Generated flt tab.
[M::hamt_assemble] entered read correction round 1
[M::ha_pt_gen] counting - minimzers
[prof::yak_count] step 1 total 791.70 s, step2 242.73 s, step3 512.32 s.
[M::ha_pt_gen::1831.994*17.50] ==> counted 90734509 distinct minimizer k-mers
[M::ha_pt_gen] count[16383] = 1361 (for sanity check)
[M::ha_analyze_count] lowest: count[8] = 67172
[M::ha_analyze_count] highest: count[30] = 3787417
[M::ha_hist_line] 1: ****> 24456356
[M::ha_hist_line] 2: **** 608786
[M::ha_hist_line] 3: ** 178787
[M::ha_hist_line] 4: 116426
[M::ha_hist_line] 5: 92328
[M::ha_hist_line] 6: 79180
[M::ha_hist_line] 7: 70836
[M::ha_hist_line] 8: 67172
[M::ha_hist_line] 9: 69313
[M::ha_hist_line] 10: 80361
[M::ha_hist_line] 11: 100290
[M::ha_hist_line] 12: 126122
[M::ha_hist_line] 13: 168677
[M::ha_hist_line] 14: ** 227439
[M::ha_hist_line] 15: **** 307031
[M::ha_hist_line] 16: * 410916
[M::ha_hist_line] 17: ** 546977
[M::ha_hist_line] 18: ***** 721766
[M::ha_hist_line] 19: * 933200
[M::ha_hist_line] 20: *** 1187932
[M::ha_hist_line] 21: * 1474745
[M::ha_hist_line] 22: ***** 1798821
[M::ha_hist_line] 23: * 2151068
[M::ha_hist_line] 24: ** 2495899
[M::ha_hist_line] 25: *** 2836556
[M::ha_hist_line] 26: * 3149774
[M::ha_hist_line] 27: ** 3423664
[M::ha_hist_line] 28: **** 3630426
[M::ha_hist_line] 29: ***** 3754429
[M::ha_hist_line] 30: **** 3787417
[M::ha_hist_line] 31: **** 3733400
[M::ha_hist_line] 32: 3605152
[M::ha_hist_line] 33: ** 3403755
[M::ha_hist_line] 34: ***** 3145668
[M::ha_hist_line] 35: * 2850707
[M::ha_hist_line] 36: ***** 2542444
[M::ha_hist_line] 37: ** 2209581
[M::ha_hist_line] 38: ** 1884197
[M::ha_hist_line] 39: ** 1574312
[M::ha_hist_line] 40: ** 1293341
[M::ha_hist_line] 41: * 1039257
[M::ha_hist_line] 42: ** 820301
[M::ha_hist_line] 43: **** 634294
[M::ha_hist_line] 44: 488605
[M::ha_hist_line] 45: ** 367413
[M::ha_hist_line] 46: ** 275713
[M::ha_hist_line] 47: 203191
[M::ha_hist_line] 48: * 151040
[M::ha_hist_line] 49: 111831
[M::ha_hist_line] 50: 80421
[M::ha_hist_line] 51: 60981
[M::ha_hist_line] 52: 46922
[M::ha_hist_line] 53: 37343
[M::ha_hist_line] 54: 31106
[M::ha_hist_line] 55: 27904
[M::ha_hist_line] 56: 26260
[M::ha_hist_line] 57: 25190
[M::ha_hist_line] 58: 24665
[M::ha_hist_line] 59: 24742
[M::ha_hist_line] 60: 25325
[M::ha_hist_line] 61: 25320
[M::ha_hist_line] 62: 25679
[M::ha_hist_line] 63: 25817
[M::ha_hist_line] 64: 25455
[M::ha_hist_line] 65: 24870
[M::ha_hist_line] 66: 24162
[M::ha_hist_line] 67: 23185
[M::ha_hist_line] 68: 22432
[M::ha_hist_line] 69: 21430
[M::ha_hist_line] 70: 20177
[M::ha_hist_line] 71: 19286
[M::ha_hist_line] rest: ** 678941
[M::ha_analyze_count] left: none
[M::ha_analyze_count] right: none
[M::ha_pt_gen] peak_hom: 30; peak_het: -1
[M::ha_pt_gen] counting - minimzer positions
[prof::yak_count] step 1 total 48.03 s, step2 356.28 s, step3 650.25 s.
[debug::ha_pt_gen] tot_cnt is 2139458679, pt->tot_pos is 2139458679
[M::ha_pt_gen::2488.172*15.72] ==> indexed 2139458679 positions
Killed
```
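Some bracketed tags in this log carry timing information, for example `[M::hamt_ft_gen::1029.413*24.57@62.390GB]` and `[M::ha_pt_gen::2488.172*15.72]`. They appear to follow the minimap2/hifiasm convention of `stage::elapsed_seconds*cpu_ratio`, sometimes followed by `@` and a memory figure that we assume is peak RSS. The minimal sketch below extracts those stamps from a saved stderr log so you can see how far a run got and how memory grew before it was killed. The meaning of each field is our assumption and is not documented hifiasm_meta behavior.

```python
import re
from typing import List, NamedTuple, Optional

# Matches stamps like "[M::ha_pt_gen::2488.172*15.72]" and
# "[M::hamt_ft_gen::1029.413*24.57@62.390GB]". Plain tags such as
# "[M::ha_hist_line]" have no "::<number>" part and are ignored.
STAMP = re.compile(
    r"\[M::(?P<stage>\w+)::(?P<wall>\d+(?:\.\d+)?)\*(?P<cpu>\d+(?:\.\d+)?)"
    r"(?:@(?P<mem>\d+(?:\.\d+)?)GB)?\]"
)


class Stamp(NamedTuple):
    stage: str               # function that printed the stamp
    wall_s: float            # assumed: wall-clock seconds since program start
    cpu_ratio: float         # assumed: CPU time divided by wall time
    mem_gb: Optional[float]  # assumed: peak RSS in GB, only when printed


def parse_stamps(log_text: str) -> List[Stamp]:
    """Return every timing/memory stamp found in a hifiasm-style stderr log."""
    stamps = []
    for m in STAMP.finditer(log_text):
        mem = m.group("mem")
        stamps.append(
            Stamp(
                stage=m.group("stage"),
                wall_s=float(m.group("wall")),
                cpu_ratio=float(m.group("cpu")),
                mem_gb=float(mem) if mem is not None else None,
            )
        )
    return stamps
```

For this log, the last stamp is `ha_pt_gen` at roughly 2488 s, right after minimizer positions were indexed. Because that stamp has no `@...GB` field, the log does not record how much memory was in use at the moment the run was killed.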
Thank you for the log. The k-mer histogram is a bit strange: I think a single bell-shaped peak at about 30x is unusual for a metagenome sample, where species are usually present at widely different abundances. Could you check the input file to make sure it is not a eukaryotic library by accident?
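The histogram check described here can be approximated with a short script. The sketch below parses the `[M::ha_hist_line]` entries of one histogram block. Each entry holds a multiplicity, a text bar, and the number of distinct k-mers, and the `rest` bucket is skipped. The script finds the first valley after the error slope and the highest bin above that valley, then reports what fraction of the remaining k-mers lie close to the peak. This heuristic is our own and is not hifiasm's peak detection. The ±50% window is an arbitrary choice made for illustration.

```python
import re
from typing import Dict, Tuple

# "[M::ha_hist_line] 15: **** 7757332" -> (15, 7757332); "rest:" lines do not match.
HIST_LINE = re.compile(r"\[M::ha_hist_line\]\s+(\d+):\s*[*>]*\s*(\d+)")


def parse_hist(block: str) -> Dict[int, int]:
    """Map k-mer multiplicity to the number of distinct k-mers with that multiplicity."""
    return {int(mult): int(count) for mult, count in HIST_LINE.findall(block)}


def peak_summary(hist: Dict[int, int], window: float = 0.5) -> Tuple[int, int, float]:
    """Return (valley, peak, share_near_peak) for a k-mer histogram.

    valley: first local minimum, i.e. the bin just before counts start rising
    peak:   bin with the most distinct k-mers at or above the valley
    share:  fraction of those k-mers within peak * (1 +/- window)
    """
    xs = sorted(hist)
    valley = xs[0]
    for a, b in zip(xs, xs[1:]):
        if hist[b] > hist[a]:
            valley = a
            break
    solid = {x: c for x, c in hist.items() if x >= valley}
    peak = max(solid, key=solid.get)
    lo, hi = peak * (1 - window), peak * (1 + window)
    near = sum(c for x, c in solid.items() if lo <= x <= hi)
    return valley, peak, near / sum(solid.values())
```

The log above contains two histograms, so pass them in one block at a time. On either block, this heuristic selects valley 8 and peak 30, the same bins that `ha_analyze_count` reports. A share near 1 means almost all solid k-mers sit at a single coverage level. That pattern fits one dominant genome better than a mix of species at different abundances.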
I also pushed a commit, f98f1ad, to the meta_dev branch; you could try it and see if it helps. Please post the log if it still gets killed or crashes, thanks.
Thank you so much for your help! The samples I was trying to assemble did contain a high percentage of host reads, and after filtering them out I was able to assemble them successfully. Apologies for my rookie mistake; this had never happened to me with stool samples, but then I found out these were "bloody" stools, so they did contain plenty of host DNA =p Thank you!
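The thread does not describe how the host reads were removed. A common approach is to map the reads against the host reference genome, collect the names of the reads that map, and drop those reads from the FASTQ before assembly. The minimal sketch below handles only that last step. It assumes the host read names are already stored one per line in a plain-text file (the file name `host_ids.txt` is hypothetical) and that the FASTQ uses the standard four lines per record.

```python
import gzip
from typing import IO, Iterator, Set, Tuple

Record = Tuple[str, str, str, str]


def open_text(path: str, mode: str = "rt") -> IO[str]:
    """Open a plain or gzip-compressed text file transparently."""
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def read_fastq(handle: IO[str]) -> Iterator[Record]:
    """Yield (header, sequence, plus, quality) lines from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq, plus, qual = handle.readline(), handle.readline(), handle.readline()
        if not header.startswith("@") or not qual:
            raise ValueError(f"malformed or truncated FASTQ record: {header!r}")
        yield header, seq, plus, qual


def read_name(header: str) -> str:
    """'@name optional comment' -> 'name'."""
    return header[1:].split(None, 1)[0]


def drop_host_reads(src: IO[str], dst: IO[str], host_names: Set[str]) -> Tuple[int, int]:
    """Copy records whose name is not in host_names; return (kept, dropped)."""
    kept = dropped = 0
    for record in read_fastq(src):
        if read_name(record[0]) in host_names:
            dropped += 1
        else:
            dst.writelines(record)
            kept += 1
    return kept, dropped


# Example (hypothetical file names):
# with open("host_ids.txt") as fh:
#     host = {line.strip() for line in fh if line.strip()}
# with open_text("sample.fastq.gz") as src, open_text("sample.nohost.fastq.gz", "wt") as dst:
#     print(drop_host_reads(src, dst, host))
```

Streaming the records one at a time keeps memory use proportional to the size of the host-name set rather than to the 30-40 GB input.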
Glad it worked out :D, and thank you for testing. Closing; please feel free to reopen or open a new issue if you run into any problems.
Greetings! This is not a bug but rather an optimization issue. We are using Revio data with one sample per cell, which leaves us with a 30-40 GB raw file for metagenome assembly. We are running on a server with 750 GB of RAM, and it does not seem to be enough. Is there a parameter we can tweak to reduce memory usage? Is there an approximate formula we can use to estimate how much memory is needed?
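This thread does not give a memory formula for hifiasm_meta, and the sketch below does not try to supply one. It shows a way to collect your own numbers instead. The script runs the assembler as a child process and reads the child's peak resident set size from the operating system, so you can record peak memory alongside input size for each sample. It uses Python's Unix-only `resource` module. The command in the example comment is a placeholder, so substitute your real invocation and options.

```python
import resource
import subprocess
import sys
from typing import List, Tuple


def run_and_measure(cmd: List[str]) -> Tuple[int, float]:
    """Run cmd to completion; return (return_code, peak_rss_gib).

    RUSAGE_CHILDREN reports the largest peak RSS among all children this
    process has waited for, so call it once per fresh Python process if
    you want a per-run figure.
    """
    proc = subprocess.run(cmd)
    maxrss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is in kilobytes on Linux but in bytes on macOS.
    kib = maxrss / 1024 if sys.platform == "darwin" else maxrss
    return proc.returncode, kib / 1024 ** 2


# Example (placeholder command and file names):
# code, gib = run_and_measure(["hifiasm_meta", "-o", "asm", "reads.fq.gz"])
# A return code of -9 means the child received SIGKILL, e.g. from the OOM killer.
```

On Linux, the kernel's out-of-memory killer sends SIGKILL, and that is consistent with the bare `Killed` at the end of the log above. GNU `time -v` reports the same quantity as "Maximum resident set size" if you prefer not to use a wrapper script.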
Thank you very much!