voutcn / megahit

Ultra-fast and memory-efficient (meta-)genome assembler
http://www.ncbi.nlm.nih.gov/pubmed/25609793
GNU General Public License v3.0

Output #300

Open 18AS opened 3 years ago

18AS commented 3 years ago

I have MEGAHIT v1.2.9 installed. I tried different settings such as --k-min, --k-max, --k-step, --min-contig-len 1000, and --presets meta-sensitive, but I get the same output every time, and it runs with the same k-mer sizes regardless of the changes I make. Please help me understand how to avoid falling back to the defaults.

edgardomortiz commented 3 years ago

The use of --presets overrides your other parameters.

From the program's help:

  Presets parameters:
    --presets                <str>          override a group of parameters; possible values:
                                            meta-sensitive: '--min-count 1 --k-list 21,29,39,49,...,129,141'
                                            meta-large: '--k-min 27 --k-max 127 --k-step 10'
                                            (large & complex metagenomes, like soil)

Edgardo
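To make the override behavior concrete, here is a minimal Python sketch of how a preset could be layered over user options. It is not MEGAHIT's code: the `apply_preset` helper and the `PRESETS` table are hypothetical. Their values are copied from the help text above, with the meta-sensitive k-list written out in full from the `k list:` line MEGAHIT prints in the run log later in this thread. Following the help's wording ("override a group of parameters"), the sketch assumes only the keys a preset defines are replaced.

```python
from typing import Any, Dict

# Hypothetical preset table; values copied from the --presets help text.
# The meta-sensitive k-list is spelled out using the "k list:" line that
# MEGAHIT printed in the run log posted in this thread.
PRESETS: Dict[str, Dict[str, Any]] = {
    "meta-sensitive": {
        "min_count": 1,
        "k_list": [21, 29, 39, 49, 59, 69, 79, 89, 99, 109, 119, 129, 141],
    },
    "meta-large": {"k_min": 27, "k_max": 127, "k_step": 10},
}


def apply_preset(user_opts: Dict[str, Any], preset: str) -> Dict[str, Any]:
    """Return a copy of user_opts with the preset's parameter group applied.

    Keys defined by the preset replace the user's values; keys the preset
    does not mention (e.g. min_contig_len) are kept. Hypothetical helper.
    """
    merged = dict(user_opts)        # copy so the caller's dict is untouched
    merged.update(PRESETS[preset])  # preset values win on conflicting keys
    return merged


if __name__ == "__main__":
    user = {"k_min": 21, "k_max": 99, "k_step": 20, "min_contig_len": 1000}
    print(apply_preset(user, "meta-large"))
    # {'k_min': 27, 'k_max': 127, 'k_step': 10, 'min_contig_len': 1000}
```

Under this reading, giving --k-min/--k-max/--k-step together with --presets meta-large would leave only the preset's k values in effect.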

18AS commented 3 years ago

I was running MEGAHIT with the --presets meta-sensitive parameter. However, it still used only k = 21, 29, 39, 59, 79, 99, and 119, not the list above. I am not sure why it does not behave as you describe, or as described at http://www.metagenomics.wiki/tools/assembly/megahit.

Data: I am using Illumina paired-end reads (passed with -1 and -2). I also added the unpaired reads left over after quality trimming with Trimmomatic (passed with -r).
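One way to check which setting a run actually used is to compare the k values it reported with the preset's list. This short sketch compares the k values reported above with the meta-sensitive list MEGAHIT printed in the run log later in this thread; `compare_k_lists` is a hypothetical helper. The gaps suggest the preset's k-list was not applied; the observed values appear to match the start of MEGAHIT's default k-list (21,29,39,59,79,99,119,141) instead.

```python
from typing import List, Tuple

# k values reported in the comment above.
OBSERVED = [21, 29, 39, 59, 79, 99, 119]
# k list printed by MEGAHIT for --presets meta-sensitive in this thread's run log.
META_SENSITIVE = [21, 29, 39, 49, 59, 69, 79, 89, 99, 109, 119, 129, 141]


def compare_k_lists(observed: List[int], expected: List[int]) -> Tuple[List[int], List[int]]:
    """Return (expected k values that were not run, run k values that were not expected)."""
    missing = sorted(set(expected) - set(observed))
    unexpected = sorted(set(observed) - set(expected))
    return missing, unexpected


if __name__ == "__main__":
    missing, unexpected = compare_k_lists(OBSERVED, META_SENSITIVE)
    print("missing:", missing)        # missing: [49, 69, 89, 109, 129, 141]
    print("unexpected:", unexpected)  # unexpected: []
```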

edgardomortiz commented 3 years ago

In general, it is more useful to share the actual command you used; then somebody may be able to spot the mistake in it.

18AS commented 3 years ago

I am really sorry for not being specific. This is the command I used:

megahit -1 bs_F_paired.fastq.gz -2 bs_R_paired.fastq.gz –r bs_F_unpaired.fastq.gz , bs_R_unpaired.fastq.gz --presets meta-sensitive -o megahit_result_bs

edgardomortiz commented 3 years ago

So far, I would say the spaces around the comma in the list of unpaired reads could be breaking your command. Try:

megahit -1 bs_F_paired.fastq.gz -2 bs_R_paired.fastq.gz –r bs_F_unpaired.fastq.gz,bs_R_unpaired.fastq.gz --presets meta-sensitive -o megahit_result_bs
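Spaces matter because the shell splits the command into words before MEGAHIT sees it, so `a , b` reaches MEGAHIT as three separate arguments rather than one comma-separated list. Python's `shlex.split` tokenizes roughly the way a POSIX shell does (quotes handled, no variable or glob expansion), which makes the difference easy to see. The sketch copies both commands verbatim; note that they type the reads flag as `–r` with an en dash (U+2013), so it looks for that exact token. `unpaired_words` is a hypothetical helper.

```python
import shlex

# Commands copied from this thread; "\u2013r" is the en-dash "–r" they contain.
ORIGINAL = ("megahit -1 bs_F_paired.fastq.gz -2 bs_R_paired.fastq.gz "
            "\u2013r bs_F_unpaired.fastq.gz , bs_R_unpaired.fastq.gz "
            "--presets meta-sensitive -o megahit_result_bs")
SUGGESTED = ("megahit -1 bs_F_paired.fastq.gz -2 bs_R_paired.fastq.gz "
             "\u2013r bs_F_unpaired.fastq.gz,bs_R_unpaired.fastq.gz "
             "--presets meta-sensitive -o megahit_result_bs")


def unpaired_words(cmd: str) -> list:
    """Return the shell words between the en-dash 'r' token and '--presets'."""
    words = shlex.split(cmd)  # split into words the way a POSIX shell would
    start = words.index("\u2013r") + 1
    end = words.index("--presets")
    return words[start:end]


if __name__ == "__main__":
    print(unpaired_words(ORIGINAL))   # ['bs_F_unpaired.fastq.gz', ',', 'bs_R_unpaired.fastq.gz']
    print(unpaired_words(SUGGESTED))  # ['bs_F_unpaired.fastq.gz,bs_R_unpaired.fastq.gz']
```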

18AS commented 3 years ago

I tried running it after removing the spaces. It still uses the set of k-mers mentioned above. What else should I try?

edgardomortiz commented 3 years ago

Could you paste the terminal output, including your command? It is strange: a short while ago I ran a --presets meta-sensitive analysis and it used a k-step of 10. This was on Linux.

18AS commented 3 years ago

Hi! I have attached an image of the terminal while running this command:

megahit -1 S1564Nr4.1_host_removed.paired.fastq.gz -2 S1564Nr4.2_host_removed.paired.fastq.gz –r S1564Nr4.1_host_removed.unpaired.fastq.gz,S1564Nr4.2_host_removed.unpaired.fastq.gz --presets meta-sensitive -o megahit_output

[Screenshot: Megahit_presets]
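A detail worth checking in this command and the earlier ones: the reads flag is typed `–r`, with an en dash (U+2013) instead of the ASCII hyphen-minus, which can happen when a command is copied from a word processor or web page. The shell passes `–r` through as an ordinary word, so MEGAHIT does not receive an `-r` option at that point; how its option parser then handles the rest of the line is not established in this thread. This hypothetical `find_suspicious_dashes` sketch flags such characters before a command is run; the file names in the example are placeholders.

```python
import unicodedata
from typing import List, Tuple


def find_suspicious_dashes(cmd: str) -> List[Tuple[int, str, str]]:
    """Return (position, character, Unicode name) for dash-like characters
    that are not the ASCII hyphen-minus '-' used by command-line options."""
    hits = []
    for pos, ch in enumerate(cmd):
        if ch == "-":
            continue  # plain ASCII hyphen-minus is what options expect
        # "Pd" covers dash punctuation such as the en dash and em dash;
        # U+2212 MINUS SIGN is a math symbol ("Sm"), so check it explicitly.
        if unicodedata.category(ch) == "Pd" or ch == "\u2212":
            hits.append((pos, ch, unicodedata.name(ch)))
    return hits


if __name__ == "__main__":
    # Placeholder file names; "\u2013r" mirrors the "–r" in the thread's commands.
    cmd = ("megahit -1 R1.fq.gz -2 R2.fq.gz \u2013r U1.fq.gz,U2.fq.gz "
           "--presets meta-sensitive -o megahit_output")
    for pos, ch, name in find_suspicious_dashes(cmd):
        print(f"position {pos}: {name}")  # position 32: EN DASH
```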

edgardomortiz commented 3 years ago

This is another test I did, now on a Mac:

$ megahit -1 hs0402_R1.fq.gz -2 hs0402_R2.fq.gz --presets meta-sensitive -o test
2021-03-13 08:27:24 - MEGAHIT v1.2.9
2021-03-13 08:27:24 - Using megahit_core with POPCNT and BMI2 support
2021-03-13 08:27:24 - Convert reads to binary library
2021-03-13 08:27:25 - b'INFO  sequence/io/sequence_lib.cpp  :   77 - Lib 0 (/Users/emortiz/atoll/hs0402_R1.fq.gz,/Users/emortiz/atoll/hs0402_R2.fq.gz): pe, 669674 reads, 150 max length'
2021-03-13 08:27:25 - b'INFO  utils/utils.h                 :  152 - Real: 1.1151\tuser: 0.5161\tsys: 0.1426\tmaxrss: 57286656'
2021-03-13 08:27:25 - k-max reset to: 141
2021-03-13 08:27:25 - Start assembly. Number of CPU threads 8
2021-03-13 08:27:25 - k list: 21,29,39,49,59,69,79,89,99,109,119,129,141
2021-03-13 08:27:25 - Memory used: 15461882265
2021-03-13 08:27:25 - Extracting solid (k+1)-mers and building sdbg for k = 21
2021-03-13 08:27:32 - Assemble contigs from SdBG for k = 21
2021-03-13 08:27:40 - Local assembly for k = 21
2021-03-13 08:27:44 - Extract iterative edges from k = 21 to 29
2021-03-13 08:27:45 - Build graph for k = 29
2021-03-13 08:27:46 - Assemble contigs from SdBG for k = 29
2021-03-13 08:27:49 - Local assembly for k = 29
2021-03-13 08:27:51 - Extract iterative edges from k = 29 to 39
2021-03-13 08:27:52 - Build graph for k = 39
2021-03-13 08:27:53 - Assemble contigs from SdBG for k = 39
2021-03-13 08:27:54 - Local assembly for k = 39
2021-03-13 08:27:57 - Extract iterative edges from k = 39 to 49
2021-03-13 08:27:58 - Build graph for k = 49
2021-03-13 08:27:58 - Assemble contigs from SdBG for k = 49
2021-03-13 08:27:59 - Local assembly for k = 49
2021-03-13 08:28:02 - Extract iterative edges from k = 49 to 59
2021-03-13 08:28:03 - Build graph for k = 59
2021-03-13 08:28:03 - Assemble contigs from SdBG for k = 59
2021-03-13 08:28:03 - Local assembly for k = 59
2021-03-13 08:28:06 - Extract iterative edges from k = 59 to 69
2021-03-13 08:28:07 - Build graph for k = 69
2021-03-13 08:28:07 - Assemble contigs from SdBG for k = 69
2021-03-13 08:28:07 - Local assembly for k = 69
2021-03-13 08:28:10 - Extract iterative edges from k = 69 to 79
2021-03-13 08:28:10 - Build graph for k = 79
2021-03-13 08:28:11 - Assemble contigs from SdBG for k = 79
2021-03-13 08:28:11 - Local assembly for k = 79
2021-03-13 08:28:13 - Extract iterative edges from k = 79 to 89
2021-03-13 08:28:14 - Build graph for k = 89
2021-03-13 08:28:14 - Assemble contigs from SdBG for k = 89
2021-03-13 08:28:14 - Local assembly for k = 89
2021-03-13 08:28:17 - Extract iterative edges from k = 89 to 99
2021-03-13 08:28:17 - Build graph for k = 99
2021-03-13 08:28:17 - Assemble contigs from SdBG for k = 99
2021-03-13 08:28:18 - Local assembly for k = 99
2021-03-13 08:28:20 - Extract iterative edges from k = 99 to 109
2021-03-13 08:28:20 - Build graph for k = 109
2021-03-13 08:28:21 - Assemble contigs from SdBG for k = 109
2021-03-13 08:28:21 - Local assembly for k = 109
2021-03-13 08:28:23 - Extract iterative edges from k = 109 to 119
2021-03-13 08:28:24 - Build graph for k = 119
2021-03-13 08:28:24 - Assemble contigs from SdBG for k = 119
2021-03-13 08:28:24 - Local assembly for k = 119
2021-03-13 08:28:26 - Extract iterative edges from k = 119 to 129
2021-03-13 08:28:26 - Build graph for k = 129
2021-03-13 08:28:27 - Assemble contigs from SdBG for k = 129
2021-03-13 08:28:27 - Local assembly for k = 129
2021-03-13 08:28:29 - Extract iterative edges from k = 129 to 141
2021-03-13 08:28:29 - Build graph for k = 141
2021-03-13 08:28:29 - Assemble contigs from SdBG for k = 141
2021-03-13 08:28:29 - Merging to output final contigs
2021-03-13 08:28:29 - 66 contigs, total 172556 bp, min 218 bp, max 113445 bp, avg 2614 bp, N50 113445 bp
2021-03-13 08:28:29 - ALL DONE. Time elapsed: 65.510785 seconds

The only difference from your command is that I didn't use unpaired reads. Maybe that is the cause? Otherwise, I don't know what could be going on. If your command works without the unpaired reads, I would report it as a bug in a separate issue so that MEGAHIT's developers can fix it.
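Comparing the `k list:` line of a run's output with the preset is a quick way to confirm which parameters took effect. This sketch uses hypothetical helpers `k_list_from_log` and `summary_from_log`, with regular expressions written against the log lines shown above (other MEGAHIT versions may format them differently). It also shows why N50 equals the maximum contig length in that run: N50 is the length L such that contigs of length at least L together hold at least half of the assembled bases, and the 113,445 bp contig alone covers more than half of the 172,556 bp total.

```python
import re
from typing import Dict, List

# Two lines copied from the MEGAHIT log above.
LOG = (
    "2021-03-13 08:27:25 - k list: 21,29,39,49,59,69,79,89,99,109,119,129,141\n"
    "2021-03-13 08:28:29 - 66 contigs, total 172556 bp, min 218 bp, "
    "max 113445 bp, avg 2614 bp, N50 113445 bp\n"
)

SUMMARY_RE = re.compile(
    r"(?P<contigs>\d+) contigs, total (?P<total>\d+) bp, min (?P<min>\d+) bp, "
    r"max (?P<max>\d+) bp, avg (?P<avg>\d+) bp, N50 (?P<n50>\d+) bp"
)


def k_list_from_log(text: str) -> List[int]:
    """Return the k values from the first 'k list:' line."""
    m = re.search(r"k list: ([\d,]+)", text)
    if m is None:
        raise ValueError("no 'k list:' line found")
    return [int(k) for k in m.group(1).split(",")]


def summary_from_log(text: str) -> Dict[str, int]:
    """Return the final contig statistics as integers."""
    m = SUMMARY_RE.search(text)
    if m is None:
        raise ValueError("no summary line found")
    return {key: int(value) for key, value in m.groupdict().items()}


if __name__ == "__main__":
    print(k_list_from_log(LOG))
    stats = summary_from_log(LOG)
    print(stats["total"] // stats["contigs"])  # 2614, matching the reported avg
    # The largest contig alone holds at least half the bases, so N50 == max.
    print(2 * stats["max"] >= stats["total"], stats["n50"] == stats["max"])  # True True
```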

18AS commented 3 years ago

Thanks! That was indeed the case: removing the unpaired reads helped. As you suggested, I will report it in a separate issue.
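When repeating a run with and without the unpaired reads, building the argument list in code avoids shell word-splitting mistakes such as the spaces around the comma in the earlier -r list, and guarantees an ASCII `-r`. Passing a list to `subprocess.run` (without `shell=True`) hands each element to MEGAHIT as exactly one argument. `build_megahit_argv` and `run_megahit` are hypothetical helpers, the file names are placeholders, and only flags already used in this thread appear. Each run gets its own output directory because MEGAHIT may refuse to write into an existing one.

```python
import shlex
import subprocess
from typing import List, Optional, Sequence


def build_megahit_argv(r1: str, r2: str, out_dir: str,
                       unpaired: Optional[Sequence[str]] = None,
                       preset: str = "meta-sensitive") -> List[str]:
    """Assemble a MEGAHIT argument list (hypothetical helper)."""
    argv = ["megahit", "-1", r1, "-2", r2]
    if unpaired:
        # One argument: files joined by commas with no spaces, ASCII "-r".
        argv += ["-r", ",".join(unpaired)]
    argv += ["--presets", preset, "-o", out_dir]
    return argv


def run_megahit(argv: List[str]) -> None:
    """Run MEGAHIT; raises CalledProcessError if it exits non-zero."""
    subprocess.run(argv, check=True)  # list form: no shell word-splitting


if __name__ == "__main__":
    # Placeholder file names.
    paired = ("sample_R1.paired.fastq.gz", "sample_R2.paired.fastq.gz")
    unpaired = ["sample_R1.unpaired.fastq.gz", "sample_R2.unpaired.fastq.gz"]
    runs = {
        "with_unpaired": build_megahit_argv(*paired, "megahit_with_unpaired", unpaired),
        "paired_only": build_megahit_argv(*paired, "megahit_paired_only"),
    }
    for name, argv in runs.items():
        print(name, shlex.join(argv))  # shell-quoted preview of each command
        # run_megahit(argv)  # uncomment to actually run MEGAHIT
```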

18AS commented 2 years ago

Can someone please tell me where I can find exactly what the fields in each sequence header mean, for example multi, len, flag, etc.? I have attached an image.

[Image: Capture]
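While the meaning of each field is best confirmed from MEGAHIT's own documentation or source, the headers can at least be split into fields programmatically. This sketch assumes headers of the form `>k141_0 flag=1 multi=3.0000 len=300` (an illustrative example, not taken from the attached image). `parse_megahit_header` is a hypothetical helper that returns the values as strings without interpreting them. As far as is known, `len` is the contig length in bp and `multi` an average k-mer multiplicity; `flag` is left uninterpreted here.

```python
from typing import Dict, Tuple


def parse_megahit_header(header: str) -> Tuple[str, Dict[str, str]]:
    """Split a FASTA header such as '>k141_0 flag=1 multi=3.0000 len=300'.

    Returns the contig ID and a dict of key=value fields (values kept as
    strings). The header layout is an assumed example.
    """
    words = header.strip().lstrip(">").split()
    contig_id = words[0]
    fields: Dict[str, str] = {}
    for word in words[1:]:
        key, sep, value = word.partition("=")
        if sep:  # skip words that are not key=value pairs
            fields[key] = value
    return contig_id, fields


if __name__ == "__main__":
    cid, fields = parse_megahit_header(">k141_0 flag=1 multi=3.0000 len=300")
    print(cid, fields)  # k141_0 {'flag': '1', 'multi': '3.0000', 'len': '300'}
```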