jtamames / SqueezeMeta

A complete pipeline for metagenomic analysis
GNU General Public License v3.0
349 stars 81 forks source link

Error on step 14 #26

Closed SamueWildhaber closed 5 years ago

SamueWildhaber commented 5 years ago

Hello, At step 14 following error message occured: Error running command: perl /media/smrtanalysis/Linux_Data/SqueezeMeta/scripts/../bin/MaxBin/run_MaxBin.pl -thread 11 -contig /media/smrtanalysis/Linux_Data/SqueezeMeta/databases/test/ampli_seq01/temp/bincontigs.fasta -abund_list /media/smrtanalysis/Linux_Data/SqueezeMeta/databases/test/ampli_seq01/results/maxbin/abund.list -out /media/smrtanalysis/Linux_Data/SqueezeMeta/databases/test/ampli_seq01/results/maxbin/maxbin -markerpath /media/smrtanalysis/Linux_Data/SqueezeMeta/databases/db/marker.hmm at /media/smrtanalysis/Linux_Data/SqueezeMeta/scripts/../scripts/14.bin_maxbin.pl line 103. I looked in the code and it stands: my $command="perl $maxbin_soft -thread $numthreads -contig $tempfasta -abund_list $abundlist -out $dirbin/maxbin -markerpath $databasepath/marker.hmm"; print "Now running Maxbin: $command\n"; my $ecode = system $command; if($ecode!=0) { die "Error running command: $command"; } Can you help me out with this error?

fpusan commented 5 years ago

Hi! What is your OS? Did SqueezeMeta work with the test data? What error message do you get if you run directly the following command?

perl /media/smrtanalysis/Linux_Data/SqueezeMeta/scripts/../bin/MaxBin/run_MaxBin.pl -thread 11 -contig /media/smrtanalysis/Linux_Data/SqueezeMeta/databases/test/ampli_seq01/temp/bincontigs.fasta -abund_list /media/smrtanalysis/Linux_Data/SqueezeMeta/databases/test/ampli_seq01/results/maxbin/abund.list -out /media/smrtanalysis/Linux_Data/SqueezeMeta/databases/test/ampli_seq01/results/maxbin/maxbin -markerpath /media/smrtanalysis/Linux_Data/SqueezeMeta/databases/db/marker.hmm

SamueWildhaber commented 5 years ago

Hi, We work with Ubuntu and yes it worked with the test data. It says: Try harder to dig out marker genes from contigs. Marker gene search reveals that the dataset cannot be binned (the medium of marker gene number <= 1). Program stop. smrtanalysis@EGSB-WS04:~$ I have to add that we work with Amplicon reads generated with MinION

fpusan commented 5 years ago

So it would seem that SqueezeMeta is working (at least the test run) but MaxBin fails with your data. It's my first time seeing that error code from MaxBin, but it would seem that there was some problem with the assembly. Did you use the --minion flag when calling SqueezeMeta. What are the assembly stats? you can find them in /media/smrtanalysis/Linux_Data/SqueezeMeta/databases/test/ampli_seq01/intermediate/01.ampli_seq01.stats

Also just for double-checking (since I see that your project is named "ampli_seq") these are metagenomes, not amplicons, right?

SamueWildhaber commented 5 years ago

We used the following command for starting SqueezeMeta: smrtanalysis@EGSB-WS04:/$ /media/smrtanalysis/Linux_Data/SqueezeMeta/scripts/SqueezeMeta.pl -t 11 -m coassembly -p ampli_seq02 -s run0004.samples -f BC02 -map minimap2-ont. The assembly stats look as following: stats_assembly N50 490 stats_assembly N75 397 stats_assembly N90 333 stats_assembly N95 311 stats_info bases 271327 stats_info reads 595 stats_len max 1783 stats_len mean 456.01 stats_len median 463 stats_len min 229 stats_len mode 505 stats_len modeval 9 stats_len range 1555 stats_len stddev 127.69

And yes like I said we are working with amplicons and NOT metagenomes. We wanted to try SqueezeMeta with amplicons and apart from the binning it seems to work

fpusan commented 5 years ago

Ah yes, I didn't notice that you had already mentioned that you were using amplicons. Sorry about that. That's a rather unorthodox use. Most of the annotation steps performed by SqueezeMeta are based on similarity to protein databases, and will not work if you're using ribosomal amplicons.

Anyways, you can launch SqueezeMeta with the --nobins parameter to skip binning. I'd also recommend to use the --minion flag rather than -map minimap2-ont. That way you'll still use minimap2 for mapping, but also use canu for assembly, which works better for minion reads.

If your amplicons are from protein-coding regions, you may be interesting in skipping assembly and analyzing reads individually. Search for "sqm_reads.pl" in the pdf manual for a thorough explanation of how this works.

SamueWildhaber commented 5 years ago

Yes we were warned about that since it was mentioned that SqueezeMeta is particularly developed for metagenomics, however we thought it would work for amplicon reads as well.

We looked for a program or a script which is capable of binning amplicon reads and unfortunately it seems like Squeezemeta is not capable of doing so but we will try it with the recommended parameters and we will give some feedback.

Yes our amplicons are from protein-coding regions but we are especially interessted in assembling them so I think your suggestion may not work for us.

Thanks a lot

fpusan commented 5 years ago

Closing this. Let me know if you need further assistance

SamueWildhaber commented 5 years ago

Hello,

I would like to give an update to this issue respectively problems we faced with SqueezeMeta as an amplicon pipeline. After consindering your recommendations we used the following parameters:

After the adjustment of the parameters SqueezeMeta run without generating an error message. We were not able to run the binning process which is a shame and why we used ultimately nobins. So at the end we run SqueezeMeta without steps 13-19. However we noticed that parts of the contigs are much smaller than expected. Our sequencing report showed that around 10% of the reads were smaller than 400 bp but in the contig table generated through SqueezeMeta 50% of the reads were smaller than 400 bp, so something must have went wrong during the read assembly. You explained earlier that it is rather unorthodox to use SqueezeMeta for amplicons but it seemed to work out better than expected and with some adjustments to the code or parameters, SqueezeMeta could be also used for amplicons.

So do you have any clue, why the assembly-process somehow splits the reads? Do you have any recommendations to adjustments we could do in order to solve the problems?

Kind regards

Samuel

fpusan commented 5 years ago

Hi! From your previous message it seems that you didn't use the --minion flag as I recommended. That way, you are using Megahit instead of canu for assembly. Canu is the assembler better suited for MinION reads.

So either add

-t 11 -m coassembly -map minimap2-ont -a canu --nobins

or

-t 11 -m coassembly --minion --nobins

Also, the contig table is no longer reporting the lenght distribution of the reads, but the length distribution of the contigs. It might be the case that Megahit was failing to assemble the longer reads, thus resulting in smaller lenghts in the contigs than in the original reads. This is rather unusual, but since you are using amplicons you don't expect to have long contigs anyways.

What length are your target amplicons? (from the forward primer to the reverse primer?)

FInally, what was the result that you expected to obtain by binning amplicons? Binning is clasically used to group together contigs that may belong to the same genome, but I'm not sure classical binning algorithms were designed with amplicons in mind. Wouldn't something like typical amplicon pipeline (e.g. mothur or DADA2) followed by correlation network analysis (e.g. with SparCC) would better suited to detect groups of co-varying amplicons?

SamueWildhaber commented 5 years ago

Hi,

Thanks for your fast response. I forgot to add that we already tested with the canu option but it did not work out as we expected. We noticed that with some changes in parameters canu will work with amplicons. These would be: -genomeSize 1000 -nanopore-raw\*fastq -minReadLength 250 -minOverlapLength 250 I saw that there is a tag for assemlby_option within SqueezeMeta to select specific parameters for canu, however i couldn't work out how to integrate them within the command line for SquuezeMeta so I'd really appreciate if you could give me an example for a command line with the above mentioned parameters. Our amplicons are approximately around 510 bp but we noticed that there are also some smaller ones around (between 400-500).

We thought binning would be a good solution to group the reads, which are similar to each other so that we could allign them afterwards. But at the moment we would be just happy with corrected reads from canu.

We did not consider either mothur or DADA.

I'm quite new to all of this so I have limited knowledge about programs or softwares, which could help us so sorry about that...

fpusan commented 5 years ago

Binning does not group reads with similar sequences (that would be clustering), but with similar abundance profiles.

For customizing assembly options in SqueezeMeta just add

-assembly_options "-genomeSize 1000 -minReadLength 250 -minOverlapLength 250"

The extra assembly options must be enclosed in double quotes.

In general, I think you would be better by using an amplicon-oriented pipeline (though I'm of course glad that you gave SqueezeMeta a chance!). I'm not sure though on what would be the best pipeline for minION reads, which generally have a lot of sequencing errors...

SamueWildhaber commented 5 years ago

We adjusted the parameters in the original code and the output generated through SqueezeMeta was now much better. The problem we have at the moment is the amount of reads where canu seems to have problems but MegaHit not.

Thanks for your help it was very interessting to see whether SqueezeMeta is capable to work with amplicon reads and at the end we can say with some adjustments it would work.