jtamames / SqueezeMeta

A complete pipeline for metagenomic analysis
GNU General Public License v3.0

The process freezes at step 10 #68

Closed. RitaBogorad closed this issue 4 years ago.

RitaBogorad commented 4 years ago

Hello,

I have been running SqueezeMeta for the second time, with a slight change in the original metatranscriptome FASTQ samples. I used coassembly in both cases and ran it on our local server. However, this time the process consistently freezes at step 10, always on Sample2 (4 PE samples overall), with the output below. The process disappears completely from the server without giving any error or showing any progress for a long time. A manual interruption makes the process "wake up" and move on to the next step (Aligning to reference with bowtie for Sample2), but a few seconds later it dumps core and exits completely. I have tried repeating the steps with the restart script, but it did not help: the process freezes again. As the previous run with almost the same samples successfully passed step 10, I am not sure what I could have done wrong. The server seems to have plenty of memory available.

I would be very grateful for your advice,

Thank you

[3 days, 1 hours, 40 minutes, 52 seconds]: STEP10 -> MAPPING READS: 10.mapsamples.pl
Reading mapping file from /home/mbogorad/2018/metatrans/unzipped/trimmomatic_analysed/paired/2018metatrans/data/00.2018metatrans.samples
Metagenomes found: 8
Creating reference.
Working with 1: Sample1
Getting raw reads
Aligning to reference with bowtie
Calculating contig coverage
rm: cannot remove '/home/mbogorad/2018/metatrans/unzipped/trimmomatic_analysed/paired/2018metatrans/temp/2018metatrans.Sample1.current_2': No such file or directory
Counting with sqm_counter
37000000 reads counted
Working with 2: Sample2
Getting raw reads

fpusan commented 4 years ago

Hi! It seems that you only have a "pair2" specified for Sample2 in the samples file, without a "pair1" specified in the previous line. Is that the case? If not, could you send us your samples file? Best, Fernando
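A quick way to catch this kind of problem before launching a run is to check that every sample listing a pair2 file also lists a pair1 file. The sketch below uses a hypothetical helper (check_pairs is not part of SqueezeMeta). It assumes the tab-separated samples-file layout described in the SqueezeMeta README: sample name, read file name, pair1/pair2, then optional flags.

```python
from collections import defaultdict


def check_pairs(lines):
    """Return sample names that list a pair2 file but no pair1 file.

    Assumes tab-separated lines: sample name, read file name, pair1/pair2,
    then optional extra fields (ignored here).
    """
    pairs = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"expected at least 3 tab-separated fields: {line!r}")
        sample, _filename, pair = fields[:3]
        pairs[sample].add(pair)
    return sorted(s for s, p in pairs.items() if "pair2" in p and "pair1" not in p)


# Hypothetical samples file in which the pair1 line for Sample2 is missing.
example = [
    "Sample1\tS1_R1.fastq\tpair1",
    "Sample1\tS1_R2.fastq\tpair2",
    "Sample2\tS2_R2.fastq\tpair2",
]
print(check_pairs(example))  # ['Sample2']
```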

RitaBogorad commented 4 years ago

Thank you very much for your reply,

Indeed, I noticed a mistake in my .samples file and am currently running the program with the corrected version.

Is it possible to update only the samples file at a particular step without starting a completely new run? Or do the incorrect pair names used in the previous steps invalidate the whole analysis?

Thank you once again,

Rita

jtamames commented 4 years ago

Hello! I am sorry about that... Which version are you using? The latest versions of SQM include some extra checks of the samples file to prevent mistakes. I assume that you are doing a coassembly or merged run, since you have 8 metagenomes. I do not know exactly what the problem in your samples file was, but it seems that some of the read files were not properly specified, weren't they? In that case, the corresponding metagenome will not be represented in the assembly, and you will have no data for it. I am sorry to say it, but I think it is better to start from scratch. If you are using merged mode, we could try to salvage some of the assemblies that were already done. Is that the case?

Best, Javier

RitaBogorad commented 4 years ago

Hello, Javier,

I am using SqueezeMeta v1.0.0. The issue was 100% my fault, as I had indexed the sample names incorrectly before the second run. I have 4 PE samples, and the wrong labeling made it look like I had 8. The new run with the correct file successfully passed step 10 :) Thanks again for pointing out this silly mistake!
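A mislabeling like this shows up as more samples than expected, each with only one read file. The sketch below counts distinct sample names and lists single-file samples. summarize_samples is a hypothetical helper, and the mislabeled example is invented for illustration (the thread does not show the actual file). It assumes the sample name is the first tab-separated field.

```python
from collections import Counter


def summarize_samples(lines, expected):
    """Count distinct sample names and list samples that have only one read file.

    In a paired-end project each sample should normally appear on two lines
    (pair1 and pair2), so single-file samples hint at a labeling mistake.
    """
    counts = Counter(line.split("\t")[0] for line in lines if line.strip())
    singletons = sorted(name for name, n in counts.items() if n == 1)
    if len(counts) != expected:
        print(f"found {len(counts)} samples, expected {expected}")
    return len(counts), singletons


# Hypothetical mislabeling: each read file of a pair got its own sample name.
mislabeled = [
    "S1a\tS1_R1.fastq\tpair1",
    "S1b\tS1_R2.fastq\tpair2",
    "S2a\tS2_R1.fastq\tpair1",
    "S2b\tS2_R2.fastq\tpair2",
]
print(summarize_samples(mislabeled, expected=2))
```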

While I have your attention, can I also ask a question about metabat2? Step 15 produces 0 bins. I have checked the metabat2 forum for similar issues and tried different approaches (for example, tuning the metabat parameters) to solve the problem, but nothing seemed to help. From your point of view, could it be a problem with the data? The outputs from the previous steps look OK. I have tried running metabat2 independently of the SqueezeMeta script, passing it the sorted BAM files of each sample, but nothing produced more than 0 bins. Have you encountered such an issue in the past?
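For reference, a standalone MetaBAT2 run on sorted BAM files usually takes two commands: jgi_summarize_bam_contig_depths (shipped with MetaBAT2) builds a per-contig depth table, and metabat2 then bins the contigs using that table. The sketch below only assembles those command lines. The metabat2_commands helper and the file names are hypothetical, and the flags (-i contigs, -a depth file, -o output prefix, -m minimum contig length, -t threads) follow MetaBAT2's command-line help, so check them against your installed version.

```python
import shlex


def metabat2_commands(contigs, bams, outdir, threads=4, min_contig=2500):
    """Build the two command lines for a standalone MetaBAT2 run.

    1. jgi_summarize_bam_contig_depths computes contig depths from sorted BAMs.
    2. metabat2 bins the contigs using that depth table.
    """
    depth = f"{outdir}/depth.txt"
    summarize = ["jgi_summarize_bam_contig_depths", "--outputDepth", depth, *bams]
    binning = [
        "metabat2",
        "-i", contigs,
        "-a", depth,
        "-o", f"{outdir}/bin",
        "-m", str(min_contig),
        "-t", str(threads),
    ]
    return summarize, binning


# Hypothetical inputs; the commands are printed, not executed.
for cmd in metabat2_commands("contigs.fasta", ["Sample1.sorted.bam", "Sample2.sorted.bam"], "metabat_out"):
    print(shlex.join(cmd))
```

To actually run them, each list can be passed to subprocess.run(cmd, check=True). The -m value matters here: MetaBAT2 only bins contigs at least that long (default 2500, and its help text says it should be at least 1500), so an assembly made mostly of short contigs can plausibly yield no bins.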

jtamames commented 4 years ago

Dear Rita, happy to hear you solved the problem. Not a silly mistake, just a plain mistake :)

Did you try running Metabat2 by itself (not within SqueezeMeta)? That would tell you whether this is an issue with the SqueezeMeta installation or whether, on the contrary, Metabat2 does not work well with your samples. We usually get far fewer bins with Metabat2 than with Maxbin, and yes, for some small projects I have seen zero bins in the Metabat2 results, but it is not usual. Best, Javier
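To compare binners quickly, one can count the bin FASTA files each one wrote. The sketch assumes bins are the only .fa/.fasta files in each directory. MetaBAT2 writes .fa bins and MaxBin2 writes .fasta bins by default. The count_bins helper and the directory paths in the usage lines are hypothetical.

```python
from pathlib import Path


def count_bins(bindir, suffixes=(".fa", ".fasta")):
    """Count bin FASTA files in a binner's output directory (0 if it does not exist)."""
    d = Path(bindir)
    if not d.is_dir():
        return 0
    return sum(1 for p in d.iterdir() if p.is_file() and p.suffix in suffixes)


# Hypothetical result directories for the two binners.
for name, bindir in {"maxbin": "results/maxbin", "metabat2": "results/metabat2"}.items():
    print(name, count_bins(bindir))
```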

RitaBogorad commented 4 years ago

I did try running metabat through Anaconda and the result was the same. Also, my colleague ran SqueezeMeta on his files and did get bins with metabat, so I was quite sure it was my data and not an installation problem. Since the --nometabat option exists in SqueezeMeta, I assumed that DAS Tool (and all the following steps) would skip the output produced by metabat, but the error still appears. Is there a way to proceed automatically with only the maxbin bins, or will I need to run the next commands manually, pointing them only to the maxbin output? Thanks again for all the help!

jtamames commented 4 years ago

Hello, did you try the --nometabat option? And did maxbin return any bins? If I understand correctly, SqueezeMeta is failing at step 16 (DAS Tool)? Best,

RitaBogorad commented 4 years ago

After I stopped trying to make metabat work, I thought about running DAS Tool only on the maxbin output, since maxbin finished and produced bins. So I ran this command as the next step:

16.dastool.pl 2018 --nometabat

The command-line output was:

Running DAS Tool using 12 threads.
predicting genes using Prodigal V2.6.3: February, 2016
identifying single copy genes using diamond version 0.9.22
ERROR: Cannot read scaffold2bin file: /home/mbogorad/2018/results/DAS/metabat2.table. Please check file/format.
Should be: scaffold_name \t bin_id
Execution halted

Although some files were generated in the /results/DAS folder, it is still looking for the metabat files. So I guess it is impossible to use SqueezeMeta arguments when running single scripts?
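The error says DAS Tool could not read metabat2.table, presumably because metabat2 produced no bins, so its scaffold-to-bin table is missing or empty. The sketch below illustrates the idea behind the fix. It checks each table against the format quoted in the error (scaffold_name, a tab, bin_id) and builds a DAS_Tool command only from the binners that pass. This is not how 16.dastool.pl is written. The helper names are hypothetical, and the DAS_Tool flags (-i tables, -l labels, -c contigs, -o output basename, -t threads) follow the DAS Tool 1.x usage text, so check them against your version.

```python
from pathlib import Path


def usable_table(path):
    """True if the file exists and every non-blank line is 'scaffold_name<TAB>bin_id'."""
    p = Path(path)
    if not p.is_file():
        return False
    rows = [line for line in p.read_text().splitlines() if line.strip()]
    return bool(rows) and all(len(row.split("\t")) == 2 for row in rows)


def dastool_command(tables, contigs, out_prefix, threads=12):
    """Build a DAS_Tool command from the binners whose tables are usable.

    tables maps a binner label (e.g. "maxbin") to its scaffold-to-bin table.
    """
    kept = {label: path for label, path in tables.items() if usable_table(path)}
    if not kept:
        raise ValueError("no usable scaffold-to-bin tables")
    return [
        "DAS_Tool",
        "-i", ",".join(kept.values()),
        "-l", ",".join(kept.keys()),
        "-c", contigs,
        "-o", out_prefix,
        "-t", str(threads),
    ]
```

For example, dastool_command({"maxbin": "results/DAS/maxbin.table", "metabat2": "results/DAS/metabat2.table"}, "results/contigs.fasta", "results/DAS/project") (all paths hypothetical) would leave metabat2 out if its table is empty or missing, and would raise ValueError if neither table is usable.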

jtamames commented 4 years ago

Yeah, indeed, you cannot specify arguments when running individual scripts; arguments are set by SqueezeMeta.pl at the start of the run. But you can do the following: edit the SqueezeMeta_conf.pl file in your project directory (/home/mbogorad/2018) and add the following line at the end:

$nometabat=1

(This is what SqueezeMeta.pl would have done to skip metabat.) Then restart using restart.pl -step 16. It should work. Please tell me if it does.
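If you prefer to script this edit, the sketch below appends the setting only if the exact line is not already there. set_conf_flag is a hypothetical helper. The trailing semicolon is ordinary Perl statement termination and is harmless at the end of the file. As Rita reports later in the thread, this flag alone did not stop 16.dastool.pl from looking for the metabat2 table in her case.

```python
from pathlib import Path


def set_conf_flag(conf_path, line="$nometabat=1;"):
    """Append a setting to SqueezeMeta_conf.pl unless the exact line is already present.

    Returns True if the file was changed, False otherwise.
    """
    p = Path(conf_path)
    text = p.read_text()
    if line in text.splitlines():
        return False
    if text and not text.endswith("\n"):
        text += "\n"  # make sure the new setting starts on its own line
    p.write_text(text + line + "\n")
    return True


# Project directory from the thread (not executed here):
# set_conf_flag("/home/mbogorad/2018/SqueezeMeta_conf.pl")
```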

Best,

jtamames commented 4 years ago

And if it doesn't, tell me too lol

RitaBogorad commented 4 years ago

Adding $nometabat=1 did not help, but editing this line in SqueezeMeta_conf.pl did the job.

Original:
%bindirs=("maxbin","$resultpath/maxbin","metabat2","$resultpath/metabat2"); #-- Directories for bins

Edited:
%bindirs=("maxbin","$resultpath/maxbin"); #-- Directories for bins
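The %bindirs edit can also be scripted. The sketch below parses the name/directory pairs in the %bindirs=(...) assignment and drops one binner. It keeps everything after the closing ");", including the trailing comment. drop_binner is a hypothetical helper. It assumes the line keeps the single-line, double-quoted layout shown in the original line.

```python
import re


def drop_binner(conf_line, binner):
    """Remove one name/directory pair from a %bindirs=(...) assignment line."""
    m = re.match(r'(\s*%bindirs=\()(.*?)(\);.*)$', conf_line)
    if not m:
        raise ValueError("not a %bindirs line")
    items = re.findall(r'"([^"]*)"', m.group(2))    # quoted names and directories
    pairs = list(zip(items[0::2], items[1::2]))      # (binner, directory) pairs
    kept = [(name, d) for name, d in pairs if name != binner]
    body = ",".join(f'"{x}"' for pair in kept for x in pair)
    return m.group(1) + body + m.group(3)


original = '%bindirs=("maxbin","$resultpath/maxbin","metabat2","$resultpath/metabat2"); #-- Directories for bins'
print(drop_binner(original, "metabat2"))
```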

I removed the metabat directory and restarted; DAS Tool successfully skipped the metabat output and finished!

Very grateful for your help,

Rita