jtamames / SqueezeMeta

A complete pipeline for metagenomic analysis
GNU General Public License v3.0

Problem reading SQM tables in R #403

Closed Ptero64 closed 2 years ago

Ptero64 commented 2 years ago

Hello,

I am running SqueezeMeta on metagenomic data generated by ONT sequencing (RNA microbiota, de novo sequencing). I ran SqueezeMeta in sequential mode on two samples with the options -a flye -map minimap2-ont --nobins. SqueezeMeta.pl finished without problems, and I then ran sqm2tables.py. It generated 46 tables but showed this message:

COG0468 (RecA/RadA) was not present in your data. This is weird, as RecA should be universal, so you probably just skipped COG annotation. Skipping copy number calculation...

After copying results/tables, the intermediate directory and SqueezeMeta_conf.pl, I tried to read the data in R with SQMtools (v0.6.2) but got this error:

Generating tabular outputs for project in C:/Analyses_SqueezeMeta/MGN-3-B02
Error in loadSQM("C:/Analyses_SqueezeMeta/MGN-3-B02", engine = "data.table") : 
  An error occurred while running sqm2tables.py
In addition: Warning message:
In sprintf("%s/results/tables/%s.superkingdom.%s.abund.tsv", project_path,  :
  one argument not used by format '%s/results/tables/%s.superkingdom.%s.abund.tsv

I did not get this error in a previous analysis of different samples.

Do you know what could be the problem?

Also, I first tried to run the two samples in coassembly mode, but the pipeline failed at step 14.bin_maxbin.pl:

Marker gene search reveals that the dataset cannot be binned (the medium of marker gene number <= 1). Program stop.

Is it a problem related to the type of my data?

Thank you in advance for the help and advice,

Best regards

Nicolas

fpusan commented 2 years ago

Hi! Some thoughts...

COG0468 (RecA/RadA) was not present in your data. This is weird, as RecA should be universal, so you probably just skipped COG annotation. Skipping copy number calculation...

This is weird indeed. RecA should be there if this is a microbiome. Maybe rRNA is masking everything?
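One quick way to check whether RecA was annotated at all is to scan the ORF table in the results directory for the COG identifier. Below is a minimal Python sketch, assuming the table is a plain-text file. The helper name count_lines_with is hypothetical, and the search is a plain substring match on each line, so it does not rely on any particular column layout:

```python
from pathlib import Path


def count_lines_with(path, token="COG0468"):
    """Count lines of a text file that contain `token` (plain substring match)."""
    hits = 0
    # errors="replace" keeps the scan going even if the file has odd bytes.
    with Path(path).open("r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if token in line:
                hits += 1
    return hits
```

A result of 0 on the ORF table would suggest that no ORF was assigned to COG0468, which would be consistent with the warning above.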

Marker gene search reveals that the dataset cannot be binned (the medium of marker gene number <= 1). Program stop. Is it a problem related to the type of my data?

I would say it's related to the first problem. If RecA is not present then possibly the other marker genes used by MaxBin are not present either (or have low abundances).

Generating tabular outputs for project in C:/Analyses_SqueezeMeta/MGN-3-B02 Error in loadSQM("C:/Analyses_SqueezeMeta/MGN-3-B02", engine = "data.table") :

Regardless of the other two problems, it shouldn't be trying to re-create the tables if you copied them to your Windows machine. What is the content of C:/Analyses_SqueezeMeta/MGN-3-B02?
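For reference, here is a small Python sketch that reports which of the copied pieces are missing from a project directory. The list of expected entries is taken from the description earlier in this thread (results/tables, intermediate, SqueezeMeta_conf.pl) and is an assumption, not an authoritative statement of what loadSQM requires. The helper name missing_entries is hypothetical:

```python
from pathlib import Path

# Entries the original poster reported copying. Whether SQMtools needs
# exactly these (and only these) is an assumption.
EXPECTED = ("results/tables", "intermediate", "SqueezeMeta_conf.pl")


def missing_entries(project_dir, expected=EXPECTED):
    """Return the expected entries that do not exist under project_dir."""
    root = Path(project_dir)
    return [rel for rel in expected if not (root / rel).exists()]
```

Calling missing_entries("C:/Analyses_SqueezeMeta/MGN-3-B02") would return an empty list if all three entries sit directly under the project folder, and the names of the missing ones otherwise.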

Ptero64 commented 2 years ago

Hi, thanks for the answers.

The samples were treated with DNase and then rRNA-depleted (followed by RT and WTA before library sequencing). Then again, these two samples are from ants...?

OK, sorry, there was a mistake in the folder structure. I was able to import the data:

> B02 = loadSQM("C:/Analyses_SqueezeMeta/MGN-3-B02", engine = 'data.table')
Loading orfs
    table...
    abundances...
    sequences
    taxonomy...
Loading contigs
    table...
    abundances...
    sequences...
    taxonomy...
Loading taxonomies
Loading functions
Loading total reads
Warning messages:
1: In sprintf("%s/results/tables/%s.superkingdom.%s.abund.tsv", project_path,  :
  one argument not used by format '%s/results/tables/%s.superkingdom.%s.abund.tsv'
2: In sprintf("%s/results/tables/%s.superkingdom.%s.abund.tsv", project_path,  :
  one argument not used by format '%s/results/tables/%s.superkingdom.%s.abund.tsv'
3: In loadSQM("C:/Analyses_SqueezeMeta/MGN-3-B02", engine = "data.table") :
      There are no copy number tables in your project, possibly because COG annotation was not performed or RecA was not present in the metagenome

Are the error messages related to the RecA step that didn't work?

I have another question: I saw in another post that you don't recommend binning for metatranscriptomic analyses. (My samples are human nasal swab samples extracted using viral RNA/DNA kits.) Is it best to run the pipeline in sequential mode?

fpusan commented 2 years ago

The samples were treated with DNase and then rRNA-depleted (followed by RT and WTA before library sequencing). Then again, these two samples are from ants...?

If the samples are from ants and you are trying to get their (gut?) microbiome, then maybe you have a lot of host contamination. That might explain the fact that you are not getting any RecA.

(My samples are human nasal swab samples extracted using viral RNA/DNA kits.)

However, if the nucleic acid you are sequencing is expected to come from viruses, then not finding RecA would be perfectly normal.

Are the error messages related to the RecA step that didn't work?

Yes, but those are only warnings so you should be able to work with the B02 project inside SQMtools.

We do not recommend binning for RNA data (you can use the --nobins flag or just ignore the binning results). Ideally, you would make a separate assembly using DNA/metagenomics data. If you don't have that, you could try the -a rnaspades mode to assemble the RNA directly with rnaSPAdes. I would try a coassembly, I think.

fpusan commented 2 years ago

Closing due to lack of activity, feel free to reopen

EorgeKit commented 2 years ago

Hi there, regarding the failure in STEP 14: I am getting a similar problem in my analysis, and I am not doing a metatranscriptomic analysis. The error I get is as follows:

============
GENE TABLE CREATED: /srv/data/my_shared_data_folder/amrcattle/metagenome_analysis_results/mgm4754670.3/results/13.mgm4754670.3.orftable
============

[36 minutes, 16 seconds]: STEP14 -> BINNING: 14.runbinning.pl
Error running command:    perl /opt/SqueezeMeta-1.5.2/bin/MaxBin/run_MaxBin.pl -thread 30 -contig /srv/data/my_shared_data_folder/amrcattle/metagenome_analysis_results/mgm4754670.3/temp/bincontigs.fasta -abund_list /srv/data/my_shared_data_folder/amrcattle/metagenome_analysis_results/mgm4754670.3/intermediate/binners/maxbin/abund.list -out /srv/data/my_shared_data_folder/amrcattle/metagenome_analysis_results/mgm4754670.3/intermediate/binners/maxbin/maxbin -markerpath /opt/SqueezeMeta-1.5.2/db/marker.hmm at /opt/SqueezeMeta-1.5.2/lib/SqueezeMeta/bin_maxbin.pl line 132.
wc: /srv/data/my_shared_data_folder/amrcattle/metagenome_analysis_results/mgm4754670.3/intermediate/binners/maxbin/: Is a directory
wc: /srv/data/my_shared_data_folder/amrcattle/metagenome_analysis_results/mgm4754670.3/intermediate/binners/metabat2/: Is a directory

Note that this error started with my second set of samples; a previous set ran just fine and produced a lot of bins. I ran both sets without modifying the code.

fpusan commented 2 years ago

What is the output of perl /opt/SqueezeMeta-1.5.2/bin/MaxBin/run_MaxBin.pl -thread 30 -contig /srv/data/my_shared_data_folder/amrcattle/metagenome_analysis_results/mgm4754670.3/temp/bincontigs.fasta -abund_list /srv/data/my_shared_data_folder/amrcattle/metagenome_analysis_results/mgm4754670.3/intermediate/binners/maxbin/abund.list -out /srv/data/my_shared_data_folder/amrcattle/metagenome_analysis_results/mgm4754670.3/intermediate/binners/maxbin/maxbin -markerpath /opt/SqueezeMeta-1.5.2/db/marker.hmm ?
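If it helps to keep the full output for posting here, the command can also be run through a small wrapper that captures both streams. This is a generic Python sketch using the standard subprocess module, not something SqueezeMeta provides. The helper name run_and_capture is hypothetical:

```python
import subprocess


def run_and_capture(cmd):
    """Run cmd (a list of arguments) and return (exit_code, stdout, stderr)."""
    # check=False: a non-zero exit code is returned rather than raised,
    # so the error text printed by the failing tool is still available.
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.returncode, result.stdout, result.stderr
```

The command above would be passed as a list with one item per argument, starting with "perl" and the path to run_MaxBin.pl. The stderr part of the result is where a MaxBin error message would most likely appear, though that is an assumption about how the script reports failures.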