jtamames / SqueezeMeta

A complete pipeline for metagenomic analysis
GNU General Public License v3.0

sqm_reads.pl processes paired-end reads separately #676

Closed Aaronwang1013 closed 1 year ago

Aaronwang1013 commented 1 year ago

Dear developers,

Our group is currently working on a metatranscriptome project with 9 samples. The command we ran is:

SqueezeMeta.pl -m merged -p HS22834_35_merged -s sample -f /mnt/NFS/pisces/aaron/metagenomics/HS22834_35/metatranscriptome/results/02-Data/fastp -t 40 HS22834_35 --nobins -c 200 -a rnaspades

After the SqueezeMeta pipeline finished, the mapping percentages were low, as follows:

Sample  Total reads  Mapped reads  Mapping perc  Total bases
3       32013586     7926038       24.76         4830981091
2       28631100     4987720       17.42         4319977431
9       35518582     9055074       25.49         5360018037
7       34614144     7499372       21.67         5223156987
6       31626122     5316292       16.81         4771975849
4       35690604     8762478       24.55         5384990215
8       39972160     8401404       21.02         6031652329
5       42045178     11066262      26.32         6344545338
1       33144990     5886418       17.76         5000728669
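As a quick sanity check, the "Mapping perc" column can be recomputed as mapped reads divided by total reads. The sketch below is only an illustration in Python using the numbers from the table above. It is not part of SqueezeMeta, and the helper name `mapping_percentage` is made up for this example.

```python
# (sample, total_reads, mapped_reads) copied from the mapping table above
rows = [
    ("3", 32013586, 7926038),
    ("2", 28631100, 4987720),
    ("9", 35518582, 9055074),
    ("7", 34614144, 7499372),
    ("6", 31626122, 5316292),
    ("4", 35690604, 8762478),
    ("8", 39972160, 8401404),
    ("5", 42045178, 11066262),
    ("1", 33144990, 5886418),
]


def mapping_percentage(total_reads, mapped_reads):
    """Return mapped reads as a percentage of total reads, rounded to 2 decimals."""
    return round(100.0 * mapped_reads / total_reads, 2)


# Hypothetical helper output: sample name -> recomputed mapping percentage
percentages = {
    sample: mapping_percentage(total, mapped) for sample, total, mapped in rows
}

for sample, perc in sorted(percentages.items()):
    print(sample, perc)
```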

Therefore, we followed the suggestion in the manual and tried to perform the taxonomic and functional assignments directly on the reads with the sqm_reads.pl program.

Here is the command:

sqm_reads.pl -p HS22834_35_reads -f /mnt/NFS/pisces/aaron/metagenomics/HS22834_35/metatranscriptome/results/02-Data/fastp -s sample

The sample file:

3	3_R2.trim.fastq.gz	pair2
3	3_R1.trim.fastq.gz	pair1
2	2_R2.trim.fastq.gz	pair2
2	2_R1.trim.fastq.gz	pair1
9	9_R2.trim.fastq.gz	pair2
9	9_R1.trim.fastq.gz	pair1
7	7_R1.trim.fastq.gz	pair1
7	7_R2.trim.fastq.gz	pair2
6	6_R2.trim.fastq.gz	pair2
6	6_R1.trim.fastq.gz	pair1
4	4_R1.trim.fastq.gz	pair1
4	4_R2.trim.fastq.gz	pair2
8	8_R1.trim.fastq.gz	pair1
8	8_R2.trim.fastq.gz	pair2
5	5_R2.trim.fastq.gz	pair2
5	5_R1.trim.fastq.gz	pair1
1	1_R1.trim.fastq.gz	pair1
1	1_R2.trim.fastq.gz	pair2
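A sample file with this three-column layout (sample name, file name, pair1/pair2) could be generated and checked with a short script. The following is a minimal sketch in Python. It assumes the `<sample>_R1.trim.fastq.gz` / `<sample>_R2.trim.fastq.gz` naming seen in the file names above and assumes tab-separated columns. The function names `build_samples_lines` and `check_pairs` are hypothetical and not part of SqueezeMeta.

```python
import re

# Assumed naming convention, taken from the file names listed above:
# <sample>_R1.trim.fastq.gz and <sample>_R2.trim.fastq.gz
FASTQ_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<mate>[12])\.trim\.fastq\.gz$")


def build_samples_lines(filenames):
    """Return one 'sample<TAB>file<TAB>pairN' line per matching FASTQ file."""
    lines = []
    for name in sorted(filenames):
        match = FASTQ_PATTERN.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed naming
        sample = match.group("sample")
        mate = match.group("mate")
        lines.append(f"{sample}\t{name}\tpair{mate}")
    return lines


def check_pairs(lines):
    """Return samples that do not have exactly one pair1 and one pair2 file."""
    seen = {}
    for line in lines:
        sample, _filename, pair = line.split("\t")
        seen.setdefault(sample, []).append(pair)
    return [s for s, pairs in seen.items() if sorted(pairs) != ["pair1", "pair2"]]


example = build_samples_lines(["3_R2.trim.fastq.gz", "3_R1.trim.fastq.gz", "notes.txt"])
print("\n".join(example))
print("incomplete samples:", check_pairs(example))
```

In practice the file list would come from something like `os.listdir()` on the fastp output directory, and the resulting lines would be written to the file passed with `-s`.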

I have two questions. First, do you have any suggestions for running SqueezeMeta on a metatranscriptome project that shows a relatively low mapping percentage? Or is there anything wrong with my command for processing a metatranscriptome? Second, sqm_reads.pl is now running, but I realized that the program performs the taxonomic assignment on the R1 and R2 reads separately; that is, I get two sets of taxonomic and functional assignment results for each sample. Is this normal, or am I supposed to merge the paired-end reads first?

Thanks a lot!

jtamames commented 1 year ago

Hello,

sqm_reads.pl should suit you well for a metatranscriptomics project. You can also use sqm_longreads.pl if your reads are longer. The difference is that sqm_reads.pl expects just one gene per read, while sqm_longreads.pl considers that there could be more than one gene in the same read. Therefore, sqm_longreads.pl is more precise but also takes longer. Both reads of a pair are analyzed separately, as you see. That is the intended behaviour. Being a metatranscriptome, both reads should belong to the same gene unless the fragment is polycistronic, coming from, for instance, an operon.
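Since both mates of a pair are expected to hit the same gene, one way to reconcile the two per-mate result sets afterwards is to compare them fragment by fragment. The sketch below is purely illustrative Python and is not something sqm_reads.pl does or outputs. It assumes the assignments have already been loaded into two dictionaries keyed by a shared fragment ID, and the function name `combine_mates` and the labels are hypothetical.

```python
def combine_mates(r1_hits, r2_hits):
    """Combine per-mate assignments into one (label, status) entry per fragment.

    r1_hits / r2_hits: dicts mapping a fragment ID to the label assigned to
    that mate (hypothetical input format, not SqueezeMeta's own tables).
    status is "agree", "conflict" (mates disagree, label set to None),
    or "single" (only one mate was assigned).
    """
    combined = {}
    for fragment in sorted(set(r1_hits) | set(r2_hits)):
        a = r1_hits.get(fragment)
        b = r2_hits.get(fragment)
        if a is not None and b is not None:
            if a == b:
                combined[fragment] = (a, "agree")
            else:
                # Disagreement, e.g. a polycistronic fragment spanning two genes
                combined[fragment] = (None, "conflict")
        else:
            combined[fragment] = (a if a is not None else b, "single")
    return combined


# Hypothetical example data
r1 = {"frag1": "geneA", "frag2": "geneB", "frag3": "geneC"}
r2 = {"frag1": "geneA", "frag2": "geneX", "frag4": "geneD"}
result = combine_mates(r1, r2)
for fragment, (label, status) in result.items():
    print(fragment, label, status)
```

Counting each agreeing fragment once, rather than once per mate, avoids double-counting the same transcript fragment when summarising abundances.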