motu-tool / mOTUs

motus - a tool for marker gene-based OTU (mOTU) profiling
GNU General Public License v3.0
147 stars 25 forks source link

How to match subset fastq to motus? #60

Closed zckoo007 closed 2 years ago

zckoo007 commented 4 years ago

Hi Alessio,

I have some peptides that will cause immune response,like "IDIKKYARVEKIPGG","GPLLICYKESVDQRGNQIMS"

I want to know whether patient’s intestinal flora also rich in these peptides

I diamond blastx my metagenomes reads to these peptides and find some significant differences between case and control.

Now I want to know, these peptides from which species or genus

My idea is extract these matched reads(evalue <1e-3), then mapping these reads to motus.

but I don't know how to match these subset fastq to motus。

my subset fastq is not pair,which command should I use?

motus profile -f for_sample.fastq -r rev_sample.fastq -s no_pair.fastq -t 6 > taxonomy_profile.txt
AlessioMilanese commented 4 years ago

Hi @zckoo007 ,

To profile unpaired reads, you can just use the -s command like:

motus profile -s no_pair.fastq -t 6 > taxonomy_profile.txt

Let me know if this works.

zckoo007 commented 4 years ago

Hi Alessio,

I hava a subset fastq from 500 other samples, I extract the reads that I am interesting.

-rw-r--r-- 1 ckzhu sample_lib  50M Oct  5 15:33 Num10134_R2.fq
-rw-r--r-- 1 ckzhu sample_lib  50M Oct  5 15:51 Num10134_R1.fq
-rw-r--r-- 1 ckzhu sample_lib 100M Oct  5 16:34 Num10134.fq

motus profile -s Num10134.fq -c -t 28 > Num10134_taxonomy.txt

[main]  MAP_TAX -----------
[main] Number of detected lanes: 1
[main] Run bwa on lane 1
 [map_db](map single reads) 6.15 sec
[main] Total time map_tax: 6.18 sec
[main]  CALC_MGC -----------
[main] Minimum alignment length: 75 (average read length: 100)
 [calc_mgc](parse 1 sam/bam file) 2.73 sec
 [calc_mgc](get mgc abundances) 0.00 sec
[main] Total time calc_mgc: 2.97 sec
[main]  CALC_MOTU -----------
 [calc_motu] (Create taxonomy profile) 0.49 sec
[main]  TOTAL TIME (map_tax+calc_mgc+calc_motu): 9.64 sec

motus profile -f Num10134_R1.fq -r Num10134_R2.fq -c -t 28 > Num10134_taxonomy.txt

[main]  MAP_TAX -----------
[main] Number of detected lanes: 1
[main] Run bwa on lane 1
 [map_db](map forward reads) 6.87 sec
 [map_db](map reverse reads) 4.46 sec
 [map_db](sort reads) 0.00 sec
[main] Total time map_tax: 11.41 sec
[main]  CALC_MGC -----------
[main] Minimum alignment length: 75 (average read length: 100)
 [calc_mgc](parse 1 sam/bam file) 3.22 sec
 [calc_mgc](get mgc abundances) 0.00 sec
[main] Total time calc_mgc: 3.38 sec
[main]  CALC_MOTU -----------
 [calc_motu] (Create taxonomy profile) 0.90 sec
[main]  TOTAL TIME (map_tax+calc_mgc+calc_motu): 15.69 sec

There is no reads mapping。

$ grep -v '^#' Num10134_taxonomy.txt |awk -F "\t" '{print $2}'|sort -u
0

Does the fastq too small to mapping to motus database?

I want to know which bacteria have these reads, How can I do it?

Best wishes Chengkai

AlessioMilanese commented 4 years ago

Hi Chengkai,

I think there are some reads that are mapping.

I would run using the following commands:

For example:

motus profile -s Num10134.fq -c -t 28 -g 1  -y insert.raw_counts -A > Num10134_taxonomy.txt