Closed Ash1One closed 4 years ago
The protein mode of Abricate is undocumented and should not be used.
It is doing the search the wrong way around. It will never finish running. The proper way is to use tblastn of DeepARG to contigs, not blastx of contigs to DeepARG.
Does DeepARG really have more true gene families than --db ncbi? Or just more minor alleles?
Does DeepARG have its own annotation tool? I see a diamond database in their repo.
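The search direction suggested above (DeepARG proteins as the query, contigs as the subject) could be set up roughly as follows. This is only a sketch: the file names `deeparg_proteins.faa` and `contigs.fa` are hypothetical, the helper `build_tblastn_command` is made up for illustration, and the E-value is simply borrowed from the abricate command line quoted later in this thread. It only builds and prints the command; running it requires BLAST+ to be installed.

```python
import shlex
from typing import List


def build_tblastn_command(protein_fasta: str, contigs_fasta: str,
                          evalue: str = "1E-20") -> List[str]:
    """Build a tblastn call: protein queries searched against translated contigs.

    File names are placeholders (assumptions), not paths from the thread.
    """
    return [
        "tblastn",
        "-query", protein_fasta,    # protein sequences, e.g. the DeepARG set
        "-subject", contigs_fasta,  # nucleotide contigs, translated in 6 frames
        "-evalue", evalue,          # same threshold abricate passes to BLAST
        "-outfmt", "6",             # tabular output
    ]


cmd = build_tblastn_command("deeparg_proteins.faa", "contigs.fa")
print(shlex.join(cmd))

# To actually run it (needs BLAST+ on PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

For many samples it would probably be better to index the contigs with `makeblastdb` and pass `-db` instead of `-subject`.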
Thank you for your reply, @tseemann. As you say:
"The proper way is to use tblastn of DeepARG to contigs, not blastx of contigs to DeepARG."
From your advice, I understand that BLAST is a local alignment tool, so it is appropriate to BLAST shorter sequences against a database that consists of longer sequences.
But I have also read the abricate code:
my $blastcmd = $dbinfo->{DBTYPE} eq 'nucl'
             ? "blastn -task blastn -dust no -perc_identity $minid"
             : "blastx -task blastx-fast -seg no"
             ;

my $cmd = "(any2fasta -q -u \Q$file\E |"
        . " $blastcmd -db \Q$db_path\E -outfmt '$format' -num_threads $threads"
        . " -evalue 1E-20 -culling_limit $CULL"
        # . " -max_target_seqs ".$dbinfo->{SEQUENCES} # Issue #76
        . ") 2>&1"
        ;
and I was confused by it. Should abricate BLAST the CARD or NCBI sequences against the contigs, which are much longer than typical antibiotic resistance genes?
By the way, in several papers I have read, Prodigal or MetaGeneMark is usually used to predict open reading frames (ORFs) from the assembled contigs file, and the ORFs are then searched with blastx against the CARD or DeepARG database. I am not sure whether tblastn of ARGs against ORFs or BLAST of ORFs against ARGs is the proper way to identify AR-like genes, since it is not clear which of the two is longer.
Looking forward to your reply.😀
In my opinion, relying on an ORF/gene predictor, then using BLASTP, is a bad idea. You could miss important AMR genes due to assembly issues, or bad RBS/promoter. Best to scan directly against the contigs.
I use the special -culling_limit option to ensure only the "best hit" in any region is returned. This avoids getting 800 betalactamase hits all to the same part of the contig.
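To make the "best hit in any region" idea concrete, here is a small sketch of that kind of filtering. It is a simplified illustration based on the documented description of `-culling_limit` (a hit is deleted if its query range is enveloped by at least that many higher-scoring hits), not BLAST's actual implementation. The `Hit` type, the function `cull_hits`, and the example hits are all made up for illustration.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    query: str       # contig name
    start: int       # start of the hit on the contig
    end: int         # end of the hit on the contig
    subject: str     # database sequence that matched
    bitscore: float


def cull_hits(hits: List[Hit], limit: int = 1) -> List[Hit]:
    """Drop hits enveloped by at least `limit` better-ranked hits on the same contig."""
    ranked = sorted(hits, key=lambda h: h.bitscore, reverse=True)
    kept = []
    for i, hit in enumerate(ranked):
        # Count better-ranked hits whose range fully covers this hit's range.
        enveloping = sum(
            1
            for better in ranked[:i]
            if better.query == hit.query
            and better.start <= hit.start
            and better.end >= hit.end
        )
        if enveloping < limit:
            kept.append(hit)
    return kept


# Invented example: three overlapping beta-lactamase hits plus one separate hit.
example_hits = [
    Hit("contig1", 100, 960, "blaTEM-1", 1700.0),
    Hit("contig1", 100, 960, "blaTEM-2", 1690.0),
    Hit("contig1", 120, 900, "blaTEM-3", 1500.0),
    Hit("contig1", 5000, 5800, "tetA", 1400.0),
]
for hit in cull_hits(example_hits):
    print(hit.subject, hit.start, hit.end)
```

With a limit of 1, only the top-scoring hit in the overlapping cluster survives, while the hit elsewhere on the contig is untouched.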
The local alignment property means it works either way, long vs short or short vs long. If it were glocal (like glsearch36), then you would need to put the short sequence as the query.
If you already have ORFs, then you should translate them, and do BLASTP (protein : protein) against the DeepARG or CARD proteins.
Do you have contigs or genes/ORFs ?
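For the "translate the ORFs, then BLASTP" route, a minimal sketch might look like the following. It assumes the ORFs are in-frame nucleotide sequences on the forward strand and uses the standard codon assignments; the function names, the file name `orfs.faa`, and the database prefix `deeparg_prot` are hypothetical. Note that it does not handle alternative start codons (a leading GTG or TTG is translated literally rather than as M), so a dedicated translation tool may be preferable in practice.

```python
from itertools import product
from typing import List

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(orf: str) -> str:
    """Translate an in-frame ORF, stopping at the first stop codon.

    Codons containing ambiguous bases are written as 'X'.
    """
    orf = orf.upper()
    protein = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE.get(orf[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def build_blastp_command(protein_fasta: str, db_prefix: str,
                         evalue: str = "1E-20") -> List[str]:
    """Build a blastp call (protein : protein); paths are placeholders."""
    return [
        "blastp",
        "-query", protein_fasta,  # translated ORFs
        "-db", db_prefix,         # protein database built with makeblastdb
        "-evalue", evalue,
        "-outfmt", "6",
    ]


print(translate("ATGGCTAAATAA"))
print(" ".join(build_blastp_command("orfs.faa", "deeparg_prot")))
```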
Yes, I already have ORFs. I will run BLASTP against the DeepARG database as you advise. Thanks for your patient explanation. 😀 @tseemann
You are welcome. And good luck with your search :)
Hello @tseemann, I have built the DeepARG database from https://bitbucket.org/gusphdproj/deeparg-ss/src/master/database/ and it has nearly 12000 sequences. I ran abricate on my metagenome assembly ORFs file against the CARD database and it took dozens of minutes. But when I ran abricate on the ORFs file against the DeepARG database, it had already taken more than 20 hours and still had not finished.
I would like to know whether this is just because blastx is slow, or whether I did something wrong.
Thanks!