xinehc / args_oap

ARGs-OAP: Online Analysis Pipeline for Antibiotic Resistance Genes Detection from Metagenomic Data Using an Integrated Structured ARG Database
MIT License

Is it reasonable to use diamond(blastx) instead of blast(blastx) in the ARG-OAP? #41

Open jiangyoufeng opened 1 year ago

jiangyoufeng commented 1 year ago

Hi,

When my data volume is large, I want to speed up the alignment step. Is it reasonable to use diamond (blastx) instead of blast (blastx) in ARG-OAP? For example, would it be reasonable to change the code in the .py file so that the alignment is done by diamond?

Thanks!

xinehc commented 1 year ago

Yes, you can replace blastx with diamond's blastx in stage_two. The results should not differ much if you use the same cutoff.

jiangyoufeng commented 1 year ago

Thanks for your reply!

diego00012138 commented 8 months ago

Hi! I am interested in this replacement, but unfortunately I am not familiar with Python. I would appreciate it if you could give me the specific modification. Sincerely

diego00012138 commented 8 months ago

Hi! Am I right to edit the code as below?

1) Only change the corresponding parameters in this area and leave everything else untouched. (screenshot: 0205_1)
2) Replace the stage_two.py in these directories. (screenshot: 0205_2)
3) Install diamond in the environment.

Sincerely

xinehc commented 8 months ago

Yes, you only need to replace the arguments in subprocess.run([...]) with your diamond arguments. Please note that the argument names may not be identical, e.g. -mt_mode is not used by diamond.
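As a rough illustration of the renaming described above, the sketch below translates a flat list of BLAST+ options into their DIAMOND counterparts. The option names on both sides are standard BLAST+ and DIAMOND flags, but whether stage_two.py passes exactly this set is an assumption, and `translate_args` is a hypothetical helper, not part of ARGs-OAP. It also assumes every flag is followed by exactly one value.

```python
# Standard BLAST+ option -> DIAMOND long option.
BLAST_TO_DIAMOND = {
    '-db': '--db',
    '-query': '--query',
    '-out': '--out',
    '-outfmt': '--outfmt',
    '-evalue': '--evalue',
    '-max_target_seqs': '--max-target-seqs',
    '-num_threads': '--threads',
}

# BLAST+ options with no DIAMOND counterpart; they are dropped.
BLAST_ONLY = {'-mt_mode'}


def translate_args(blast_args):
    """Translate ['-flag', 'value', ...] pairs (hypothetical helper)."""
    diamond_args = []
    pairs = iter(blast_args)
    for flag in pairs:
        value = next(pairs)  # assumption: one value per flag
        if flag in BLAST_ONLY:
            continue
        if flag not in BLAST_TO_DIAMOND:
            raise KeyError(f'no known DIAMOND equivalent for {flag}')
        diamond_args.append(BLAST_TO_DIAMOND[flag])
        if flag == '-outfmt':
            # BLAST+ takes '6 qseqid sseqid ...' as one string;
            # DIAMOND expects each field as its own argument.
            diamond_args.extend(value.split())
        else:
            diamond_args.append(value)
    return diamond_args


example = translate_args([
    '-db', 'mydb',
    '-mt_mode', '1',
    '-outfmt', '6 qseqid sseqid',
    '-num_threads', '4',
])
print(example)
```

Raising on unknown flags, rather than passing them through, is a deliberate choice here so that an untranslated BLAST+ option is noticed before DIAMOND rejects it.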

diego00012138 commented 8 months ago

Hi, after some attempts, it didn't work. First, I changed subprocess.run([...]) to this:

def extract_seqs(self):
    '''
    Extract target sequences using more stringent cutoffs & blast.
    '''
    logger.info(f'Processing <{self.setting.extracted}> ...')
    nbps, nlines = simple_count(self.setting.extracted)
    blast_mode = 'blastx' if self.dbtype == 'prot' else 'blastn'

    logger.info('Extracting target sequences using BLAST ...')
    logger.info(f'BLAST settings: {nbps} bps, {nlines} reads, {self.thread} threads')

    subprocess.run([
        'diamond',
        blast_mode,
        '--db', self.db,
        '-q', self.setting.extracted,
        '-o', self.setting.blastout,
        '--outfmt', ' '.join(['6'] + self.setting.columns),
        '-e', str(self.e),
        '--max-target-seqs', '5',
        '-p', str(self.thread)])

But I get this error: "Error: Invalid output format: 6 qseqid sseqid pident length qlen slen evalue bitscore"

I could not figure it out, so I just deleted the line "'--outfmt', ' '.join(['6'] + self.setting.columns)," to see whether it would run.

But then I get another error: "Opening the database... Error: This executable was not compiled with support for BLAST databases."

xinehc commented 8 months ago

try this one:

subprocess.run([
    'diamond',
    blast_mode,
    '--db', self.db + '.dmnd',
    '-q', self.setting.extracted,
    '-o', self.setting.blastout,
    '--outfmt', '6'] + self.setting.columns + [
    '-e', str(self.e),
    '--max-target-seqs', '5',
    '-p', str(self.thread)])
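
For readers following along, here is a minimal standalone sketch of what the snippet above changes, under some assumptions. The output format fields are passed as separate list items instead of one space-joined string, and the database argument points at a DIAMOND-format `.dmnd` file rather than a BLAST database. The helper names and file paths below are hypothetical; the sketch also assumes the `.dmnd` file has to be built first with `diamond makedb` from the protein FASTA, which the thread does not state explicitly. Note that DIAMOND only provides `blastx` and `blastp`, so a `blastn` mode has no DIAMOND equivalent.

```python
import subprocess

# DIAMOND aligns against protein databases only; it has no blastn mode.
DIAMOND_MODES = {'blastx', 'blastp'}


def build_makedb_cmd(fasta, db_prefix):
    """Command that builds <db_prefix>.dmnd from a protein FASTA file."""
    return ['diamond', 'makedb', '--in', fasta, '--db', db_prefix]


def build_search_cmd(mode, db_prefix, query, out, columns, evalue, threads):
    """Command for the alignment itself (hypothetical helper)."""
    if mode not in DIAMOND_MODES:
        raise ValueError(f'DIAMOND does not support mode {mode!r}')
    return [
        'diamond', mode,
        '--db', db_prefix + '.dmnd',
        '--query', query,
        '--out', out,
        # '6' and every field name must be separate list items;
        # a single '6 qseqid sseqid ...' string is rejected.
        '--outfmt', '6', *columns,
        '--evalue', str(evalue),
        '--max-target-seqs', '5',
        '--threads', str(threads),
    ]


def run(cmd):
    """Run a command and raise if it exits with a non-zero status."""
    subprocess.run(cmd, check=True)


# Placeholder paths and settings, for illustration only.
columns = ['qseqid', 'sseqid', 'pident', 'length',
           'qlen', 'slen', 'evalue', 'bitscore']
makedb_cmd = build_makedb_cmd('db.fasta', 'db')
search_cmd = build_search_cmd('blastx', 'db', 'extracted.fa',
                              'blastout.txt', columns, 1e-7, 8)
print(' '.join(makedb_cmd))
print(' '.join(search_cmd))
```

Calling `run(makedb_cmd)` once and then `run(search_cmd)` would execute the two steps, provided diamond is installed and the input files exist.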