peterjc / galaxy_blast

Galaxy wrappers for NCBI BLAST+ and related BLAST tools.

Support for -remote #39

Open peterjc opened 10 years ago

peterjc commented 10 years ago

Filing an overdue issue for this previously discussed enhancement. See early work by @jj-umn as part of Galaxy-P on this branch (checked in by @jmchilton): https://bitbucket.org/galaxyp/galaxyp-toolshed-blast/commits/branch/default

Using -remote makes several 'new' options available, including -entrez_query, which can be used to filter by taxonomy etc., but it also removes other options.
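To make the option change concrete, here is a minimal sketch of how a wrapper might assemble a remote command line. The helper name `remote_blastp_args`, the file names, and the `LOCAL_ONLY` set are hypothetical; the set lists only `-num_threads`, which BLAST+ rejects in combination with `-remote`, and is not meant to be exhaustive.

```python
import shlex

# Options this sketch refuses to combine with -remote.
# Assumption: only -num_threads is listed here; the real list is longer.
LOCAL_ONLY = {"-num_threads"}


def remote_blastp_args(query, out, db="nr", entrez_query=None, extra=()):
    """Build an argument list for a remote blastp search (hypothetical helper)."""
    clash = LOCAL_ONLY.intersection(extra)
    if clash:
        raise ValueError(f"not valid with -remote: {sorted(clash)}")
    args = ["blastp", "-query", query, "-db", db, "-remote",
            "-outfmt", "6", "-out", out]
    if entrez_query:
        # -entrez_query is only accepted together with -remote.
        args += ["-entrez_query", entrez_query]
    return args + list(extra)


if __name__ == "__main__":
    cmd = remote_blastp_args("query.fasta", "hits.tsv",
                             entrez_query="Homo sapiens[Organism]")
    # Print a shell-quoted command rather than running it against NCBI.
    print(shlex.join(cmd))
```

The sketch only prints the command; actually running it would send the search to the NCBI servers.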

Given the number of options which change, and the concerns about potential abuse of the NCBI servers (which could lead to entire Galaxy instances being blacklisted), my preference is for a sister set of tools. That is, we'd have the current (local) BLASTP as one tool, and a new sister tool for remote BLASTP (run at the NCBI).

jj-umn commented 10 years ago

I agree that the remote options should be in separate tools. We should be able to maintain consistency by using macros for common sections.

Thanks,

JJ

James E. Johnson, Minnesota Supercomputing Institute, University of Minnesota

peterjc commented 10 years ago

Do you agree the remote wrappers (which will be a subset since all the database and masking tools won't apply) should be a separate suite on the ToolShed?

If we are careful they can use the same ncbi_macros.xml file (via a symlink if the remote tools get a separate folder under git).

jj-umn commented 10 years ago

That sounds workable.

bgruening commented 10 years ago

I'm also for a separate repository. @jj-umn how do you control the maximum number of requests to the NCBI server? I'm a little worried about whether the effort is worth the results. Are there really so many users who are not able to set up their own BLAST database?
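One possible answer to the request-limit question, as a rough sketch only: a throttle that enforces a minimum gap between successive submissions. The class name `MinIntervalThrottle` is hypothetical, and the interval is a placeholder to be set from the NCBI usage guidelines, not a value taken from this thread.

```python
import time


class MinIntervalThrottle:
    """Block so that successive calls are at least `interval` seconds apart."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock    # injectable so the logic can be tested
        self._sleep = sleep
        self._last = None      # time of the previous permitted call

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


if __name__ == "__main__":
    throttle = MinIntervalThrottle(interval=2.0)  # placeholder interval
    for job in ("job1", "job2", "job3"):
        throttle.wait()
        print("would submit", job)
```

This only limits calls within a single process. Galaxy jobs normally run as separate processes, so a real deployment would more likely need a shared lock or a concurrency limit in the job configuration.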

jj-umn commented 10 years ago

Pratik is the researcher for whom the remote blastp was developed. If I remember correctly, the primary reason for the remote option was to get BLAST results for particular organisms in order to search for novel proteins.

Pratik, is this functionality still needed?

Thanks,

JJ

jj-umn commented 10 years ago

Hello JJ,

Yes, we need this tool. I use it within Galaxy-P for BLAST searches in proteogenomics and metaproteomics work.

Thanks,

Pratik

Pratik Jagtap, Managing Director, Center for Mass Spectrometry and Proteomics, 43 Gortner Laboratory 1479 Gortner Avenue St. Paul, MN 55108 Phone: 612-624-9275

peterjc commented 10 years ago

Thanks @jj-umn - good to know there is a clear motivation for this, and the -entrez_query feature in particular.

I do appreciate that species filtering is an important use case, and it seems like there ought to be a neat way to do this with a local database (other than using the tabular output and filtering on the taxonomy as a post-processing step). Note we've got issue #36 for filtering by taxonomy, which may be possible for blastn (only) via -window_masker_taxid. But that isn't very general.
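For reference, the post-processing route could look roughly like this. It is a sketch that assumes the search was run with `-outfmt "6 std staxids"`, so the taxonomy IDs sit in the 13th column as a semicolon-separated list; the function name `filter_by_taxid` is hypothetical, and it matches exact taxids only (no descendants of a taxon).

```python
def filter_by_taxid(lines, wanted, column=12):
    """Yield tabular BLAST lines whose taxid column shares an ID with `wanted`.

    Assumes 0-based `column` holds semicolon-separated taxids, as produced
    by -outfmt "6 std staxids". Exact matches only, no taxonomy tree lookup.
    """
    wanted = {str(t) for t in wanted}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= column:
            continue  # no taxid column on this line
        if wanted.intersection(fields[column].split(";")):
            yield line


if __name__ == "__main__":
    import sys
    # Usage: python filter_taxid.py 9606 < hits.tsv > human_hits.tsv
    sys.stdout.writelines(filter_by_taxid(sys.stdin, sys.argv[1:]))
```

Restricting to a whole clade would also need the NCBI taxonomy tree to expand a taxid to its descendants, which is part of why this is less neat than -entrez_query.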

We typically solve/avoid this with custom organism-specific BLAST databases, often for draft genomes which have not yet been published. As another example, I used Entrez to build a complete virus database: http://blastedbio.blogspot.co.uk/2013/11/entrez-trouble-with-chimeras.html

peterjc commented 9 years ago

See also https://toolshed.g2.bx.psu.edu/view/galaxyp/blast_plus_remote_blastp