peterjc / galaxy_blast

Galaxy wrappers for NCBI BLAST+ and related BLAST tools.

Use Galaxy Data Tables XML for accessing *.loc files #52

Closed: peterjc closed this issue 9 years ago

peterjc commented 9 years ago

See discussion on #22, from where I have copied most of the description here.

Should we start using Data Tables to access the BLAST database *.loc files? See https://wiki.galaxyproject.org/Admin/Tools/Data%20Tables

That is, replace this in our macros file:

    <param name="database" type="select" label="Nucleotide BLAST database">
        <options from_file="blastdb.loc">
            <column name="value" index="0"/>
            <column name="name" index="1"/>
            <column name="path" index="2"/>
        </options>
    </param>

With the shorter:

    <param name="database" type="select" label="Nucleotide BLAST database">
        <options from_data_table="blastdb" />
    </param>

The column information is then defined in tool-data/tool_data_table_conf.xml.sample instead:

    <table name="blastdb" comment_char="#">
        <columns>value, name, path</columns>
        <file path="tool-data/blastdb.loc" />
    </table>
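To make the column mapping concrete, here is a minimal Python sketch of how a table definition like the one above could be applied to a .loc file. It is an illustration only, not Galaxy's own loader: the `read_loc` helper and the example database entry are hypothetical, and it assumes the usual tab-separated .loc layout with `#` as the comment character.

```python
import io


def read_loc(handle, columns, comment_char="#"):
    """Yield one dict per data line of a tab-separated .loc file.

    Hypothetical helper for illustration; this is not Galaxy's API.
    """
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith(comment_char):
            continue
        fields = line.split("\t")
        # Skip lines that have fewer fields than the table declares.
        if len(fields) < len(columns):
            continue
        yield dict(zip(columns, fields))


# Made-up blastdb.loc contents: one comment line and one database entry.
EXAMPLE_LOC = (
    "# value<tab>name<tab>path\n"
    "nt_example\tExample nucleotide database\t/data/blastdb/nt_example\n"
)

# Column names taken from the <columns> element of the table definition.
columns = [c.strip() for c in "value, name, path".split(",")]
entries = list(read_loc(io.StringIO(EXAMPLE_LOC), columns))

for entry in entries:
    print(entry["value"], entry["name"], entry["path"], sep=" | ")
```

The point of the sketch is that the column names live in one place (the table definition) rather than being repeated as index numbers in every tool's XML.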

Quoting @blankenberg on the Galaxy mailing list:

Having a standalone repository that just contained the tool data table and .loc file
that could be a dependency of other repositories would be a good way to go here.
Unfortunately, this isn’t supported right now. I’ve opened a trello card for this:
https://trello.com/c/VZxV08Qt

However, even though you currently need to include the tool data table definition
and .loc sample in each repository in order for the tool to be valid, it is still
a best practice to use tool data tables.

See http://lists.bx.psu.edu/pipermail/galaxy-dev/2014-April/019027.html and the Trello card https://trello.com/c/VZxV08Qt, which says:

Currently a repository with a tool that requires a tool data table must have that
tool data table included within its own repository. This causes duplication of this
files in each repository that needs them.

We could allow a repository having (just?) the data table definition and the
.loc.sample to be a dependency of other repositories.

Bonus points if we were to allow optional namespacing of the table name based
upon its repos (since there is currently a possibility of name collisions).

It looks like we could have identical copies of the tool-data/tool_data_table_conf.xml.sample and the tool-data/*.loc.sample files included in multiple ToolShed repositories. They should be almost static so version clashes should not be a problem?

See http://lists.bx.psu.edu/pipermail/galaxy-dev/2014-April/019023.html / http://dev.list.galaxyproject.org/Data-Tables-and-loc-files-Using-named-columns-versus-from-data-table-tc4664149.html and the Trello Card on Galaxy: https://trello.com/c/VZxV08Qt

abretaud commented 9 years ago

Thanks for filing this issue! I don't think having identical copies of these fairly static files would be a problem. Tell me if you want me to prepare a pull request.

peterjc commented 9 years ago

Does that work for you @abretaud? Assuming there are no obvious problems this will eventually reach the Test Tool Shed, then main Tool Shed.

abretaud commented 9 years ago

Yes, it looks perfect! I will retest it once it is in the Test Tool Shed, just to be sure, but it should be OK. Thanks!

peterjc commented 9 years ago

Great. Do you need this soon? i.e. Would a release in the next week or so be ideal?

I'm currently trying to take advantage of this work and other updates in the Galaxy test framework for #53 ...

abretaud commented 9 years ago

It's not very urgent, though I'm waiting for it to be released before releasing the biomaj code I have written. In fact, I'm on holiday from the end of next week until January 5th, so if you have time to release it next week that would be great; otherwise it's not a big problem, it can wait until January.