usegalaxy-no / galaxyadmin

A repository for managing the work of the usegalaxy.no GalaxyAdmin team

Bracken fails #51

Open ehj000 opened 3 years ago

ehj000 commented 3 years ago

The tool Bracken fails with the following error:

Checking report file: /data/part0/000/855/dataset_855987.dat
Traceback (most recent call last):
  File "/usr/local/bin/est_abundance.py", line 529, in <module>
    main()
  File "/usr/local/bin/est_abundance.py", line 315, in main
    k_file = open(args.kmer_distr,'r')
FileNotFoundError: [Errno 2] No such file or directory: '2020-11-26T021706Z_standard_kmer-len_35_minimizer-len_31_minimizer-spaces_6_load-factor_0.7/database100mers.kmer_distrib'

Bracken (https://usegalaxy.no/root?tool_id=toolshed.g2.bx.psu.edu/repos/iuc/bracken/est_abundance/2.6.1+galaxy0) and Bracken database builder (https://usegalaxy.no/root?tool_id=toolshed.g2.bx.psu.edu/repos/iuc/data_manager_build_bracken_database/bracken_build_database/2.5+galaxy0) were installed individually from the toolshed.

Building a database with Bracken database builder went well. The database was named "Bracken_standard_75mers_distrib_read_length100", and it is possible to select it when using Bracken. However, from the error log it seems that Bracken is searching for a different database: "2020-11-26T021706Z_standard_kmer-len_35_minimizer-len_31_minimizer-spaces_6_load-factor_0.7/database100mers.kmer_distrib".

Any suggestions as to what could be causing this?

torfinnnome commented 3 years ago

Could you share an input file for Bracken?

Maybe we should discuss if we want a shared "Tools debugging" Galaxy account, to make debugging issues faster? Direct access to a history with a failed run would lower the mental barrier to start digging into this...

torfinnnome commented 3 years ago

Not obvious to me what is wrong, but I'm far from a Galaxy data manager/library expert. Would be great if someone with more experience would also take a look.

But that name you see in the error logs corresponds to the "Standard" Kraken database, which you based the Bracken database on, I assume?

ehj000 commented 3 years ago

I built the Bracken database using Bracken Database Builder (Admin -> Local Data), with the same database that was built/downloaded for Kraken2 (Standard). I named the Bracken database "Bracken_standard_75mers_distrib_read_length100".

This database is available when I select the Bracken tool, but from the error report, the tool seems to want another database: "2020-11-26T021706Z_standard_kmer-len_35_minimizer-len_31_minimizer-spaces_6_load-factor_0.7/database100mers.kmer_distrib". Could this be hard-coded in "est_abundance.py"?

kjetilkl commented 3 years ago

"Bracken_standard_75mers_distrib_read_length100" is the display name you chose for the database, which is shown to the users. But the actual file name will be "databaseNmers.kmer_distrib", where N is the read length specified when creating the database. This file is placed in a subdirectory named after the chosen Kraken database (the full name of the Standard database here is "2020-11-26T021706Z_standard_kmer-len_35_minimizer-len_31_minimizer-spaces_6_load-factor_0.7"), so the error report does indeed reference the database file you created.

The problem, I believe, is that the path to this file is relative (as seen in the last column of the loc-file: "/srv/galaxy/server/tool-data/toolshed.g2.bx.psu.edu/repos/iuc/data_manager_build_bracken_database/fd5830f88314/bracken_databases.loc"). The files created by data managers should normally be placed somewhere beneath the path specified with the "galaxy_data_manager_data_path" setting in "galaxy.yml" (defaults to the same value as "tool_data_path"), but I have not been able to find the actual location of the database file yet. I remember briefly discussing some time ago where we should place such files (or maybe add them to CVMFS), but we did not reach a conclusion, which may be why these settings have not been explicitly configured.
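The naming scheme described here can be checked with a few lines of Python. This is only a sketch: the helper name `expected_kmer_distrib` is made up for illustration (it is not part of Bracken or Galaxy), and the Kraken database name and read length are taken from the error message in this issue.

```python
import os.path


def expected_kmer_distrib(kraken_db_name: str, read_length: int) -> str:
    """Relative location of the Bracken k-mer distribution file.

    Hypothetical helper: <Kraken database name>/database<N>mers.kmer_distrib,
    where N is the read length chosen when building the database.
    """
    return f"{kraken_db_name}/database{read_length}mers.kmer_distrib"


kraken_db = (
    "2020-11-26T021706Z_standard_kmer-len_35_minimizer-len_31_"
    "minimizer-spaces_6_load-factor_0.7"
)
path = expected_kmer_distrib(kraken_db, 100)

# Path reported in the FileNotFoundError from est_abundance.py
reported = (
    "2020-11-26T021706Z_standard_kmer-len_35_minimizer-len_31_"
    "minimizer-spaces_6_load-factor_0.7/database100mers.kmer_distrib"
)

print(path == reported)     # True: the error refers to the file that was built
print(os.path.isabs(path))  # False: the path is relative
```

Since the path is relative, whether `open()` finds the file depends on the working directory of the job that runs Bracken.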

kjetilkl commented 3 years ago

I also suspect that the Bracken data manager tool by IUC may be to blame, since it does not actually move the generated files into a subdirectory of ${GALAXY_DATA_MANAGER_DATA_PATH} (which is something all the other data managers I have looked at do).

Here is the "data_manager_conf.xml" file for Bracken. It will output a line with 3 columns to the loc-file: a unique ID (value) for the reference dataset, a name displayed to the users, and the path to the file(s). The path here is just the relative location of the file within the job working directory, so the file itself is probably just deleted when the data manager job finishes.
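To see which entries in a loc-file are affected, something like the following could be run on the Galaxy server. It is a rough sketch that assumes the three tab-separated columns described here (value, name, path) and that lines starting with `#` are comments. The function name `find_relative_paths` and the sample entries are hypothetical.

```python
import os.path


def find_relative_paths(loc_lines):
    """Return (value, name, path) tuples whose path column is not absolute."""
    problems = []
    for line in loc_lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # not a value/name/path entry
        value, name, path = fields[0], fields[1], fields[2]
        if not os.path.isabs(path):
            problems.append((value, name, path))
    return problems


# Made-up entries for illustration; in practice, pass an open loc-file instead.
sample = [
    "# value\tname\tpath\n",
    "db1\tExample relative entry\tsome_kraken_db/database100mers.kmer_distrib\n",
    "db2\tExample absolute entry\t/data/bracken/database100mers.kmer_distrib\n",
]
for value, name, path in find_relative_paths(sample):
    print(f"{value}: relative path {path!r}")
```

With a real file, the same function can be called as `find_relative_paths(open(loc_file))`, where `loc_file` is the path to "bracken_databases.loc".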

<data_managers>
    <data_manager tool_file="data_manager/bracken_build_database.xml" id="bracken_build_database" version="2.5+galaxy0">
        <data_table name="bracken_databases">
            <output>
                <column name="value"/>
                <column name="name"/>
                <column name="path" output_ref="out_file"/>
            </output>
        </data_table>
    </data_manager>
</data_managers>

Below is an example of a typical "data_manager_conf.xml" file (here HISAT2). The files are moved to a different location outside of the job working directory, and the path is translated to point to this new location (which is an absolute path).

<?xml version="1.0"?>
<data_managers>
    <data_manager tool_file="data_manager/hisat2_index_builder.xml" id="hisat2_index_builder" version="0.0.1">
        <data_table name="hisat2_indexes">
            <output>
                <column name="value" />
                <column name="dbkey" />
                <column name="name" />
                <column name="path" output_ref="out_file" >
                    <move type="directory" relativize_symlinks="True">
                        <!-- <source>${path}</source>--> <!-- out_file.extra_files_path is used as base by default --> <!-- if no source, eg for type=directory, then refers to base -->
                        <target base="${GALAXY_DATA_MANAGER_DATA_PATH}">${dbkey}/hisat2_index/${value}</target>
                    </move>
                    <value_translation>${GALAXY_DATA_MANAGER_DATA_PATH}/${dbkey}/hisat2_index/${value}/${path}</value_translation>
                    <value_translation type="function">abspath</value_translation>
                </column>
            </output>
        </data_table>
    </data_manager>
</data_managers>
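Roughly speaking, the `<move>` and `<value_translation>` elements in the HISAT2 example amount to the following. This is a simplified Python emulation for illustration only, not Galaxy's actual implementation: `move_and_translate` and its arguments are hypothetical, the demo values are made up, and the symlink handling (`relativize_symlinks`) is left out.

```python
import os
import shutil
import tempfile


def move_and_translate(source_dir, data_manager_data_path, dbkey, value, path):
    """Move generated files under the data path and return an absolute path.

    Hypothetical stand-in for <move type="directory"> followed by the two
    <value_translation> steps (template expansion, then abspath).
    """
    target = os.path.join(data_manager_data_path, dbkey, "hisat2_index", value)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    shutil.move(source_dir, target)  # files leave the job working directory
    return os.path.abspath(os.path.join(target, path))


# Demonstration in a temporary directory standing in for the real locations
with tempfile.TemporaryDirectory() as tmp:
    job_dir = os.path.join(tmp, "job_working_directory", "out_files")
    os.makedirs(job_dir)
    with open(os.path.join(job_dir, "placeholder.txt"), "w") as fh:
        fh.write("placeholder")

    data_path = os.path.join(tmp, "tool-data")
    new_path = move_and_translate(job_dir, data_path, "example_dbkey", "example_id", "index")

    print(os.path.isabs(new_path))   # True
    print(os.path.exists(job_dir))   # False: nothing is left in the job directory
```

The Bracken data manager configuration shown earlier has neither step, which is consistent with the relative path ending up in the loc-file.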

torfinnnome commented 3 years ago

IUC would probably love a PR on this.