pirovc / metameta


preinstall databases? (and install them somewhere specifically?) #12

Closed draeath closed 6 years ago

draeath commented 6 years ago

I see in the README that metameta will download the databases on first run.

I am preparing the software in an environment where the users do not have privileges to write where it will be installed. I don't have what's needed to run the tool, so I need some means to kick off this download manually.

I'd also like to know where these get stashed away, because they seem quite large. Can they be placed elsewhere, with metameta informed of their location?

I've attempted to puzzle out how this works in the source, but I can't make heads or tails of it.

pirovc commented 6 years ago

You can solve that with a dummy run using the sample data and a pre-configured database.

The database directory is defined with the dbdir variable in the configuration file; the database files will be downloaded into that directory. On subsequent executions, if the configuration file selects the same dbdir path and the same database, metameta will detect the existing files and just run the tools.
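For example, a minimal configuration sketch (placeholder paths, not from an actual run):

```yaml
# Output/working directory; relative paths below are resolved against it.
workdir: "/path/to/workdir/"
# Databases are downloaded here on first use and reused on later runs.
dbdir: "/path/to/shared/databases/"
```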

Best, Vitor

draeath commented 6 years ago

Thank you!

I think I've got it sorted. It's going to take some time, obviously.

I did catch an error in the snakemake output fairly early. Is this indicative of something being wrong, or is this OK to occur with the sample data?

rule motus_run_1:
    input: sample_data_archaea_bacteria/reads/motus.1.fq, /shares/hii/bioinfo/data/ref/MetaMeta/archaea_bacteria_201503/motus_db_check.done
    output: sample_data_archaea_bacteria/motus/archaea_bacteria_201503/sample_data_archaea_bacteria.species.abundances.gz
    log: sample_data_archaea_bacteria/log/archaea_bacteria_201503/motus_run_1.log
    jobid: 18
    benchmark: sample_data_archaea_bacteria/log/archaea_bacteria_201503/motus_run_1.time
    wildcards: sample=sample_data_archaea_bacteria, database=archaea_bacteria_201503

Error in rule motus_run_1:
    jobid: 18
    output: sample_data_archaea_bacteria/motus/archaea_bacteria_201503/sample_data_archaea_bacteria.species.abundances.gz
    log: sample_data_archaea_bacteria/log/archaea_bacteria_201503/motus_run_1.log

RuleException:
CalledProcessError in line 12 of /shares/hii/sw/MetaMeta/1.2.0/opt/metameta/tools/motus.sm:
Command ' set -euo pipefail;  
                cd sample_data_archaea_bacteria/motus/archaea_bacteria_201503/
                mOTUs.pl --processors=1 ../../../sample_data_archaea_bacteria/reads/motus.1.fq ../../../sample_data_archaea_bacteria/reads/motus.2.fq --output-directory ./ > ../../../sample_data_archaea_bacteria/log/archaea_bacteria_201503/motus_run_1.log 2>&1
                mv NCBI.species.abundances.gz ../../../sample_data_archaea_bacteria/motus/archaea_bacteria_201503/sample_data_archaea_bacteria.species.abundances.gz ' returned non-zero exit status 2.
  File "/shares/hii/sw/MetaMeta/1.2.0/opt/metameta/tools/motus.sm", line 12, in __rule_motus_run_1
  File "/shares/hii/sw/MetaMeta/1.2.0/lib/python3.6/concurrent/futures/thread.py", line 55, in run
Will exit after finishing currently running jobs.
Finished job 5.
7 of 45 steps (16%) done

The contents of sample_data_archaea_bacteria/log/archaea_bacteria_201503/motus_run_1.log:

ReadTrimFilter_aux : SAMPLE motus.processing.1
ReadTrimFilter_aux : SCRIPT VERSION 3
ReadTrimFilter_aux : CLEANUP : rm -f ./motus.processing.1/temp/*.trimmed.filtered.fq*; mkdir -p /tmp/metameta-workdir/sample_data_archaea_bacteria/motus/archaea_bacteria_201503/motus.processing.1/stats; rm -fr /tmp/metameta-workdir/sample_data_archaea_bacteria/motus/archaea_bacteria_201503/motus.processing.1/reads.processed.solexaqa; mkdir -p /tmp/metameta-workdir/sample_data_archaea_bacteria/motus/archaea_bacteria_201503/motus.processing.1/reads.processed.solexaqa
ReadTrimFilter_aux : GET QUALITY STATS : mode=solexaqa sanger=-Q 33 format=sanger sample=lane1.1.fq : EXECUTE cat /tmp/metameta-workdir/sample_data_archaea_bacteria/motus/archaea_bacteria_201503/motus.processing.1/lane1.1.fq | /shares/hii/sw/MetaMeta/1.2.0/opt/motus/motus_data/bin/fastx_quality_stats -o ./motus.processing.1/temp/lane1.1.fq.qual_stats.temp -Q 33
ReadTrimFilter_aux : Get 3 prime end
ReadTrimFilter_aux : GET QUALITY STATS : mode=solexaqa sanger=-Q 33 format=sanger sample=lane1.2.fq : EXECUTE cat /tmp/metameta-workdir/sample_data_archaea_bacteria/motus/archaea_bacteria_201503/motus.processing.1/lane1.2.fq | /shares/hii/sw/MetaMeta/1.2.0/opt/motus/motus_data/bin/fastx_quality_stats -o ./motus.processing.1/temp/lane1.2.fq.qual_stats.temp -Q 33
ReadTrimFilter_aux : Get 3 prime end
ReadTrimFilter_aux : PROCESS PAIR-END lane1 : EXECUTE /shares/hii/sw/MetaMeta/1.2.0/opt/motus/motus_data/bin/fastq_trim_filter_v5_EMBL -m solexaqa -a /tmp/metameta-workdir/sample_data_archaea_bacteria/motus/archaea_bacteria_201503/motus.processing.1/lane1.1.fq -b /tmp/metameta-workdir/sample_data_archaea_bacteria/motus/archaea_bacteria_201503/motus.processing.1/lane1.2.fq -f 2 -2 1 -q 20 -l 45 -p 50 -Q 33 -o /tmp/metameta-workdir/sample_data_archaea_bacteria/motus/archaea_bacteria_201503/motus.processing.1/reads.processed.solexaqa/lane1
ReadTrimFilter_aux : SUMMARIZE STATS and write /tmp/metameta-workdir/sample_data_archaea_bacteria/motus/archaea_bacteria_201503/motus.processing.1/stats/motus.processing.1.readtrimfilter.solexaqa.stats
ReadTrimFilter_aux : COMPLETED READ TRIM FILTER
ParseFASTToPacked() : Cannot open annotationFileName!
Parsing FASTA file..
Command '/shares/hii/sw/MetaMeta/1.2.0/opt/motus/motus_data/bin/2bwt-builder /shares/hii/sw/MetaMeta/1.2.0/opt/motus/motus_data/data/mOTU.v1.padded' failed: No such file or directory at /shares/hii/sw/MetaMeta/1.2.0/bin/mOTUs.pl line 83, <LANE> line 8.

Note: I'm a sysadmin, not one of the researchers, so I'm afraid much of this is unfamiliar to me. I can't make sense of that last line of output: No such file or directory at /shares/hii/sw/MetaMeta/1.2.0/bin/mOTUs.pl line 83, <LANE> line 8.

<LANE> doesn't seem to be a reference to a file, and mOTUs.pl around there looks pretty sane (though I don't read perl):

sub system_ {
    my $cmd = shift;
    if ($verbose) {
        print "Will execute \n\n$cmd\n\n";
    }   
    (system($cmd) == 0) or die("Command '$cmd' failed: $!");
}
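A generic illustration (an assumption on my part, not output from the failing run): the trailing "No such file or directory" comes from Perl's $!, which reports the operating system's errno, so the likely culprit is a missing binary or data file in the 2bwt-builder invocation rather than a bug in mOTUs.pl itself. The same errno surfaces when the shell fails to start a nonexistent program:

```shell
# Illustration only: invoking a path that doesn't exist reproduces the
# "No such file or directory" errno; the shell reports exit status 127.
sh -c '/nonexistent/2bwt-builder' 2>/dev/null
echo "exit status: $?"   # prints "exit status: 127"
```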
draeath commented 6 years ago

The config file is as follows:

# Output and working directory (all other directories below - if not absolute - are relative to this one)
workdir: "/tmp/metameta-workdir/" # NOTICE: not appropriate for actual runs with this tool! This should be on GPFS.
dbdir: "/shares/hii/bioinfo/data/ref/MetaMeta/"

samples:
  sample_data_archaea_bacteria:
     fq1: "/shares/hii/sw/MetaMeta/1.2.0/opt/metameta/sampledata/files/reads/sample_data_archaea_bacteria.1.fq.gz"
     fq2: "/shares/hii/sw/MetaMeta/1.2.0/opt/metameta/sampledata/files/reads/sample_data_archaea_bacteria.2.fq.gz"

gzipped: 1
threads: 1

I'm waiting to see if this one finishes OK before running the viral one. Similar modifications were made there (mostly using absolute paths).

draeath commented 6 years ago

Is this database file healthy?

It's only 21kb, and contains only:

$ tar tf ~/Downloads/motus_bac_arc_v1.tar.gz 
motus_db/
motus.dbprofile.out

The motus.dbprofile.out contains lines like this. All 0 at the end, if that's significant:

superkingdom    Archaea 0
superkingdom    Bacteria        0
phylum  Acidobacteria   0
phylum  Actinobacteria  0
phylum  Aquificae       0
phylum  Bacteroidetes   0
phylum  Chlamydiae      0

Update: the referenced file from the zenodo.org entry description here is 47M. It's a Perl script with some embedded data, so I'm not sure what I can do with it.

pirovc commented 6 years ago

The problem here is with the installation of mOTUs. Are you pre-installing it, or using the snakemake integration with BioConda (--use-conda)? The error message suggests that mOTUs couldn't find its own database.

Apart from that, the mOTUs data from Zenodo is fine. The database comes bundled with the Perl script (and it is indeed very small) and is unpacked when you install the motus tool. Those zeros are just placeholders, since the mOTUs database is structured a bit differently from the rest.

draeath commented 6 years ago

I installed these packages/versions in the same conda environment that I installed MetaMeta. Should that have worked? https://raw.githubusercontent.com/pirovc/metameta/master/envs/metameta_complete.yaml

What should motus look like inside conda, for me to know it's operable?

draeath commented 6 years ago

I'm wondering if it was a permissions issue with motus_data. I'd like to avoid having any contents of the conda environment be writable, if that's even possible here.

I see the following:

[pabransford@hii motus]$ tree
.
├── motus_data
│   ├── bin
│   │   ├── 2bwt-builder
│   │   ├── fastq_trim_filter_v5_EMBL
│   │   ├── fastq_trim_filter_v5_OSX -> fastq_trim_filter_v5_EMBL
│   │   ├── fastx_quality_stats
│   │   ├── msamtools
│   │   ├── samtools
│   │   └── soap2.21
│   ├── data
│   │   ├── mOTU-LG.v1.annotations.txt
│   │   ├── mOTU.v1.map.txt
│   │   ├── mOTU.v1.padded
│   │   ├── mOTU.v1.padded.coord
│   │   ├── mOTU.v1.padded.len
│   │   ├── mOTU.v1.padded.motu.linkage.map
│   │   ├── mOTU.v1.padded.motu.map
│   │   ├── RefMG.v1.padded
│   │   ├── RefMG.v1.padded.coord
│   │   ├── RefMG.v1.padded.len
│   │   ├── RefMG.v1.padded.refmg.map
│   │   └── RefMG.v1.padded.refmg.map.total_length
│   └── src
│       ├── MOCATCalculateTaxonomy.pl
│       ├── MOCATExternal_soap2sam.pl
│       ├── MOCATFilter_falen.pl
│       ├── MOCATFilter_filterPE.pl
│       ├── MOCATFilter_remove_in_padded.pl
│       ├── MOCATFilter_soap2sam.awk
│       ├── MOCATFilter_stats.pl
│       ├── MOCATFraction.pl
│       ├── MOCATPasteTaxonomyCoverageFiles_generate_mOTU_tables.pl
│       ├── MOCATReadTrimFilter_aux.pl
│       └── MOCATScreen_filter.pl
└── mOTUs.pl

But I'm not sure if that's complete. The motus_data/data directory is about 153M.

pirovc commented 6 years ago

Looks like the installation is correct. Is this folder inside ~/miniconda3/opt/motus/ (or miniconda3/envs/env_name/opt/motus)? That is the folder mOTUs works from; besides that, there's a symbolic link to the mOTUs.pl script in bin/. I'm not sure whether the directory needs to be writable, but the contents of motus_data/bin/ should at least be executable.
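A quick way to check both points (the prefix path is an assumption; adjust to your install):

```shell
# Assumed conda prefix; use miniconda3/envs/<env_name> for a named env.
prefix="$HOME/miniconda3"

# mOTUs.pl should be reachable through a symlink in bin/
ls -l "$prefix/bin/mOTUs.pl"

# List any helper binary that is missing the owner execute bit;
# no output means everything in motus_data/bin is executable.
find "$prefix/opt/motus/motus_data/bin" -maxdepth 1 -type f ! -perm -u+x
```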

draeath commented 6 years ago

It's outside of ${HOME} and owned by root (for context, we're installing this all into a GPFS share for cluster nodes to execute from).

The files in ./opt/motus/motus_data/bin/ are indeed mode 0755.

I'm going to try moving this all aside and recreating the process, but leaving everything owned by my user for a test drive. If it works, we'll be able to figure out what's different between the two trees and maybe document it for anyone else following in my footsteps :)

If it doesn't work that way either, well, things get interesting.

I'll let you know which case it was.

Thank you very much for your support on this thus far. It is very much appreciated!

draeath commented 6 years ago

Huh, I'm getting something entirely different now. Conda is having dependency issues. I started over from scratch, with a fresh copy of Miniconda3-latest-Linux-x86_64.sh (version 4.3.31) and metameta_complete.yaml.

I tried both installing metameta directly and using your environment YAML.

[pabransford@hii MetaMeta]$ pwd
/shares/hii/sw/MetaMeta

[pabransford@hii MetaMeta]$ echo $PATH
/shares/hii/sw/MetaMeta/1.2.0/bin:{REDACTED}

[pabransford@hii MetaMeta]$ conda install -c bioconda metameta=1.2.0
Fetching package metadata .............
Solving package specifications: 

PackageNotFoundError: Packages missing in current channels:

  - metameta 1.2.0* -> snakemake ==4.3.0 -> aioeasywebdav
  - metameta 1.2.0* -> snakemake ==4.3.0 -> ratelimiter

We have searched for the packages in the following channels:

  - https://conda.anaconda.org/bioconda/linux-64
  - https://conda.anaconda.org/bioconda/noarch
  - https://repo.continuum.io/pkgs/main/linux-64
  - https://repo.continuum.io/pkgs/main/noarch
  - https://repo.continuum.io/pkgs/free/linux-64
  - https://repo.continuum.io/pkgs/free/noarch
  - https://repo.continuum.io/pkgs/r/linux-64
  - https://repo.continuum.io/pkgs/r/noarch
  - https://repo.continuum.io/pkgs/pro/linux-64
  - https://repo.continuum.io/pkgs/pro/noarch

[pabransford@hii MetaMeta]$ conda env create -f metameta_complete.yaml
Fetching package metadata .............
Solving package specifications: 

ResolvePackageNotFound: 
  - kaiju 1.4.5*
  - perl 5.22.0*
  - metameta 1.2.0*
  - snakemake ==4.3.0
  - aioeasywebdav
  - metameta 1.2.0*
  - snakemake ==4.3.0
  - ratelimiter

If I'm reading it right, it's mad about snakemake? I can see that version is showing up in a search:

[pabransford@hii MetaMeta]$ conda search -c bioconda snakemake
Fetching package metadata .............
snakemake                    3.4.2                    py34_1  bioconda        

    { snipped out a bunch of unrelated versions }

                             4.2.0                    py35_0  bioconda        
                             4.2.0                    py36_0  bioconda        
                             4.3.0                    py35_0  bioconda        
                             4.3.0                    py36_0  bioconda        
                             4.3.1                    py35_0  bioconda        
                             4.3.1                    py36_0  bioconda        
pirovc commented 6 years ago

Strange, I've never seen such an error with motus, even when installing it beforehand as you're doing. I'll try to replicate the scenario once again.

Regarding conda: they may have changed some packages, and the bioconda channel by itself is not enough anymore. Can you please add the secondary channels and try again? Execute the following commands in this order:

conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda

That will add those channels to your current conda configuration, so the install command becomes simply conda install metameta=1.2.0
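For reference, each --add prepends to the channel list, so after running the three commands in that order the resulting ~/.condarc has bioconda searched first:

```yaml
# ~/.condarc after the three commands above (top = highest priority)
channels:
  - bioconda
  - conda-forge
  - defaults
```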

draeath commented 6 years ago

Looks like conda-forge was the missing magic. Seems to be required now.

The following NEW packages will be INSTALLED:

    aioeasywebdav:   2.2.0-py36_0          conda-forge
    aiohttp:         2.0.7-py36_0          conda-forge
    appdirs:         1.4.3-py_0            conda-forge
    async-timeout:   1.2.1-py36_0          conda-forge
    bcrypt:          3.1.4-py36h621fe67_0             
    configargparse:  0.12.0-py36_0         conda-forge
    docutils:        0.14-py36_0           conda-forge
    dropbox:         8.4.0-py_0            conda-forge
    filechunkio:     1.6-py36_0            bioconda   
    ftputil:         3.2-py36_0            bioconda   
    intel-openmp:    2018.0.0-hc7b2577_8              
    libgfortran-ng:  7.2.0-h9f7466a_2                 
    metameta:        1.2.0-1               bioconda   
    mkl:             2018.0.1-h19d6760_4              
    multidict:       2.1.4-py36_0          conda-forge
    numpy:           1.14.0-py36h3dfced4_1            
    pandas:          0.22.0-py36_0         conda-forge
    paramiko:        2.3.1-py_0            conda-forge
    psutil:          5.4.0-py36_0          conda-forge
    pyasn1:          0.4.2-py_0            conda-forge
    pynacl:          1.1.2-py36_0          conda-forge
    pysftp:          0.2.9-py36_0          bioconda   
    python-dateutil: 2.6.1-py36_0          conda-forge
    pytz:            2017.3-py_2           conda-forge
    pyyaml:          3.12-py36_1           conda-forge
    ratelimiter:     1.2.0-py36_0          conda-forge
    snakemake:       4.3.0-py36_0          bioconda   
    wrapt:           1.10.11-py36_0        conda-forge
    yarl:            0.10.0-py36_0         conda-forge

The following packages will be UPDATED:

    conda:           4.3.31-py36_0                     --> 4.3.33-py36_0 conda-forge

The following packages will be SUPERSEDED by a higher-priority channel:

    conda-env:       2.6.0-h36134e3_1                  --> 2.6.0-0       conda-forge

I'll drop an update if I get a working installation or if I have motus problems again :)

draeath commented 6 years ago

I got further along, almost to the end. The node I was working on apparently didn't have enough memory for archaea_bacteria_201503/kraken_db/database.kdb.

I'll continue working on this on one of our higher-memory nodes, but meanwhile: is there any way to run this at a smaller scale? What do you consider a minimum spec? It blew up with this on a 64 GB machine (with ~55 GB free):

Loading database... classify: unable to mmap {snip}/archaea_bacteria_201503/kraken_db//database.kdb: Cannot allocate memory

I see the file on disk is ~66 GB.

pirovc commented 6 years ago

Thanks for the heads-up about the channels; I will update the README accordingly. Did mOTUs work in the separate environment?

kraken is indeed memory hungry; it needs at least 66 GB, since it loads the complete database into memory. If you are running multiple threads (not in a cluster environment), it needs even more, because other tools will be running at the same time and sharing memory.
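A hypothetical pre-flight check along those lines (paths are placeholders; kraken mmaps the whole .kdb file, so available memory must exceed its size):

```shell
# Placeholder path; point this at your actual kraken database file.
db="/path/to/kraken_db/database.kdb"

db_kb=$(( $(stat -c%s "$db") / 1024 ))                     # database size in kB
avail_kb=$(awk '/MemAvailable/ {print $2}' /proc/meminfo)  # available memory in kB

if [ "$avail_kb" -lt "$db_kb" ]; then
    echo "insufficient memory: need ${db_kb} kB, have ${avail_kb} kB" >&2
fi
```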

draeath commented 6 years ago

Hello,

I haven't tried it with the write permissions stripped out yet; I'm trying to get a successful run first before even attempting that.

I'm failing further along now! But a quick question before getting into that: I noticed log files being created in the dbdir. This is undesirable in my case. Is there any way to have them put somewhere else? Example:

MetaMeta finished successfuly
Please check the main log file for more information:
        /tmp/metameta-workdir/metameta_2018-02-05_12-27-52.log
Detailed output and execution time for each rule can be found at:
        /shares/hii/bioinfo/data/ref/MetaMeta/log/
        /tmp/metameta-workdir/SAMPLE_NAME/log/

Now, for the bad news. The sample_data_archaea_bacteria completed without (apparent) error, but sample_data_custom_viral is failing at the metametamerge step:

rule metametamerge:
    input: sample_data_custom_viral/profiles/custom_viral_db/clark_clean_files.done, sample_data_custom_viral/profiles/custom_viral_db/dudes_clean_files.done, sample_data_custom_viral/profiles/custom_viral_db/kaiju_clean_files.done, sample_data_custom_viral/profiles/custom_viral_db/kraken_clean_files.done, /shares/hii/bioinfo/data/ref/MetaMeta/taxonomy/names.dmp, /shares/hii/bioinfo/data/ref/MetaMeta/taxonomy/nodes.dmp, /shares/hii/bioinfo/data/ref/MetaMeta/taxonomy/merged.dmp
    output: sample_data_custom_viral/metametamerge/custom_viral_db/final.metametamerge.profile.out
    log: sample_data_custom_viral/log/custom_viral_db/metametamerge.log
    jobid: 3
    benchmark: sample_data_custom_viral/log/custom_viral_db/metametamerge.time
    wildcards: sample=sample_data_custom_viral, database=custom_viral_db

Error in rule metametamerge:
    jobid: 3
    output: sample_data_custom_viral/metametamerge/custom_viral_db/final.metametamerge.profile.out
    log: sample_data_custom_viral/log/custom_viral_db/metametamerge.log

RuleException:
CalledProcessError in line 22 of /shares/hii/sw/MetaMeta/1.2.0/opt/metameta/scripts/metametamerge.sm:
Command ' set -euo pipefail;  MetaMetaMerge.py --input-files sample_data_custom_viral/profiles/custom_viral_db/clark.profile.out sample_data_custom_viral/profiles/custom_viral_db/dudes.profile.out sample_data_custom_viral/profiles/custom_viral_db/kaiju.profile.out sample_data_custom_viral/profiles/custom_viral_db/kraken.profile.out --database-profiles /shares/hii/bioinfo/data/ref/MetaMeta/custom_viral_db/clark.dbprofile.out /shares/hii/bioinfo/data/ref/MetaMeta/custom_viral_db/dudes.dbprofile.out /shares/hii/bioinfo/data/ref/MetaMeta/custom_viral_db/kaiju.dbprofile.out /shares/hii/bioinfo/data/ref/MetaMeta/custom_viral_db/kraken.dbprofile.out --tool-identifier 'clark,dudes,kaiju,kraken' --tool-method 'b,p,b,b' --names-file /shares/hii/bioinfo/data/ref/MetaMeta/taxonomy/names.dmp --nodes-file /shares/hii/bioinfo/data/ref/MetaMeta/taxonomy/nodes.dmp --merged-file /shares/hii/bioinfo/data/ref/MetaMeta/taxonomy/merged.dmp --bins 4 --cutoff 0.0001 --mode 'linear' --ranks 'species' --output-file sample_data_custom_viral/metametamerge/custom_viral_db/final.metametamerge.profile.out   --output-parsed-profiles > sample_data_custom_viral/log/custom_viral_db/metametamerge.log 2>&1 ' returned non-zero exit status 1.
  File "/shares/hii/sw/MetaMeta/1.2.0/opt/metameta/scripts/metametamerge.sm", line 22, in __rule_metametamerge
  File "/shares/hii/sw/MetaMeta/1.2.0/lib/python3.6/concurrent/futures/thread.py", line 55, in run
Will exit after finishing currently running jobs.

Here is the content of metameta-workdir/sample_data_custom_viral/log/custom_viral_db/metametamerge.log:

- - - - - - - - - - - - - - - - - - - - -
           MetaMetaMerge 1.1
- - - - - - - - - - - - - - - - - - - - -
Input files: 
 clark (b) sample_data_custom_viral/profiles/custom_viral_db/clark.profile.out /shares/hii/bioinfo/data/ref/MetaMeta/custom_viral_db/clark.dbprofile.out
 dudes (p) sample_data_custom_viral/profiles/custom_viral_db/dudes.profile.out /shares/hii/bioinfo/data/ref/MetaMeta/custom_viral_db/dudes.dbprofile.out
 kaiju (b) sample_data_custom_viral/profiles/custom_viral_db/kaiju.profile.out /shares/hii/bioinfo/data/ref/MetaMeta/custom_viral_db/kaiju.dbprofile.out
 kraken (b) sample_data_custom_viral/profiles/custom_viral_db/kraken.profile.out /shares/hii/bioinfo/data/ref/MetaMeta/custom_viral_db/kraken.dbprofile.out
Taxonomy: 
 /shares/hii/bioinfo/data/ref/MetaMeta/taxonomy/names.dmp, /shares/hii/bioinfo/data/ref/MetaMeta/taxonomy/nodes.dmp, /shares/hii/bioinfo/data/ref/MetaMeta/taxonomy/merged.dmp
Bins: 4
Cutoff: 0.0001
Mode: linear
Ranks: species
Output file (type): sample_data_custom_viral/metametamerge/custom_viral_db/final.metametamerge.profile.out (bioboxes)
Verbose: False
Detailed: False
- - - - - - - - - - - - - - - - - - - - -

Parsing taxonomy (names, nodes, merged) ... 

Reading database profiles ...
 - /shares/hii/bioinfo/data/ref/MetaMeta/custom_viral_db/clark.dbprofile.out (tsv)
        species - 0 entries (0 ignored)
        (WARNING) no valid entries found [species]
Traceback (most recent call last):
  File "/shares/hii/sw/MetaMeta/1.2.0/bin/MetaMetaMerge.py", line 338, in <module>
    main()
  File "/shares/hii/sw/MetaMeta/1.2.0/bin/MetaMetaMerge.py", line 113, in main
    db = Databases(database_file, parse_files(database_file, 'db', all_names_scientific, all_names_other, nodes, merged, ranks, args.verbose), ranks)
  File "/shares/hii/sw/MetaMeta/1.2.0/lib/python3.6/site-packages/metametamerge/Databases.py", line 8, in __init__
    Profile.__init__(self, profile, ranks)
  File "/shares/hii/sw/MetaMeta/1.2.0/lib/python3.6/site-packages/metametamerge/Profile.py", line 15, in __init__
    self.profilerank[rankid] = ProfileRank(profile[np.ix_(profile[:,1]==rankid, [0,2,3])],rankid,sum(profile[:,1]==rankid))
IndexError: too many indices for array

The dbprofile.out files all appear to be empty. Is that normal? I wonder if I broke something with the sample data YAML edits for dbdir?

[pabransford@hii sampledata]$ pwd
/shares/hii/sw/MetaMeta/1.2.0/opt/metameta/sampledata

[pabransford@hii sampledata]$ cat sample_data_archaea_bacteria_HII-CUSTOM-DBDIR.yaml
# Output and working directory (all other directories below - if not absolute - are relative to this one)
workdir: "/tmp/metameta-workdir/" # NOTICE: not appropriate for actual runs with this tool! This should be on GPFS.
dbdir: "/shares/hii/bioinfo/data/ref/MetaMeta/"

samples:
  sample_data_archaea_bacteria:
     fq1: "/shares/hii/sw/MetaMeta/1.2.0/opt/metameta/sampledata/files/reads/sample_data_archaea_bacteria.1.fq.gz"
     fq2: "/shares/hii/sw/MetaMeta/1.2.0/opt/metameta/sampledata/files/reads/sample_data_archaea_bacteria.2.fq.gz"

gzipped: 1
threads: 1

[pabransford@hii sampledata]$ cat sample_data_custom_viral_HII-CUSTOM-DBDIR.yaml
# Output and working directory (all other directories below - if not absolute - are relative to this one)
workdir: "/tmp/metameta-workdir/" # NOTICE: not appropriate for actual runs with this tool! This should be on GPFS.
dbdir: "/shares/hii/bioinfo/data/ref/MetaMeta/"

samples:
  sample_data_custom_viral:
     fq1: "/shares/hii/sw/MetaMeta/1.2.0/opt/metameta/sampledata/files/reads/sample_data_custom_viral.1.fq.gz"
     fq2: "/shares/hii/sw/MetaMeta/1.2.0/opt/metameta/sampledata/files/reads/sample_data_custom_viral.2.fq.gz"

gzipped: 1
threads: 1

databases:
  - custom_viral_db

custom_viral_db:
    clark: "/shares/hii/sw/MetaMeta/1.2.0/opt/metameta/sampledata/files/clark_dudes/"
    dudes: "/shares/hii/sw/MetaMeta/1.2.0/opt/metameta/sampledata/files/clark_dudes/"
    kaiju: "/shares/hii/sw/MetaMeta/1.2.0/opt/metameta/sampledata/files/kaiju/"
    kraken: "/shares/hii/sw/MetaMeta/1.2.0/opt/metameta/sampledata/files/kraken/"

Note: I'm tired of redacting paths, and I'm certainly not hiding anything of value by doing so. The above is all as-is :P

pirovc commented 6 years ago

The log files in the database directory are only written when the database is created. Every other run using an existing database will only write log files to the working directory. (If that's still not enough, there's no way to change it in the current version, but a workaround should be easy if really necessary.)

Even though there are several logs and extensive error messages (which is kind of scary), metameta is running almost completely, apart from the single step that generates those database profiles (the files should not be empty). Those profiles are generated automatically via the web services provided by NCBI, which can be a bit flaky depending on your connection. Can you please run the following command to check whether the script works on your system:

/shares/hii/sw/MetaMeta/1.2.0/opt/metameta/scripts/acc2tab.bash NZ_FRFD01000003.1

You should get something like this:

NZ_FRFD01000003.1 1427898 Anaerocolumna xylanovorans DSM 12503 1121345 Bacteria|Firmicutes|Clostridia|Clostridiales|Lachnospiraceae|Anaerocolumna|Anaerocolumna xylanovorans 2|1239|186801|186802|186803|1843210|100134

If the command works, it's hard to say why it didn't run the first time; maybe the servers were offline. If you delete the /shares/hii/bioinfo/data/ref/MetaMeta/custom_viral_db/*.dbprofile.out files and run metameta again, they will be generated once more.

That was a problem for other users as well, so the next release will include an offline option to create such profiles without depending on the NCBI web services.

draeath commented 6 years ago

Hello,

I tried it a few times in a row and I'm consistently getting this result:

/shares/hii/sw/MetaMeta/1.2.0/opt/metameta/scripts/acc2tab.bash: line 78: -1: substring expression < 0

I added set -x and I see that it gets an XML file back. It's rather large, so I've uploaded it here for you: http://assets.draeath.net/misc/acc2tab.out.gz

Looks like either a syntax problem (we've got GNU bash 4.1.2 where the script's shebang points) or the formatting coming back from efetch.fcgi has changed on you?

pirovc commented 6 years ago

Thanks for the file and the help finding the problem. It's probably a bash compatibility issue. I found a solution and updated the script. Can you please replace your acc2tab.bash with this https://raw.githubusercontent.com/pirovc/metameta/v1.2.1/scripts/acc2tab.bash and try again? I hope it solves the issue.

draeath commented 6 years ago

No problem at all, thank you for helping me get it operating!

Looks good:

[pabransford@hii scripts]$ ./acc2tab.bash NZ_FRFD01000003.1
NZ_FRFD01000003.1   1427898 Anaerocolumna xylanovorans DSM 12503    1121345 Bacteria|Firmicutes|Clostridia|Clostridiales|Lachnospiraceae|Anaerocolumna|Anaerocolumna xylanovorans   2|1239|186801|186802|186803|1843210|100134

I'm going to revert to the pre-execution state, apply that change, and re-run the process. Cross my fingers, we may be good now!

draeath commented 6 years ago

We're good! Both samples finished! I think we can close this issue, but since you had to make that (minor) code change, I'll leave that to your discretion.

Once again, thank you for the support!

pirovc commented 6 years ago

Glad to hear that. The fixed script will be available in the next release. Feel free to open a new issue in case of any further problems.