draeath closed this issue 6 years ago
You can solve that by running a dummy run with the sample data and a pre-configured database.
The database directory is defined by the dbdir variable in the configuration file, and the database files will be downloaded into that directory. On subsequent executions, if the same dbdir path and the same database are selected in the configuration file, metameta will figure that out and just run the tools.
Best, Vitor
Thank you!
I think I've got it sorted. It's going to take some time, obviously.
I did catch an error in the snakemake output fairly early. Is this indicative of something being wrong, or is this OK to occur with the sample data?
rule motus_run_1:
input: sample_data_archaea_bacteria/reads/motus.1.fq, /shares/hii/bioinfo/data/ref/MetaMeta/archaea_bacteria_201503/motus_db_check.done
output: sample_data_archaea_bacteria/motus/archaea_bacteria_201503/sample_data_archaea_bacteria.species.abundances.gz
log: sample_data_archaea_bacteria/log/archaea_bacteria_201503/motus_run_1.log
jobid: 18
benchmark: sample_data_archaea_bacteria/log/archaea_bacteria_201503/motus_run_1.time
wildcards: sample=sample_data_archaea_bacteria, database=archaea_bacteria_201503
Error in rule motus_run_1:
jobid: 18
output: sample_data_archaea_bacteria/motus/archaea_bacteria_201503/sample_data_archaea_bacteria.species.abundances.gz
log: sample_data_archaea_bacteria/log/archaea_bacteria_201503/motus_run_1.log
RuleException:
CalledProcessError in line 12 of /shares/hii/sw/MetaMeta/1.2.0/opt/metameta/tools/motus.sm:
Command ' set -euo pipefail;
cd sample_data_archaea_bacteria/motus/archaea_bacteria_201503/
mOTUs.pl --processors=1 ../../../sample_data_archaea_bacteria/reads/motus.1.fq ../../../sample_data_archaea_bacteria/reads/motus.2.fq --output-directory ./ > ../../../sample_data_archaea_bacteria/log/archaea_bacteria_201503/motus_run_1.log 2>&1
mv NCBI.species.abundances.gz ../../../sample_data_archaea_bacteria/motus/archaea_bacteria_201503/sample_data_archaea_bacteria.species.abundances.gz ' returned non-zero exit status 2.
File "/shares/hii/sw/MetaMeta/1.2.0/opt/metameta/tools/motus.sm", line 12, in __rule_motus_run_1
File "/shares/hii/sw/MetaMeta/1.2.0/lib/python3.6/concurrent/futures/thread.py", line 55, in run
Will exit after finishing currently running jobs.
Finished job 5.
7 of 45 steps (16%) done
The contents of sample_data_archaea_bacteria/log/archaea_bacteria_201503/motus_run_1.log:
ReadTrimFilter_aux : SAMPLE motus.processing.1
ReadTrimFilter_aux : SCRIPT VERSION 3
ReadTrimFilter_aux : CLEANUP : rm -f ./motus.processing.1/temp/*.trimmed.filtered.fq*; mkdir -p /tmp/metameta-workdir/sample_data_archaea_bacteria/motus/archaea_bacteria_201503/motus.processing.1/stats; rm -fr /tmp/metameta-workdir/sample_data_archaea_bacteria/motus/archaea_bacteria_201503/motus.processing.1/reads.processed.solexaqa; mkdir -p /tmp/metameta-workdir/sample_data_archaea_bacteria/motus/archaea_bacteria_201503/motus.processing.1/reads.processed.solexaqa
ReadTrimFilter_aux : GET QUALITY STATS : mode=solexaqa sanger=-Q 33 format=sanger sample=lane1.1.fq : EXECUTE cat /tmp/metameta-workdir/sample_data_archaea_bacteria/motus/archaea_bacteria_201503/motus.processing.1/lane1.1.fq | /shares/hii/sw/MetaMeta/1.2.0/opt/motus/motus_data/bin/fastx_quality_stats -o ./motus.processing.1/temp/lane1.1.fq.qual_stats.temp -Q 33
ReadTrimFilter_aux : Get 3 prime end
ReadTrimFilter_aux : GET QUALITY STATS : mode=solexaqa sanger=-Q 33 format=sanger sample=lane1.2.fq : EXECUTE cat /tmp/metameta-workdir/sample_data_archaea_bacteria/motus/archaea_bacteria_201503/motus.processing.1/lane1.2.fq | /shares/hii/sw/MetaMeta/1.2.0/opt/motus/motus_data/bin/fastx_quality_stats -o ./motus.processing.1/temp/lane1.2.fq.qual_stats.temp -Q 33
ReadTrimFilter_aux : Get 3 prime end
ReadTrimFilter_aux : PROCESS PAIR-END lane1 : EXECUTE /shares/hii/sw/MetaMeta/1.2.0/opt/motus/motus_data/bin/fastq_trim_filter_v5_EMBL -m solexaqa -a /tmp/metameta-workdir/sample_data_archaea_bacteria/motus/archaea_bacteria_201503/motus.processing.1/lane1.1.fq -b /tmp/metameta-workdir/sample_data_archaea_bacteria/motus/archaea_bacteria_201503/motus.processing.1/lane1.2.fq -f 2 -2 1 -q 20 -l 45 -p 50 -Q 33 -o /tmp/metameta-workdir/sample_data_archaea_bacteria/motus/archaea_bacteria_201503/motus.processing.1/reads.processed.solexaqa/lane1
ReadTrimFilter_aux : SUMMARIZE STATS and write /tmp/metameta-workdir/sample_data_archaea_bacteria/motus/archaea_bacteria_201503/motus.processing.1/stats/motus.processing.1.readtrimfilter.solexaqa.stats
ReadTrimFilter_aux : COMPLETED READ TRIM FILTER
ParseFASTToPacked() : Cannot open annotationFileName!
Parsing FASTA file..
Command '/shares/hii/sw/MetaMeta/1.2.0/opt/motus/motus_data/bin/2bwt-builder /shares/hii/sw/MetaMeta/1.2.0/opt/motus/motus_data/data/mOTU.v1.padded' failed: No such file or directory at /shares/hii/sw/MetaMeta/1.2.0/bin/mOTUs.pl line 83, <LANE> line 8.
Note: I'm a sysadmin, not one of the researchers, so I'm afraid much of this is unfamiliar to me. I can't make sense of that last line of output:
No such file or directory at /shares/hii/sw/MetaMeta/1.2.0/bin/mOTUs.pl line 83, <LANE> line 8.
<LANE> doesn't seem to be a reference to a file, and mOTUs.pl around there looks pretty sane (though I don't read perl):
sub system_ {
my $cmd = shift;
if ($verbose) {
print "Will execute \n\n$cmd\n\n";
}
(system($cmd) == 0) or die("Command '$cmd' failed: $!");
}
The config file is as follows:
# Output and working directory (all other directories below - if not absolute - are relative to this one)
workdir: "/tmp/metameta-workdir/" # NOTICE: not appropriate for actual runs with this tool! This should be on GPFS.
dbdir: "/shares/hii/bioinfo/data/ref/MetaMeta/"
samples:
    sample_data_archaea_bacteria:
        fq1: "/shares/hii/sw/MetaMeta/1.2.0/opt/metameta/sampledata/files/reads/sample_data_archaea_bacteria.1.fq.gz"
        fq2: "/shares/hii/sw/MetaMeta/1.2.0/opt/metameta/sampledata/files/reads/sample_data_archaea_bacteria.2.fq.gz"
gzipped: 1
threads: 1
I'm waiting to see if this one finishes OK before running the viral one. Similar modifications were made (mostly using absolute paths).
Is this database file healthy?
It's only 21 KB, and contains only:
$ tar tf ~/Downloads/motus_bac_arc_v1.tar.gz
motus_db/
motus.dbprofile.out
The motus.dbprofile.out file contains lines like this, all 0 at the end, if that's significant:
superkingdom Archaea 0
superkingdom Bacteria 0
phylum Acidobacteria 0
phylum Actinobacteria 0
phylum Aquificae 0
phylum Bacteroidetes 0
phylum Chlamydiae 0
Update: the referenced file from the zenodo.org entry description here is 47M. It's a perl script with some embedded data, so I'm not sure what I can do with this.
The problem here is with the installation of mOTUs. Are you pre-installing it or using the snakemake integration with BioConda (--use-conda)? The error message says that mOTUs apparently couldn't find its own database.
Besides that, the mOTUs data from zenodo is fine. The database comes with the perl script (and it is indeed very small) and is unpacked when you install the motus tool. Those zeros are just placeholders, since the mOTUs database is a bit different from the rest.
I installed these packages/versions in the same conda environment as MetaMeta. Should that have worked? https://raw.githubusercontent.com/pirovc/metameta/master/envs/metameta_complete.yaml
What should motus look like inside conda, for me to know it's operable?
I'm wondering if it was a permissions issue with motus_data. I'd like to avoid having any contents of the conda environment be writable, if that's even possible here.
I see the following:
[pabransford@hii motus]$ tree
.
├── motus_data
│ ├── bin
│ │ ├── 2bwt-builder
│ │ ├── fastq_trim_filter_v5_EMBL
│ │ ├── fastq_trim_filter_v5_OSX -> fastq_trim_filter_v5_EMBL
│ │ ├── fastx_quality_stats
│ │ ├── msamtools
│ │ ├── samtools
│ │ └── soap2.21
│ ├── data
│ │ ├── mOTU-LG.v1.annotations.txt
│ │ ├── mOTU.v1.map.txt
│ │ ├── mOTU.v1.padded
│ │ ├── mOTU.v1.padded.coord
│ │ ├── mOTU.v1.padded.len
│ │ ├── mOTU.v1.padded.motu.linkage.map
│ │ ├── mOTU.v1.padded.motu.map
│ │ ├── RefMG.v1.padded
│ │ ├── RefMG.v1.padded.coord
│ │ ├── RefMG.v1.padded.len
│ │ ├── RefMG.v1.padded.refmg.map
│ │ └── RefMG.v1.padded.refmg.map.total_length
│ └── src
│ ├── MOCATCalculateTaxonomy.pl
│ ├── MOCATExternal_soap2sam.pl
│ ├── MOCATFilter_falen.pl
│ ├── MOCATFilter_filterPE.pl
│ ├── MOCATFilter_remove_in_padded.pl
│ ├── MOCATFilter_soap2sam.awk
│ ├── MOCATFilter_stats.pl
│ ├── MOCATFraction.pl
│ ├── MOCATPasteTaxonomyCoverageFiles_generate_mOTU_tables.pl
│ ├── MOCATReadTrimFilter_aux.pl
│ └── MOCATScreen_filter.pl
└── mOTUs.pl
But I'm not sure if that's complete. The motus_data/data directory is about 153M.
Looks like the installation is correct. Is this folder inside ~/miniconda3/opt/motus/ (or miniconda3/envs/env_name/opt/motus)? This is the folder mOTUs works from; besides that, there's a symbolic link to the mOTUs.pl script in bin/. I'm not sure if the directory needs to be writable, but the contents of motus_data/bin/ should at least be executable.
It's outside of ${HOME} and owned by root (we're installing this all into a GPFS share, for cluster nodes to execute from, for context).
The files in ./opt/motus/motus_data/bin/ are indeed mode 0755.
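On the read-only question: a hedged expectation is that mOTUs needs execute permission on motus_data/bin/ and read permission on motus_data/data/, and nothing obviously requires write access to the install tree once installation is complete. A self-contained sketch of that check (the throwaway demo tree below is hypothetical, standing in for the real install):

```shell
# Build a throwaway tree mimicking motus_data/ and verify that every
# bin/ file is executable and every data/ file is world-readable.
demo=$(mktemp -d)
mkdir -p "$demo/motus_data/bin" "$demo/motus_data/data"
touch "$demo/motus_data/bin/2bwt-builder"
chmod 0755 "$demo/motus_data/bin/2bwt-builder"
touch "$demo/motus_data/data/mOTU.v1.padded"
chmod 0644 "$demo/motus_data/data/mOTU.v1.padded"
# find prints only files MISSING the required permission bits,
# so empty output from both means the tree is usable read-only
bad_bin=$(find "$demo/motus_data/bin" -type f ! -perm -111)
bad_data=$(find "$demo/motus_data/data" -type f ! -perm -444)
printf 'bin files missing exec bits: %s\n' "${bad_bin:-none}"
printf 'data files missing read bits: %s\n' "${bad_data:-none}"
rm -rf "$demo"
```

Running the same two find commands against the real opt/motus tree (root-owned, mode 0755/0644) should likewise print nothing.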
I'm going to try moving this all aside and recreating the process, but leaving it all owned by my user through a test drive. If it works, we'll be able to figure out what's different between the two trees and maybe figure out a way to document this for anyone else following in my footsteps :)
If it doesn't work that way either, well, things get interesting.
I'll let you know which case it was.
Thank you very much for your support on this thus far. It is very much appreciated!
Huh, getting something entirely different now: conda is having dependency issues. I started over from scratch, with a fresh copy of Miniconda3-latest-Linux-x86_64.sh (version 4.3.31) and metameta_complete.yaml.
I tried both just installing metameta, and using your environment yaml.
[pabransford@hii MetaMeta]$ pwd
/shares/hii/sw/MetaMeta
[pabransford@hii MetaMeta]$ echo $PATH
/shares/hii/sw/MetaMeta/1.2.0/bin:{REDACTED}
[pabransford@hii MetaMeta]$ conda install -c bioconda metameta=1.2.0
Fetching package metadata .............
Solving package specifications:
PackageNotFoundError: Packages missing in current channels:
- metameta 1.2.0* -> snakemake ==4.3.0 -> aioeasywebdav
- metameta 1.2.0* -> snakemake ==4.3.0 -> ratelimiter
We have searched for the packages in the following channels:
- https://conda.anaconda.org/bioconda/linux-64
- https://conda.anaconda.org/bioconda/noarch
- https://repo.continuum.io/pkgs/main/linux-64
- https://repo.continuum.io/pkgs/main/noarch
- https://repo.continuum.io/pkgs/free/linux-64
- https://repo.continuum.io/pkgs/free/noarch
- https://repo.continuum.io/pkgs/r/linux-64
- https://repo.continuum.io/pkgs/r/noarch
- https://repo.continuum.io/pkgs/pro/linux-64
- https://repo.continuum.io/pkgs/pro/noarch
[pabransford@hii MetaMeta]$ conda env create -f metameta_complete.yaml
Fetching package metadata .............
Solving package specifications:
ResolvePackageNotFound:
  - kaiju 1.4.5*
  - perl 5.22.0*
  - metameta 1.2.0* -> snakemake ==4.3.0 -> aioeasywebdav
  - metameta 1.2.0* -> snakemake ==4.3.0 -> ratelimiter
If I'm reading it right, it's mad about snakemake? I can see that version is showing up in a search:
[pabransford@hii MetaMeta]$ conda search -c bioconda snakemake
Fetching package metadata .............
snakemake 3.4.2 py34_1 bioconda
{ snipped out a bunch of unrelated versions }
4.2.0 py35_0 bioconda
4.2.0 py36_0 bioconda
4.3.0 py35_0 bioconda
4.3.0 py36_0 bioconda
4.3.1 py35_0 bioconda
4.3.1 py36_0 bioconda
Strange, I never saw such an error with motus, even when installing it beforehand as you're doing. I'll try to replicate the scenario once again.
Regarding conda, they could have changed some packages, and the bioconda channel by itself is not enough anymore. Can you please add the secondary channels and try again? Execute the following commands in this order:
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
That will add those channels to your current conda installation list, so the install command should be conda install metameta=1.2.0
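For reference, conda config --add channels X prepends X to the channel list, so running the three commands in that order should leave ~/.condarc with bioconda at the highest priority, roughly:

```
channels:
  - bioconda
  - conda-forge
  - defaults
```

The order matters: packages are resolved from the top channel first, which is why bioconda must be added last.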
Looks like conda-forge was the missing magic. Seems to be required now.
The following NEW packages will be INSTALLED:
aioeasywebdav: 2.2.0-py36_0 conda-forge
aiohttp: 2.0.7-py36_0 conda-forge
appdirs: 1.4.3-py_0 conda-forge
async-timeout: 1.2.1-py36_0 conda-forge
bcrypt: 3.1.4-py36h621fe67_0
configargparse: 0.12.0-py36_0 conda-forge
docutils: 0.14-py36_0 conda-forge
dropbox: 8.4.0-py_0 conda-forge
filechunkio: 1.6-py36_0 bioconda
ftputil: 3.2-py36_0 bioconda
intel-openmp: 2018.0.0-hc7b2577_8
libgfortran-ng: 7.2.0-h9f7466a_2
metameta: 1.2.0-1 bioconda
mkl: 2018.0.1-h19d6760_4
multidict: 2.1.4-py36_0 conda-forge
numpy: 1.14.0-py36h3dfced4_1
pandas: 0.22.0-py36_0 conda-forge
paramiko: 2.3.1-py_0 conda-forge
psutil: 5.4.0-py36_0 conda-forge
pyasn1: 0.4.2-py_0 conda-forge
pynacl: 1.1.2-py36_0 conda-forge
pysftp: 0.2.9-py36_0 bioconda
python-dateutil: 2.6.1-py36_0 conda-forge
pytz: 2017.3-py_2 conda-forge
pyyaml: 3.12-py36_1 conda-forge
ratelimiter: 1.2.0-py36_0 conda-forge
snakemake: 4.3.0-py36_0 bioconda
wrapt: 1.10.11-py36_0 conda-forge
yarl: 0.10.0-py36_0 conda-forge
The following packages will be UPDATED:
conda: 4.3.31-py36_0 --> 4.3.33-py36_0 conda-forge
The following packages will be SUPERSEDED by a higher-priority channel:
conda-env: 2.6.0-h36134e3_1 --> 2.6.0-0 conda-forge
I'll drop an update if I get a working installation or if I have motus problems again :)
I got further along, almost to the end. The node I was working on apparently didn't have enough memory for archaea_bacteria_201503/kraken_db/database.kdb.
I'll continue working on this on one of our higher-memory nodes, but meanwhile: is there any way to allow it to run at a smaller scale? What do you consider a minimum spec? It blew up with this on a 64 GB machine (with ~55 GB free):
Loading database... classify: unable to mmap {snip}/archaea_bacteria_201503/kraken_db//database.kdb: Cannot allocate memory
I see the file on-disk is ~66 GB.
Thanks for the heads-up about the channels, I will update the README accordingly. Did motus work in the separate environment?
kraken is indeed memory hungry: it needs at least 66 GB, since it loads the complete database into memory. If you are running with multiple threads (not in a cluster environment) it needs even more, because other tools will be running at the same time, sharing memory.
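A rough pre-flight check along these lines may save a failed run: compare the database size against currently available memory before starting, since kraken mmap-loads the whole file. This sketch assumes Linux (/proc/meminfo) and uses a small hypothetical stand-in file in place of the real database.kdb path:

```shell
# Pre-flight: will the database fit in available RAM?
db=$(mktemp)                                     # stand-in for database.kdb
dd if=/dev/zero of="$db" bs=1024 count=64 2>/dev/null
db_kb=$(du -k "$db" | cut -f1)
avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
if [ "$avail_kb" -gt "$db_kb" ]; then
    echo "ok: ${avail_kb} kB available for a ${db_kb} kB database"
else
    echo "warning: database will not fit in available memory"
fi
rm -f "$db"
```

Pointing db at the real ~66 GB database.kdb on the 64 GB node would have flagged the problem before the run started.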
Hello,
I haven't tried it with the write permissions stripped out yet; I'm trying to get a successful run first before even attempting that.
I'm failing further along now! But a quick question before getting into that: I noticed log files being created in the dbdir. This is undesirable in my case. Is there any way to have them put somewhere else? Example:
MetaMeta finished successfuly
Please check the main log file for more information:
/tmp/metameta-workdir/metameta_2018-02-05_12-27-52.log
Detailed output and execution time for each rule can be found at:
/shares/hii/bioinfo/data/ref/MetaMeta/log/
/tmp/metameta-workdir/SAMPLE_NAME/log/
Now, for the bad news. The sample_data_archaea_bacteria sample completed without (apparent) error, but sample_data_custom_viral is failing at the metametamerge step:
rule metametamerge:
input: sample_data_custom_viral/profiles/custom_viral_db/clark_clean_files.done, sample_data_custom_viral/profiles/custom_viral_db/dudes_clean_files.done, sample_data_custom_viral/profiles/custom_viral_db/kaiju_clean_files.done, sample_data_custom_viral/profiles/custom_viral_db/kraken_clean_files.done, /shares/hii/bioinfo/data/ref/MetaMeta/taxonomy/names.dmp, /shares/hii/bioinfo/data/ref/MetaMeta/taxonomy/nodes.dmp, /shares/hii/bioinfo/data/ref/MetaMeta/taxonomy/merged.dmp
output: sample_data_custom_viral/metametamerge/custom_viral_db/final.metametamerge.profile.out
log: sample_data_custom_viral/log/custom_viral_db/metametamerge.log
jobid: 3
benchmark: sample_data_custom_viral/log/custom_viral_db/metametamerge.time
wildcards: sample=sample_data_custom_viral, database=custom_viral_db
Error in rule metametamerge:
jobid: 3
output: sample_data_custom_viral/metametamerge/custom_viral_db/final.metametamerge.profile.out
log: sample_data_custom_viral/log/custom_viral_db/metametamerge.log
RuleException:
CalledProcessError in line 22 of /shares/hii/sw/MetaMeta/1.2.0/opt/metameta/scripts/metametamerge.sm:
Command ' set -euo pipefail; MetaMetaMerge.py --input-files sample_data_custom_viral/profiles/custom_viral_db/clark.profile.out sample_data_custom_viral/profiles/custom_viral_db/dudes.profile.out sample_data_custom_viral/profiles/custom_viral_db/kaiju.profile.out sample_data_custom_viral/profiles/custom_viral_db/kraken.profile.out --database-profiles /shares/hii/bioinfo/data/ref/MetaMeta/custom_viral_db/clark.dbprofile.out /shares/hii/bioinfo/data/ref/MetaMeta/custom_viral_db/dudes.dbprofile.out /shares/hii/bioinfo/data/ref/MetaMeta/custom_viral_db/kaiju.dbprofile.out /shares/hii/bioinfo/data/ref/MetaMeta/custom_viral_db/kraken.dbprofile.out --tool-identifier 'clark,dudes,kaiju,kraken' --tool-method 'b,p,b,b' --names-file /shares/hii/bioinfo/data/ref/MetaMeta/taxonomy/names.dmp --nodes-file /shares/hii/bioinfo/data/ref/MetaMeta/taxonomy/nodes.dmp --merged-file /shares/hii/bioinfo/data/ref/MetaMeta/taxonomy/merged.dmp --bins 4 --cutoff 0.0001 --mode 'linear' --ranks 'species' --output-file sample_data_custom_viral/metametamerge/custom_viral_db/final.metametamerge.profile.out --output-parsed-profiles > sample_data_custom_viral/log/custom_viral_db/metametamerge.log 2>&1 ' returned non-zero exit status 1.
File "/shares/hii/sw/MetaMeta/1.2.0/opt/metameta/scripts/metametamerge.sm", line 22, in __rule_metametamerge
File "/shares/hii/sw/MetaMeta/1.2.0/lib/python3.6/concurrent/futures/thread.py", line 55, in run
Will exit after finishing currently running jobs.
Here is the content of metameta-workdir/sample_data_custom_viral/log/custom_viral_db/metametamerge.log:
- - - - - - - - - - - - - - - - - - - - -
MetaMetaMerge 1.1
- - - - - - - - - - - - - - - - - - - - -
Input files:
clark (b) sample_data_custom_viral/profiles/custom_viral_db/clark.profile.out /shares/hii/bioinfo/data/ref/MetaMeta/custom_viral_db/clark.dbprofile.out
dudes (p) sample_data_custom_viral/profiles/custom_viral_db/dudes.profile.out /shares/hii/bioinfo/data/ref/MetaMeta/custom_viral_db/dudes.dbprofile.out
kaiju (b) sample_data_custom_viral/profiles/custom_viral_db/kaiju.profile.out /shares/hii/bioinfo/data/ref/MetaMeta/custom_viral_db/kaiju.dbprofile.out
kraken (b) sample_data_custom_viral/profiles/custom_viral_db/kraken.profile.out /shares/hii/bioinfo/data/ref/MetaMeta/custom_viral_db/kraken.dbprofile.out
Taxonomy:
/shares/hii/bioinfo/data/ref/MetaMeta/taxonomy/names.dmp, /shares/hii/bioinfo/data/ref/MetaMeta/taxonomy/nodes.dmp, /shares/hii/bioinfo/data/ref/MetaMeta/taxonomy/merged.dmp
Bins: 4
Cutoff: 0.0001
Mode: linear
Ranks: species
Output file (type): sample_data_custom_viral/metametamerge/custom_viral_db/final.metametamerge.profile.out (bioboxes)
Verbose: False
Detailed: False
- - - - - - - - - - - - - - - - - - - - -
Parsing taxonomy (names, nodes, merged) ...
Reading database profiles ...
- /shares/hii/bioinfo/data/ref/MetaMeta/custom_viral_db/clark.dbprofile.out (tsv)
species - 0 entries (0 ignored)
(WARNING) no valid entries found [species]
Traceback (most recent call last):
File "/shares/hii/sw/MetaMeta/1.2.0/bin/MetaMetaMerge.py", line 338, in <module>
main()
File "/shares/hii/sw/MetaMeta/1.2.0/bin/MetaMetaMerge.py", line 113, in main
db = Databases(database_file, parse_files(database_file, 'db', all_names_scientific, all_names_other, nodes, merged, ranks, args.verbose), ranks)
File "/shares/hii/sw/MetaMeta/1.2.0/lib/python3.6/site-packages/metametamerge/Databases.py", line 8, in __init__
Profile.__init__(self, profile, ranks)
File "/shares/hii/sw/MetaMeta/1.2.0/lib/python3.6/site-packages/metametamerge/Profile.py", line 15, in __init__
self.profilerank[rankid] = ProfileRank(profile[np.ix_(profile[:,1]==rankid, [0,2,3])],rankid,sum(profile[:,1]==rankid))
IndexError: too many indices for array
The dbprofile.out files all appear to be empty. Is that normal? I wonder if I broke something with the sample data YAML edits for dbdir?
[pabransford@hii sampledata]$ pwd
/shares/hii/sw/MetaMeta/1.2.0/opt/metameta/sampledata
[pabransford@hii sampledata]$ cat sample_data_archaea_bacteria_HII-CUSTOM-DBDIR.yaml
# Output and working directory (all other directories below - if not absolute - are relative to this one)
workdir: "/tmp/metameta-workdir/" # NOTICE: not appropriate for actual runs with this tool! This should be on GPFS.
dbdir: "/shares/hii/bioinfo/data/ref/MetaMeta/"
samples:
    sample_data_archaea_bacteria:
        fq1: "/shares/hii/sw/MetaMeta/1.2.0/opt/metameta/sampledata/files/reads/sample_data_archaea_bacteria.1.fq.gz"
        fq2: "/shares/hii/sw/MetaMeta/1.2.0/opt/metameta/sampledata/files/reads/sample_data_archaea_bacteria.2.fq.gz"
gzipped: 1
threads: 1
[pabransford@hii sampledata]$ cat sample_data_custom_viral_HII-CUSTOM-DBDIR.yaml
# Output and working directory (all other directories below - if not absolute - are relative to this one)
workdir: "/tmp/metameta-workdir/" # NOTICE: not appropriate for actual runs with this tool! This should be on GPFS.
dbdir: "/shares/hii/bioinfo/data/ref/MetaMeta/"
samples:
    sample_data_custom_viral:
        fq1: "/shares/hii/sw/MetaMeta/1.2.0/opt/metameta/sampledata/files/reads/sample_data_custom_viral.1.fq.gz"
        fq2: "/shares/hii/sw/MetaMeta/1.2.0/opt/metameta/sampledata/files/reads/sample_data_custom_viral.2.fq.gz"
gzipped: 1
threads: 1
databases:
    - custom_viral_db
custom_viral_db:
    clark: "/shares/hii/sw/MetaMeta/1.2.0/opt/metameta/sampledata/files/clark_dudes/"
    dudes: "/shares/hii/sw/MetaMeta/1.2.0/opt/metameta/sampledata/files/clark_dudes/"
    kaiju: "/shares/hii/sw/MetaMeta/1.2.0/opt/metameta/sampledata/files/kaiju/"
    kraken: "/shares/hii/sw/MetaMeta/1.2.0/opt/metameta/sampledata/files/kraken/"
Note: I'm tired of redacting paths, and I'm certainly not hiding anything of value to do so. The above is all as-is :P
The log files in the database directory are only written when the database is created. Every other run using an existing database will only write log files to the working directory. (If that's still not enough, there's no way of changing it in the current version, but it should be easy to make a workaround if really necessary.)
Even though there are several logs and extensive error messages (which is kind of scary), metameta is running almost completely, except for the single step of generating those database profiles (the files should not be empty). Those profiles are generated automatically using the web services provided by NCBI, which can be a bit annoying depending on your connection. Can you please run the following command to check if that script is working on your system:
/shares/hii/sw/MetaMeta/1.2.0/opt/metameta/scripts/acc2tab.bash NZ_FRFD01000003.1
you should get something like this:
NZ_FRFD01000003.1 1427898 Anaerocolumna xylanovorans DSM 12503 1121345 Bacteria|Firmicutes|Clostridia|Clostridiales|Lachnospiraceae|Anaerocolumna|Anaerocolumna xylanovorans2|1239|186801|186802|186803|1843210|100134
If the command works, it's hard to say why it didn't run in the first place; maybe the servers were offline. If you delete the /shares/hii/bioinfo/data/ref/MetaMeta/custom_viral_db/*.dbprofile.out files and run metameta again, they will be generated once more.
That was a problem for other users as well, so the next version will have an offline option to create such profiles, to avoid depending on the NCBI web services.
Hello,
I tried it a few times in a row and I'm consistently getting this result:
/shares/hii/sw/MetaMeta/1.2.0/opt/metameta/scripts/acc2tab.bash: line 78: -1: substring expression < 0
I added set -x and I see that it gets an XML file back. It's rather large, so I've uploaded it here for you: http://assets.draeath.net/misc/acc2tab.out.gz
Looks like either a syntax problem (we've got GNU bash 4.1.2 where the script's shebang points) or the formatting coming back from efetch.fcgi has changed on you?
Thanks for the file and the help tracking down the problem. It's probably a bash compatibility problem. I found a solution and updated the script. Can you please replace your acc2tab.bash with this version and try again? https://raw.githubusercontent.com/pirovc/metameta/v1.2.1/scripts/acc2tab.bash
I hope it solves the issue.
No problem at all, thank you for helping me get it operating!
Looks good:
[pabransford@hii scripts]$ ./acc2tab.bash NZ_FRFD01000003.1
NZ_FRFD01000003.1 1427898 Anaerocolumna xylanovorans DSM 12503 1121345 Bacteria|Firmicutes|Clostridia|Clostridiales|Lachnospiraceae|Anaerocolumna|Anaerocolumna xylanovorans 2|1239|186801|186802|186803|1843210|100134
I'm going to revert to the pre-execution state, apply that change, and re-run the process. Cross my fingers, we may be good now!
We're good! Both samples finished! I think we can close this issue, but as you had to make that (minor) code change, leaving that at your discretion.
Once again, thank you for the support!
Glad to hear that, the fixed script will be available in the next release. Feel free to open a new issue in case of any further problems.
I see in the README that metameta will download the databases on first run.
I am preparing the software in an environment where the users do not have privileges to write where it will be installed. I don't have what's needed to run the tool, so I need some means to kick off this download manually.
I'd also like to know where these get stashed away, because they seem quite large. Is one able to place them elsewhere, and inform metameta of their location?
I've attempted to puzzle out how this works in the source, but I can't make heads or tails of it.