pirovc / metameta


Building custom database issue #18

Open jongin333 opened 5 years ago

jongin333 commented 5 years ago

Hi,

I'd like to build a custom database from a subset of NCBI RefSeq sequences. Before that, I tried to build a very small test database, but it failed.

Here are the directory structure for building the database, the configuration file, and the log file.

======= directory structure =======
./db
./db/clark
./db/clark/genomes.fna
./db/dudes
./db/dudes/genomes.fna
./db/kaiju
./db/kaiju/genome.gbff
./db/kraken
./db/kraken/genome.fna
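A minimal sketch for sanity-checking this layout before launching the pipeline. The file extension expected in each tool folder (.fna for clark, dudes and kraken, .gbff for kaiju) is taken from the directory structure in this post, not from the metameta documentation, so treat it as an assumption; `check_custom_db` and `EXPECTED` are hypothetical names.

```python
from pathlib import Path

# Assumption: extensions mirror the layout posted above, not an official metameta spec.
EXPECTED = {
    "clark": ".fna",
    "dudes": ".fna",
    "kaiju": ".gbff",
    "kraken": ".fna",
}


def check_custom_db(db_root):
    """Return a list of problems found under db_root (an empty list means the layout looks OK)."""
    root = Path(db_root)
    problems = []
    for tool, ext in EXPECTED.items():
        tool_dir = root / tool
        if not tool_dir.is_dir():
            problems.append(f"missing directory: {tool_dir}")
            continue
        # Require at least one non-empty input file with the expected extension.
        inputs = [p for p in tool_dir.glob(f"*{ext}") if p.stat().st_size > 0]
        if not inputs:
            problems.append(f"no non-empty *{ext} file in {tool_dir}")
    return problems
```

For example, `print(check_custom_db("./db"))` run from the metameta working directory should print `[]` when every tool folder contains a non-empty input file.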

======= config file =======
workdir: "/mss2/projects/META2/taxonomy_classification/metameta"

databases:

custom_db:
  clark: "/mss2/projects/META2/taxonomy_classification/metameta/db/clark"
  dudes: "/mss2/projects/META2/taxonomy_classification/metameta/db/dudes"
  kaiju: "/mss2/projects/META2/taxonomy_classification/metameta/db/kaiju"
  kraken: "/mss2/projects/META2/taxonomy_classification/metameta/db/kraken"

dbdir: "/mss2/projects/META2/taxonomy_classification/metameta/db"

samples:
  "TEST":
    fq1: "test1_1.fq.gz"
    fq2: "test1_2.fq.gz"

gzipped: 1
threads: 50

======= Log file =======
Building DAG of jobs...
Provided cores: 5
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1       all
    1       clark_db_custom_1
    1       clark_db_custom_2
    1       clark_db_custom_3
    1       clark_db_custom_4
    1       clark_db_custom_check
    1       clark_db_custom_profile
    1       clark_rpt
    1       clark_run_1
    4       clean_files
    1       clean_reads
    4       database_profile
    1       dudes_db_custom_1
    1       dudes_db_custom_2
    1       dudes_db_custom_3
    1       dudes_db_custom_check
    1       dudes_db_custom_profile
    1       dudes_rpt
    1       dudes_run_1
    1       dudes_run_2
    1       errorcorr_reads
    1       get_accession2taxid
    1       get_gi_taxid_nucl
    1       get_taxdump
    1       kaiju_db_custom_1
    1       kaiju_db_custom_2
    1       kaiju_db_custom_3
    1       kaiju_db_custom_4
    1       kaiju_db_custom_check
    1       kaiju_db_custom_profile
    1       kaiju_rpt
    1       kaiju_run_1
    1       kraken_db_custom_1
    1       kraken_db_custom_2
    1       kraken_db_custom_3
    1       kraken_db_custom_check
    1       kraken_db_custom_profile
    1       kraken_rpt
    1       kraken_run_1
    1       krona
    1       metametamerge
    1       subsample_reads
    1       trim_reads
    49


MetaMeta Pipeline v1.2.0 by Vitor C. Piro (vitorpiro@gmail.com, http://github.com/pirovc)

Parameters:

rule kaiju_db_custom_1:
    output: dbcustom_db/kaiju_db/kaiju_db.faa
    log: dbcustom_db/log/kaiju_db_custom_1.log
    jobid: 48
    benchmark: dbcustom_db/log/kaiju_db_custom_1.time
    wildcards: database=custom_db

rule get_gi_taxid_nucl:
    output: dbtaxonomy/gi_taxid_nucl.dmp.gz
    log: dbtaxonomy/log/get_gi_taxid_nucl.log
    jobid: 44
    benchmark: dbtaxonomy/log/get_gi_taxid_nucl.time

rule get_taxdump:
    output: dbtaxonomy/taxdump.tar.gz, dbtaxonomy/names.dmp, dbtaxonomy/nodes.dmp, dbtaxonomy/merged.dmp
    log: dbtaxonomy/log/get_taxdump.log
    jobid: 4
    benchmark: dbtaxonomy/log/get_taxdump.time

rule get_accession2taxid:
    output: dbtaxonomy/nucl_gb.accession2taxid.gz, dbtaxonomy/nucl_wgs.accession2taxid.gz
    log: dbtaxonomy/log/get_accession2taxid.log
    jobid: 47
    benchmark: dbtaxonomy/log/get_accession2taxid.time

Activating conda environment /mss2/projects/META2/taxonomy_classification/metameta/.snakemake/conda/0e3e8e78.

rule clark_db_custom_1:
    output: dbcustom_db/clark_db/Custom/
    log: dbcustom_db/log/clark_db_custom_1.log
    jobid: 40
    benchmark: dbcustom_db/log/clark_db_custom_1.time
    wildcards: database=custom_db

Finished job 40.
1 of 49 steps (2%) done

rule kaiju_db_custom_profile:
    output: dbcustom_db/kaiju.dbaccession.out
    log: dbcustom_db/log/kaiju_db_custom_profile.log
    jobid: 36
    benchmark: dbcustom_db/log/kaiju_db_custom_profile.time
    wildcards: database=custom_db

Finished job 36.
2 of 49 steps (4%) done
Finished job 48.
3 of 49 steps (6%) done
Exiting because a job execution failed. Look above for error message
Will exit after finishing currently running jobs.
Finished job 44.
4 of 49 steps (8%) done
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message


An error has occured. Please check the main log file for more information:
/mss2/projects/META2/taxonomy_classification/metameta/metameta_2019-08-15_23-59-47.log
Detailed output and execution time for each rule can be found at:
/mss2/projects/META2/taxonomy_classification/metameta/db/log/
/mss2/projects/META2/taxonomy_classification/metameta/SAMPLE_NAME/log/

=======

How can I build a custom database? What did I miss?

Thank you, Jongin

pirovc commented 5 years ago

Hi Jongin,

Apparently jobs 4 (get_taxdump) and 47 (get_accession2taxid) did not finish. Both of them only download data from the NCBI servers. Do you have any internet restrictions? You can check the errors for those rules in:

/mss2/projects/META2/taxonomy_classification/metameta/db/log/taxonomy/log/get_taxdump.log

/mss2/projects/META2/taxonomy_classification/metameta/db/log/taxonomy/log/get_accession2taxid.log
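The reasoning above can be automated: a job that printed a `rule NAME:` block with `jobid: N` but never a matching `Finished job N.` line is the one that failed or was interrupted. A minimal sketch, assuming the Snakemake log layout shown in this thread; `unfinished_jobs` is a hypothetical helper, not part of metameta or Snakemake.

```python
import re


def unfinished_jobs(log_text):
    """Map jobid -> rule name for jobs that were started but never reported as finished."""
    started = {}
    # Each scheduled job begins with "rule NAME:" followed (on the same or later
    # lines) by "jobid: N"; splitting on "rule " isolates one job block per chunk.
    for block in re.split(r"\brule ", log_text)[1:]:
        name = re.match(r"(\w+):", block)
        jobid = re.search(r"jobid: (\d+)", block)
        if name and jobid:
            started[int(jobid.group(1))] = name.group(1)
    finished = {int(n) for n in re.findall(r"Finished job (\d+)", log_text)}
    return {job: rule for job, rule in started.items() if job not in finished}


# Abbreviated excerpt of the first log in this thread.
example_log = """
rule kaiju_db_custom_1: output: dbcustom_db/kaiju_db/kaiju_db.faa jobid: 48
rule get_gi_taxid_nucl: output: dbtaxonomy/gi_taxid_nucl.dmp.gz jobid: 44
rule get_taxdump: output: dbtaxonomy/taxdump.tar.gz jobid: 4
rule get_accession2taxid: output: dbtaxonomy/nucl_gb.accession2taxid.gz jobid: 47
rule clark_db_custom_1: output: dbcustom_db/clark_db/Custom/ jobid: 40
Finished job 40.
rule kaiju_db_custom_profile: output: dbcustom_db/kaiju.dbaccession.out jobid: 36
Finished job 36.
Finished job 48.
Finished job 44.
"""

print(unfinished_jobs(example_log))  # {4: 'get_taxdump', 47: 'get_accession2taxid'}
```

The same function works on the multi-line form of the log, because it only relies on the `rule NAME:`, `jobid: N`, and `Finished job N` markers.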

Best Vitor

jongin333 commented 5 years ago

Hi Vitor,

I found the problem: the 'dbdir' path in the configuration file. I changed it as follows and reran everything from scratch (wiped all outputs and reran in a newly created directory).

dbdir: "/mss2/projects/META2/taxonomy_classification/metameta/db"
->
dbdir: "/mss2/projects/META2/taxonomy_classification/metameta/db/"
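The `dbcustom_db/...` and `dbtaxonomy/...` paths in the first log hint at why the trailing slash matters: the database directory and the subfolder name appear to be joined without a separator. A minimal sketch reproducing that effect, assuming plain string concatenation (an inference from the log output, not a reading of the metameta source); `concat` and `ensure_trailing_slash` are hypothetical helpers.

```python
import posixpath


def concat(dbdir, name):
    # Assumed behaviour: plain string concatenation, no separator added.
    return dbdir + name


def ensure_trailing_slash(path):
    # Normalise a directory path so that concatenation yields a proper child path.
    return path if path.endswith("/") else path + "/"


base = "/mss2/projects/META2/taxonomy_classification/metameta/db"

print(concat(base, "custom_db"))                         # .../metameta/dbcustom_db
print(concat(ensure_trailing_slash(base), "custom_db"))  # .../metameta/db/custom_db
# posixpath.join inserts the separator only when it is missing.
print(posixpath.join(base, "custom_db"))                 # .../metameta/db/custom_db
```

Under this assumption, writing `dbdir` with a trailing `/` in the config is the user-side fix; joining paths with a separator-aware function would avoid the problem on the pipeline side.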

After that, the previous errors were gone, but another problem occurred in dudes. Here is the end of the log file.

Error in rule dudes_db_custom_profile:
    jobid: 47
    output: /mss2/projects/META2/taxonomy_classification/metameta/db/custom_db/dudes.dbaccession.out
    log: /mss2/projects/META2/taxonomy_classification/metameta/db/custom_db/log/dudes_db_custom_profile.log

RuleException:
CalledProcessError in line 56 of /mss1/programs/europa/miniconda2/envs/metametaenv/opt/metameta/tools/dudes_db_custom.sm:
Command ' set -euo pipefail;  python3 -c 'import numpy as np; npzfile = np.load("/mss2/projects/META2/taxonomy_classification/metameta/db/custom_db/dudes_db/dudes_db.npz"); print("\n".join(npzfile["refids_lookup"].item().keys()))' | sed 's/>//g' > /mss2/projects/META2/taxonomy_classification/metameta/db/custom_db/dudes.dbaccession.out 2> /mss2/projects/META2/taxonomy_classification/metameta/db/custom_db/log/dudes_db_custom_profile.log ' returned non-zero exit status 1.
  File "/mss1/programs/europa/miniconda2/envs/metametaenv/opt/metameta/tools/dudes_db_custom.sm", line 56, in __rule_dudes_db_custom_profile
  File "/mss1/programs/europa/miniconda2/envs/metametaenv/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Removing output files of failed job dudes_db_custom_profile since they might be corrupted:
/mss2/projects/META2/taxonomy_classification/metameta/db/custom_db/dudes.dbaccession.out
Will exit after finishing currently running jobs.
Touching output file /mss2/projects/META2/taxonomy_classification/metameta/db/custom_db/clark_db/.taxondata.
Finished job 34.
13 of 49 steps (27%) done
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message

How can I fix it?

Thank you, Jongin

pirovc commented 5 years ago

Dear Jongin,

Can you please send me the contents of the file: /mss2/projects/META2/taxonomy_classification/metameta/db/custom_db/log/dudes_db_custom_profile.log

Vitor

jongin333 commented 5 years ago

Hi Vitor,

Unfortunately, the log file (dudes_db_custom_profile.log) is empty. I've attached the final log file and a listing of the log directory. I hope they help.

-rw-r--r-- 1 jongin bioinfo    0 Aug 17 11:05 clark_db_custom_1.log
-rw-r--r-- 1 jongin bioinfo  114 Aug 17 11:05 clark_db_custom_1.time
-rw-r--r-- 1 jongin bioinfo  101 Aug 17 12:26 clark_db_custom_2.log
-rw-r--r-- 1 jongin bioinfo  120 Aug 17 12:26 clark_db_custom_2.time
-rw-r--r-- 1 jongin bioinfo   20 Aug 17 11:07 dudes_db_custom_2.log
-rw-r--r-- 1 jongin bioinfo  117 Aug 17 11:07 dudes_db_custom_2.time
-rw-r--r-- 1 jongin bioinfo 1392 Aug 17 12:19 dudes_db_custom_3.log
-rw-r--r-- 1 jongin bioinfo  134 Aug 17 12:19 dudes_db_custom_3.time
-rw-r--r-- 1 jongin bioinfo    0 Aug 17 12:19 dudes_db_custom_profile.log
-rw-r--r-- 1 jongin bioinfo    0 Aug 17 11:05 kaiju_db_custom_1.log
-rw-r--r-- 1 jongin bioinfo  118 Aug 17 11:05 kaiju_db_custom_1.time
-rw-r--r-- 1 jongin bioinfo  476 Aug 17 12:19 kaiju_db_custom_2.log
-rw-r--r-- 1 jongin bioinfo  117 Aug 17 12:19 kaiju_db_custom_2.time
-rw-r--r-- 1 jongin bioinfo  976 Aug 17 12:19 kaiju_db_custom_3.log
-rw-r--r-- 1 jongin bioinfo  114 Aug 17 12:19 kaiju_db_custom_3.time
-rw-r--r-- 1 jongin bioinfo   10 Aug 17 11:07 kaiju_db_custom_4.log
-rw-r--r-- 1 jongin bioinfo  116 Aug 17 11:07 kaiju_db_custom_4.time
-rw-r--r-- 1 jongin bioinfo    0 Aug 17 11:05 kaiju_db_custom_profile.log
-rw-r--r-- 1 jongin bioinfo  114 Aug 17 11:05 kaiju_db_custom_profile.time
-rw-r--r-- 1 jongin bioinfo   20 Aug 17 11:32 kraken_db_custom_2.log
-rw-r--r-- 1 jongin bioinfo  126 Aug 17 11:32 kraken_db_custom_2.time
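Several logs in this listing are 0 bytes, including dudes_db_custom_profile.log. One likely reason, judging from the command in the earlier error: in `python3 -c '...' | sed 's/>//g' > {output} 2> {log}`, the `2>` redirection applies only to `sed`, so anything Python writes to stderr (such as a traceback) goes to Snakemake's own stderr (presumably the main metameta log) rather than to that file. A minimal sketch for separating empty from non-empty logs in a directory; `summarize_logs` is a hypothetical helper.

```python
from pathlib import Path


def summarize_logs(log_dir):
    """Split the *.log files in log_dir into (empty, non_empty) lists of file names."""
    empty, non_empty = [], []
    for path in sorted(Path(log_dir).glob("*.log")):
        # A 0-byte log means the rule wrote nothing to its own log file; the
        # actual error may then only be visible in the main metameta log.
        (empty if path.stat().st_size == 0 else non_empty).append(path.name)
    return empty, non_empty
```

For example, `summarize_logs("/mss2/projects/META2/taxonomy_classification/metameta/db/custom_db/log")` would return the empty and non-empty log names for the directory listed above.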

metameta_2019-08-17_12-26-40.log

Thank you, Jongin

pirovc commented 5 years ago

Hi Jongin,

Can you check which Python version is in your environment? If you have Python > 3.5, I suspect the error is related to this issue: https://github.com/pirovc/dudes/issues/2
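To answer the version question, a minimal sketch that prints the interpreter and NumPy versions and reads the default of the `allow_pickle` parameter of `np.load`. The behaviour change that matters here landed in NumPy 1.16.3, which switched that default to `False`; a newer Python environment typically pulls in a newer NumPy, which is likely why the Python version correlates with the error. `load_rejects_pickles_by_default` is a hypothetical helper.

```python
import inspect
import sys

import numpy as np


def load_rejects_pickles_by_default():
    """True if np.load refuses pickled object arrays unless allow_pickle=True is passed."""
    param = inspect.signature(np.load).parameters.get("allow_pickle")
    # Very old NumPy versions have no allow_pickle parameter and load pickles freely.
    return param is not None and param.default is False


print("Python:", sys.version.split()[0])
print("NumPy:", np.__version__)
print("np.load allow_pickle default is False:", load_rejects_pickles_by_default())
```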

In this case, you can try changing the following line in your metameta installation (based on your log file, it should be at /mss1/programs/europa/miniconda2/envs/metametaenv/opt/metameta/tools/dudes_db_custom.sm):

https://github.com/pirovc/metameta/blob/3a3c203e6398caf040e3f5ab901469a8656cae3a/tools/dudes_db_custom.sm#L56

to this one:

shell: """python3 -c 'import numpy as np; npzfile = np.load("{input.npz}", allow_pickle=True); print("\\n".join(npzfile["refids_lookup"].item().keys()))' | sed 's/>//g' > {output} 2> {log}"""

However, I expect the same error to occur when running dudes itself, so you would need to change the dudes code as well.

Alternatively, you can try installing Python 3.5 in the metametaenv environment, and everything should work.
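For reference, the `dudes_db_custom_profile` one-liner can be written as a small Python function, which makes the `allow_pickle` requirement explicit. This is a sketch, not metameta code: it assumes, as the original command does, that `refids_lookup` is stored in the `.npz` file as a pickled dict whose keys are reference IDs, and `list_reference_ids` is a hypothetical name. Because the file contains pickled data, only pass `allow_pickle=True` for databases you built or otherwise trust.

```python
import numpy as np


def list_reference_ids(npz_path):
    """Return the keys of the refids_lookup dict stored in a DUDES .npz database,
    with every '>' removed (what `sed 's/>//g'` does in the original rule)."""
    # refids_lookup is a 0-d object array wrapping a dict, so on NumPy >= 1.16.3
    # it can only be read with allow_pickle=True.
    with np.load(npz_path, allow_pickle=True) as npzfile:
        lookup = npzfile["refids_lookup"].item()
    return [key.replace(">", "") for key in lookup]
```

Printing `"\n".join(list_reference_ids(path))` for the database's `dudes_db.npz` would produce the same content the rule writes to `dudes.dbaccession.out`.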

Cheers Vitor