hivlab / quantify-virome

quantify-virome: identify and quantify viruses from metagenomic shotgun sequences
MIT License

fix filter viruses #18

Closed: tpall closed this issue 5 years ago

tpall commented 5 years ago

Something has gone sour in the tidyverse environment:

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 2
Rules claiming more threads will be scaled down.
Job counts:
        count   jobs
        1       filter_viruses
        1

[Sat Dec  1 16:51:00 2018]
rule filter_viruses:
    input: blast/SRR5580142_blastn_virus_19_known-viral.tsv, /gpfs/software/VirusSeeker/databases/taxdump_300817/vhunter.db, taxonomy/nodes.csv
    output: results/SRR5580142_phages_19.csv, blast/SRR5580142_candidate_viruses_19.csv
    jobid: 0
    wildcards: sample=SRR5580142, n=19

Activating conda environment: /gpfs/hpchome/taavi74/fastq/prjna361402/.snakemake/conda/130c7264
During startup - Warning message:
Setting LC_CTYPE failed, using "C" 
Parse blast results
Importing BLAST+ tabular output
Parsed with column specification:
cols(
  qseqid = col_character(),
  sgi = col_integer(),
  pident = col_double(),
  length = col_integer(),
  mismatch = col_integer(),
  gapopen = col_integer(),
  qstart = col_integer(),
  qend = col_integer(),
  sstart = col_integer(),
  send = col_integer(),
  evalue = col_double(),
  `bitscore'` = col_double()
)
Munging metadata
Map tax_ids to gis

Connect to database
Collect tax_ids from tables
Bind rows
Join taxonomy to blast results by gi
Joining, by = "gi"
Fill in few missing tax_ids by quering remote ncbi database
Error in mutate_impl(.data, dots) : 
  Evaluation error: no applicable method for 'xml_find_first' applied to an object of class "list".
Calls: do.call ... eval -> <Anonymous> -> mutate.tbl_df -> mutate_impl
Execution halted
[Sat Dec  1 16:51:17 2018]
Error in rule filter_viruses:
    jobid: 0
    output: results/SRR5580142_phages_19.csv, blast/SRR5580142_candidate_viruses_19.csv
    conda-env: /gpfs/hpchome/taavi74/fastq/prjna361402/.snakemake/conda/130c7264

RuleException:
CalledProcessError in line 99 of /gpfs/hpchome/taavi74/Projects/vs/rules/blast.smk:
Command 'source activate /gpfs/hpchome/taavi74/fastq/prjna361402/.snakemake/conda/130c7264; set -euo pipefail;  Rscript /gpfs/hpchome/taavi74/fastq/prjna361402/.snakemake/scripts/tmp86qu6xuw.filter_viruses.R ' returned non-zero exit status 1.
  File "/gpfs/hpchome/taavi74/Projects/vs/rules/blast.smk", line 99, in __rule_filter_viruses
  File "/gpfs/hpchome/taavi74/miniconda3/envs/snakemake/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

tpall commented 5 years ago

This line gives error:

dplyr::mutate(no_taxid, tax_id = purrr::map_chr(gi, query_taxid))

Vectorise? esearch accepts a vector of gi-s, so the lookup need not be mapped over one gi at a time.
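
A minimal sketch of what vectorising could look like, assuming the remote lookup can take all gi-s at once and hand back a table of hits. Everything here is hypothetical: `fake_remote_lookup()` is an offline stand-in for the real esearch-based query, `query_taxids()` is an illustrative name, and the gi/tax_id values are made up. It is not the code from the fixing commit.

```r
# Hypothetical stand-in for the remote NCBI query (illustration only).
# Takes a vector of gi-s and returns a data frame with the hits it found,
# which may have fewer rows than there were inputs.
fake_remote_lookup <- function(gis) {
  known <- data.frame(
    gi = c(1L, 2L, 4L),
    tax_id = c("10239", "12345", "67890"),  # made-up values
    stringsAsFactors = FALSE
  )
  known[known$gi %in% gis, , drop = FALSE]
}

# Vectorised wrapper (hypothetical name): one lookup call for all gi-s.
# The result is aligned to the input order and holds NA_character_
# wherever the lookup returned nothing for a gi.
query_taxids <- function(gis, lookup = fake_remote_lookup) {
  if (length(gis) == 0) return(character(0))
  hits <- lookup(unique(gis))
  hits$tax_id[match(gis, hits$gi)]
}

# Usage: replaces the per-row map_chr() call with a single vector call.
no_taxid <- data.frame(gi = c(2L, 3L, 1L, 2L))
no_taxid$tax_id <- query_taxids(no_taxid$gi)
print(no_taxid)
```

Returning NA for a gi with no hit, rather than erroring, keeps a single missing record from halting the whole `filter_viruses` rule. Inside `dplyr::mutate()` the same call would be `tax_id = query_taxids(gi)`.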

tpall commented 5 years ago

Fixed by 6deb18d16faddb11478507ec84db2547aa9f21aa