jmonlong / PopSV

Population-based detection of structural variation from High-Throughput Sequencing.
http://jmonlong.github.io/PopSV/

could not find function "autoGCcounts" #8

Open ghost opened 6 years ago

ghost commented 6 years ago

Hi PopSV developers,

I am receiving the error message:

could not find function "autoGCcounts"

even though I have already loaded the package with:

library(PopSV)

Any help appreciated,

Waqas.

jmonlong commented 6 years ago

Hi,

The pre-built pipeline functions are in the automatedPipeline-batchtools.R script. You have to run source("automatedPipeline-batchtools.R") to read this file and load the autoGCcounts and autoNormTest functions. The automatedPipeline-batchtools.R script and the other files for configuring your HPC are in the scripts folder; the minimal call sequence is sketched below.

Let me know how it goes,

Jean
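A minimal sketch, run from the scripts folder (the .RData file names are placeholders following the package example):

library(PopSV)
source("automatedPipeline-batchtools.R")  # loads autoGCcounts, autoNormTest, autoExtra
res.GCcounts = autoGCcounts("files.RData", "bins.RData")  # GC content + bin counts
cnvs.df = autoNormTest("files.RData", "bins.RData")       # normalization + testing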

ghost commented 6 years ago

Thanks for your response. I am using the automated pipeline now, but I am still struggling:

source ("run-PopSV-batchjobs-automatedPipeline.R")
Error in cfReadBrewTemplate(template.file) : 
  could not find function "cfReadBrewTemplate"
Error: package or namespace load failed for ‘BatchJobs’:
 .onLoad failed in loadNamespace() for 'BatchJobs', details:
  call: sourceConfFile(cf)
  error: There was an error in sourcing your configuration file '/home/wuk/software/PopSV/scripts/.BatchJobs.R': Error in cfReadBrewTemplate(template.file) : 
  could not find function "cfReadBrewTemplate"

When I run:

library(BatchJobs)

I also get the same error.

My .BatchJobs.R file looks like this:

source("~/makeClusterFunctionsAdaptive.R")
cluster.functions <- makeClusterFunctionsAdaptive("~/guillimin.tmpl")
mail.start <- "none"
mail.done <- "none"
mail.error <- "none"
mail.from <- "<jean.monlong@mail.mcgill.ca>"
mail.to <- "<jean.monlong@mail.mcgill.ca>"

[screenshot: location of ~/makeClusterFunctionsAdaptive.R]

Am I missing something, such as paths?

Waqas.

ghost commented 6 years ago

I forgot to mention that I am trying to run PopSV on a single server, not on an HPC.

jmonlong commented 6 years ago

I would recommend that you try the batchtools version (the configuration is easier). If you want to run this on a single server, you can use the configuration file batchtools.conf.local.R (you can choose the number of cores to use by changing the ncpus= argument). To use it, rename it to batchtools.conf.R and place it in the working directory (a minimal example is sketched below).
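For reference, the local configuration can be as small as this (a sketch assuming batchtools' makeClusterFunctionsMulticore, which is what the 'Multicore' cluster functions use; set ncpus to the number of cores you want; the shipped batchtools.conf.local.R may differ):

## batchtools.conf.R -- minimal local setup (sketch)
cluster.functions = makeClusterFunctionsMulticore(ncpus = 4)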

To test that batchtools is configured properly, you can try running the commands in the test script (sketched below). If they work, you can move on to running the pipeline.
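If you want to see what such a test boils down to, a minimal batchtools smoke test looks roughly like this (a sketch; the actual test script in the repo may differ):

library(batchtools)
reg = makeRegistry(file.dir = "test")                # creates a throwaway registry folder
ids = batchMap(function(x) x^2, x = 1:2, reg = reg)  # two toy jobs
submitJobs(ids, reg = reg)
waitForJobs(reg = reg)
reduceResultsList(reg = reg)                         # should return list(1, 4)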

BTW, is there a reason why you want to run this on a single server rather than an HPC?

ghost commented 6 years ago

I did as you suggested and checked with test-batchtools.R. It worked fine.

> library (batchtools)
Loading required package: data.table
data.table 1.10.4.3
  The fastest way to learn (by data.table authors): https://www.datacamp.com/courses/data-analysis-the-data-table-way
  Documentation: ?data.table, example(data.table) and browseVignettes("data.table")
  Release notes, videos and slides: http://r-datatable.com
Breaking change introduced in batchtools v0.9.6: The format of the returned data.table of the functions `reduceResultsDataTable()`, getJobTable()`, `getJobPars()`, and `getJobResources()` has changed. List columns are not unnested automatically anymore. To manually unnest tables, batchtools provides the helper function `unwrap()` now, e.g. `unwrap(getJobPars())`. The previously introduced helper function `flatten()` will be deprecated due to a name clash with `purrr::flatten()`.
> library(PopSV)
> source ("test-batchtools.R")
Sourcing configuration file '/home/wuk/software/PopSV/scripts/batchtools/batchtools.conf.R' ...
Created registry in '/home/wuk/software/PopSV/scripts/batchtools/test' using cluster functions 'Multicore'
Adding 2 jobs ...
Submitting 2 jobs in 2 chunks using cluster functions 'Multicore' ...
>

With the actual data it mostly ran well, but it ended with an error:

wuk@wuk-Precision-Tower-7810:~/software/PopSV/scripts/batchtools$ R

R version 3.4.4 (2018-03-15) -- "Someone to Lean On"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library (batchtools)
Loading required package: data.table
data.table 1.10.4.3
  The fastest way to learn (by data.table authors): https://www.datacamp.com/courses/data-analysis-the-data-table-way
  Documentation: ?data.table, example(data.table) and browseVignettes("data.table")
  Release notes, videos and slides: http://r-datatable.com
Breaking change introduced in batchtools v0.9.6: The format of the returned data.table of the functions `reduceResultsDataTable()`, getJobTable()`, `getJobPars()`, and `getJobResources()` has changed. List columns are not unnested automatically anymore. To manually unnest tables, batchtools provides the helper function `unwrap()` now, e.g. `unwrap(getJobPars())`. The previously introduced helper function `flatten()` will be deprecated due to a name clash with `purrr::flatten()`.
> library(PopSV)
> source("automatedPipeline-batchtools.R")
Functions :
- 'autoGCcounts' to count BC in each sample.
- 'autoNormTest' to normalize and test all the samples.
- 'autoExtra' for some other functions.

> bam.files = read.table("bams.tsv", as.is=TRUE, header=TRUE)
> files.df = init.filenames(bam.files, code="example")
> save(files.df, file="files.RData")
> bin.size = 1e3
> bins.df = fragment.genome.hg19(bin.size)

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, cbind, colMeans, colnames,
    colSums, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, lengths, Map, mapply, match,
    mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rowMeans, rownames, rowSums, sapply, setdiff, sort,
    table, tapply, union, unique, unsplit, which, which.max, which.min

Attaching package: ‘S4Vectors’

The following objects are masked from ‘package:data.table’:

    first, second

The following object is masked from ‘package:base’:

    expand.grid

Attaching package: ‘IRanges’

The following object is masked from ‘package:data.table’:

    shift

Attaching package: ‘Biostrings’

The following object is masked from ‘package:base’:

    strsplit

> save(bins.df, file="bins.RData")
> res.GCcounts = autoGCcounts("files.RData", "bins.RData")

== 1) Get GC content in each bin.

Sourcing configuration file '/home/wuk/software/PopSV/scripts/batchtools/batchtools.conf.R' ...
Created registry in '/home/wuk/software/PopSV/scripts/batchtools/getGC' using cluster functions 'Multicore'
Adding 1 jobs ...
Submitting 1 jobs in 1 chunks using cluster functions 'Multicore' ...

== 2) Get bin counts in each sample and correct for GC bias.

Sourcing configuration file '/home/wuk/software/PopSV/scripts/batchtools/batchtools.conf.R' ...
Created registry in '/home/wuk/software/PopSV/scripts/batchtools/getBC' using cluster functions 'Multicore'
Adding 3 jobs ...
Submitting 3 jobs in 3 chunks using cluster functions 'Multicore' ...
Waiting (Q:0 R:1 D:0 E:2 ?:0) [=====================-----------]  67% eta:  1h
Waiting (Q:0 R:1 D:0 E:2 ?:0) [=====================-----------]  67% eta:  2h                    
Status for 3 jobs:                                                            
  Submitted    : 3 (100.0%)
  -- Queued    : 0 (  0.0%)
  -- Started   : 3 (100.0%)
  ---- Running : 0 (  0.0%)
  ---- Done    : 0 (  0.0%)
  ---- Error   : 3 (100.0%)
  ---- Expired : 0 (  0.0%)
Mean run time: 1.19 hours.
Error in autoGCcounts("files.RData", "bins.RData") : 
  Not done yet or failed, see for yourself
> cnvs.df = autoNormTest("files.RData", "bins.RData")

== 1) Sample QC and reference definition.

Sourcing configuration file '/home/wuk/software/PopSV/scripts/batchtools/batchtools.conf.R' ...
Created registry in '/home/wuk/software/PopSV/scripts/batchtools/sampQC' using cluster functions 'Multicore'
Adding 1 jobs ...
Submitting 1 jobs in 1 chunks using cluster functions 'Multicore' ...
Status for 1 jobs:                                                            
  Submitted    : 1 (100.0%)
  -- Queued    : 0 (  0.0%)
  -- Started   : 1 (100.0%)
  ---- Running : 0 (  0.0%)
  ---- Done    : 0 (  0.0%)
  ---- Error   : 1 (100.0%)
  ---- Expired : 0 (  0.0%)
Mean run time: 0.0011 hours.
Error in autoNormTest("files.RData", "bins.RData") : 
  Not done yet or failed, see for yourself

This needs your attention again.

I should also mention that my study has reference and case samples. Right now I have only three samples. With three samples, how many reference samples do I need?

Thanks in advance,

Waqas.

ghost commented 6 years ago

In case you missed the thread: any help is appreciated!

Waqas.

jmonlong commented 6 years ago

Thanks for your patience. There seem to be errors in the second step of the autoGCcounts function. I updated the pipeline functions to show a log of the errors when the argument status=TRUE is used. Can you rerun the following to get more information about the errors:

## Download the new version of automatedPipeline-batchtools.R
source("automatedPipeline-batchtools.R")
res.GCcounts = autoGCcounts("files.RData", "bins.RData", status=TRUE)
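If you want to dig into the failed jobs directly with batchtools, something like this should also work (a sketch; "getBC" is the registry folder that the bin-count step created in your output above):

library(batchtools)
reg = loadRegistry("getBC", writeable = TRUE)  # registry of the failed step
getStatus(reg = reg)                           # queued/running/done/error summary
getErrorMessages(reg = reg)                    # error message for each failed job
getLog(id = 1, reg = reg)                      # full log of the first job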

In terms of references, we recommend having at least 20 reference samples (40-50 would be better though). PopSV is not suited to analyzing only 3 samples. Do you have controls that were sequenced similarly and that you could use as references?

ghost commented 6 years ago

Thanks, the previous error was resolved; now I am only facing an error on the last command:

> cnvs.df = autoNormTest("files.RData", "bins.RData")

== 1) Sample QC and reference definition.

Sourcing configuration file '/home/wuk/software/PopSV/scripts/batchtools/batchtools.conf.R' ...
Created registry in '/home/wuk/software/PopSV/scripts/batchtools/sampQC' using cluster functions 'Multicore'
Adding 1 jobs ...
Submitting 1 jobs in 1 chunks using cluster functions 'Multicore' ...
Status for 1 jobs:                                                            
  Submitted    : 1 (100.0%)
  -- Queued    : 0 (  0.0%)
  -- Started   : 1 (100.0%)
  ---- Running : 0 (  0.0%)
  ---- Done    : 0 (  0.0%)
  ---- Error   : 1 (100.0%)
  ---- Expired : 0 (  0.0%)
Mean run time: 0.000882 hours.
Error in autoNormTest("files.RData", "bins.RData") : 
  Not done yet or failed, see for yourself

Thanks for the support so far,

Waqas.