Closed BenSamy2020 closed 2 years ago
Please find the detail about how to use PGA in docker at https://github.com/wenbostar/PGA/wiki/Use-PGA-docker.
Greetings @wenbostar
Thank you @wenbostar. On a side note, I would also like to thank you for making other amazing tools like PDV.
Regards, Ben
Greetings @wenbostar,
I have successfully run and loaded the PGA library, but I am not able to open my trinity_output.fasta file. I tested the file location using RStudio Server installed on my WSL2 (https://support.rstudio.com/hc/en-us/articles/360049776974-Using-RStudio-Server-in-Windows-WSL2#rstudio-server-setup), and the file can be found and opened there.
Could you advise me on whether I have run the R script incorrectly? (I have provided my complete R session below for your reference.)
Regards, Ben
R version 3.6.1 (2019-07-05) -- "Action of the Toes" Copyright (C) 2019 The R Foundation for Statistical Computing Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R.
Bioconductor version 3.10 (BiocManager 1.30.10), ?BiocManager::install for help Bioconductor version '3.10' is out-of-date; the current release version '3.14' is available with R version '4.1'; see https://bioconductor.org/install
library("PGA")
Loading required package: IRanges
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:parallel’:
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from ‘package:stats’:
IQR, mad, sd, var, xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, append, as.data.frame, basename, cbind, colnames,
dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
union, unique, unsplit, which, which.max, which.min
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: ‘S4Vectors’
The following object is masked from ‘package:base’:
expand.grid
Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: Biostrings
Loading required package: XVector
Attaching package: ‘Biostrings’
The following object is masked from ‘package:base’:
strsplit
Loading required package: data.table
data.table 1.12.8 using 12 threads (see ?getDTthreads). Latest news: r-datatable.com
Attaching package: ‘data.table’
The following object is masked from ‘package:GenomicRanges’:
shift
The following object is masked from ‘package:IRanges’:
shift
The following objects are masked from ‘package:S4Vectors’:
first, second
Loading required package: rTANDEM
Loading required package: XML
Loading required package: Rcpp
createProDB4DenovoRNASeq(infa = "/mnt/f/Trinity/MDAMB231/Trinity_Files/MDAMB231_Trinity_Processed_Sorted.fasta", bool_use_3frame = FALSE,
outmtab = "/mnt/f/Trinity/MDAMB231/Trinity_Files/MDAMB231_Novel_Transcripts_ntx.tab",
outfa = "/mnt/f/Trinity/MDAMB231/Trinity_Files/MDAMB231_denovo_Proteogeomics_Database.fasta", bool_get_longest = FALSE,
make_decoy = TRUE, decoy_tag = "#REV#", outfile_name = "ProGeo")
Error in .Call2("new_input_filexp", filepath, PACKAGE = "XVector") : cannot open file '/mnt/f/Trinity/MDAMB231/Trinity_Files/MDAMB231_Trinity_Processed_Sorted.fasta'
You may try this:
## Use the following setting for -v to mount your data directory, then start R:
docker run -it -v /mnt/f/Trinity/MDAMB231/Trinity_Files/:/opt/ -u $(id -u):$(id -g) proteomics/pga
R
It's important to set "-v" correctly in "docker run -it -v ...". The -v parameter mounts a directory on the host system (/mnt/f/Trinity/MDAMB231/Trinity_Files/) at a directory inside the container (/opt/).
In R:
createProDB4DenovoRNASeq(infa = "/opt/MDAMB231_Trinity_Processed_Sorted.fasta", bool_use_3frame = FALSE,
outmtab = "/opt/MDAMB231_Novel_Transcripts_ntx.tab",
outfa = "/opt/MDAMB231_denovo_Proteogeomics_Database.fasta", bool_get_longest = FALSE,
make_decoy = TRUE, decoy_tag = "#REV#", outfile_name = "ProGeo")
Once you are in R inside the container, all the files from the host folder "/mnt/f/Trinity/MDAMB231/Trinity_Files/" are available under /opt.
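The path mapping described above can be sketched as a small shell helper. This is only an illustration, assuming the mount from the command above (host /mnt/f/Trinity/MDAMB231/Trinity_Files mounted at /opt); the function name `to_container_path` is hypothetical and not part of PGA or Docker.

```shell
#!/bin/sh
# Host directory and container mount point, taken from the -v setting above.
HOST_DIR="/mnt/f/Trinity/MDAMB231/Trinity_Files"
CONTAINER_DIR="/opt"

# Hypothetical helper: print the path a host file has inside the container.
# Files outside HOST_DIR are not mounted, so R in the container cannot open them.
to_container_path() {
    case "$1" in
        "$HOST_DIR"/*)
            # Strip the host prefix and prepend the container mount point.
            printf '%s/%s\n' "$CONTAINER_DIR" "${1#"$HOST_DIR"/}"
            ;;
        *)
            echo "not under $HOST_DIR, so not visible in the container: $1" >&2
            return 1
            ;;
    esac
}

to_container_path "$HOST_DIR/MDAMB231_Trinity_Processed_Sorted.fasta"
# prints /opt/MDAMB231_Trinity_Processed_Sorted.fasta
```

Any path passed to createProDB4DenovoRNASeq (infa, outmtab, outfa) needs to be the container-side form, since the host-side path does not exist inside the container.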
Greetings @wenbostar,
Thank you. After mounting my system directory, I was able to execute the command successfully. For convenience, I have posted the full sequence of commands below so that new users like me can run PGA in a Docker environment.
Regards, Parthiban
docker run -it -v /mnt/f/Trinity/MDAMB231/Trinity_Files/:/opt/ -u $(id -u):$(id -g) proteomics/pga
R
library("PGA")
createProDB4DenovoRNASeq(infa = "/opt/MDAMB231_Trinity_Processed_Sorted.fasta", bool_use_3frame = FALSE,
outmtab = "/opt/MDAMB231_Novel_Transcripts_ntx.tab",
outfa = "/opt/MDAMB231_denovo_Proteogeomics_Database.fasta", bool_get_longest = FALSE,
make_decoy = TRUE, decoy_tag = "#REV#", outfile_name = "ProGeo")
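For repeated runs, the interactive steps above could be saved as a script instead of being typed each time. The following is a rough sketch only, under assumptions not confirmed in this thread: that `Rscript` is on the PATH inside the proteomics/pga image and that the image accepts a command after its name. The file name `run_pga.R` and both function names are hypothetical. The second function only prints the docker command so it can be reviewed before running.

```shell
#!/bin/sh
# Hypothetical file name for the saved R commands.
SCRIPT_NAME="run_pga.R"

# Write the R commands from the post above into <data_dir>/run_pga.R.
# $1 is the host data directory that will be mounted at /opt.
write_pga_script() {
    cat > "$1/$SCRIPT_NAME" <<'EOF'
library("PGA")
createProDB4DenovoRNASeq(infa = "/opt/MDAMB231_Trinity_Processed_Sorted.fasta",
                         bool_use_3frame = FALSE,
                         outmtab = "/opt/MDAMB231_Novel_Transcripts_ntx.tab",
                         outfa = "/opt/MDAMB231_denovo_Proteogeomics_Database.fasta",
                         bool_get_longest = FALSE,
                         make_decoy = TRUE, decoy_tag = "#REV#",
                         outfile_name = "ProGeo")
EOF
}

# Print (do not run) a non-interactive docker command for the same mount.
# Assumption: Rscript is available in the image. Paths with spaces are not handled.
print_docker_cmd() {
    printf 'docker run --rm -v %s/:/opt/ -u %s:%s proteomics/pga Rscript /opt/%s\n' \
        "$1" "$(id -u)" "$(id -g)" "$SCRIPT_NAME"
}

# Usage:
#   write_pga_script /mnt/f/Trinity/MDAMB231/Trinity_Files
#   print_docker_cmd /mnt/f/Trinity/MDAMB231/Trinity_Files
```

Because the script is written into the mounted directory, it is visible inside the container at /opt/run_pga.R alongside the input FASTA file.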
Greetings @wenbostar,
I would like to use PGA's createProDB4DenovoRNASeq function to generate a custom protein database for proteomics data searching. Unfortunately, due to missing dependencies in R, I am not able to run PGA. I get the error message "Error: package ‘rTANDEM’ required by ‘PGA’ could not be found". Could you please advise me on how to proceed?
Specifically, I would like to run the commands below:
# Library
library("PGA")
# Create custom protein database from Trinity assembly output FASTA file
createProDB4DenovoRNASeq(infa = "F:/Trinity/MDAMB231/Trinity_Files/MDAMB231_Trinity_Processed_Sorted.fasta", bool_use_3frame = FALSE, outmtab = "F:/Trinity/MDAMB231/Trinity_Files/MDAMB231_Novel_Transcripts_ntx.tab", outfa = "F:/Trinity/MDAMB231/Trinity_Files/MDAMB231_denovo_Proteogeomics_Database.fasta", bool_get_longest = FALSE, make_decoy = TRUE, decoy_tag = "#REV#", outfile_name = "ProGeo")
Regards, Parthiban
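Note that Windows-style paths on the F: drive will not resolve as written from WSL2 or from a Linux container. A minimal sketch of the conversion follows, assuming WSL's default drive mount point of /mnt/<drive letter>; the function name `win_to_wsl_path` is hypothetical. WSL also ships a `wslpath` utility that performs this conversion, which is likely the better choice when it is available.

```shell
#!/bin/sh
# Hypothetical helper: convert "F:\dir\file" to "/mnt/f/dir/file".
# Assumes the default WSL automount root (/mnt) and a path that starts
# with a single drive letter followed by a colon.
win_to_wsl_path() {
    # First character is the drive letter; lowercase it.
    drive=$(printf '%s' "$1" | cut -c1 | tr '[:upper:]' '[:lower:]')
    # Everything after "X:" is the rest of the path; flip the separators.
    rest=$(printf '%s' "$1" | cut -c3- | tr '\\' '/')
    printf '/mnt/%s%s\n' "$drive" "$rest"
}

win_to_wsl_path 'F:\Trinity\MDAMB231\Trinity_Files\MDAMB231_Trinity_Processed_Sorted.fasta'
# prints /mnt/f/Trinity/MDAMB231/Trinity_Files/MDAMB231_Trinity_Processed_Sorted.fasta
```

The single quotes around the Windows path matter: they stop the shell from treating the backslashes as escape characters.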