wenbostar / PGA

PGA: a tool for ProteoGenomics Analysis
http://wenbostar.github.io/PGA/

Using PGA to create custom protein database from trinity.fasta file #26

Closed BenSamy2020 closed 2 years ago

BenSamy2020 commented 2 years ago

Greetings @wenbostar,

I would like to use PGA's createProDB4DenovoRNASeq function to generate a custom protein database for proteomics data searching. Unfortunately, I cannot load PGA in R because a dependency is missing. The error message is "Error: package ‘rTANDEM’ required by ‘PGA’ could not be found". Could you please advise me on how to proceed?

Specifically, I would like to run the command below:

# Library
library("PGA")

# Create custom protein database from Trinity assembly output FASTA file
# (forward slashes are used because R treats "\" in a string as an escape character)
createProDB4DenovoRNASeq(infa = "F:/Trinity/MDAMB231/Trinity_Files/MDAMB231_Trinity_Processed_Sorted.fasta", bool_use_3frame = FALSE,
    outmtab = "F:/Trinity/MDAMB231/Trinity_Files/MDAMB231_Novel_Transcripts_ntx.tab",
    outfa = "F:/Trinity/MDAMB231/Trinity_Files/MDAMB231_denovo_Proteogeomics_Database.fasta", bool_get_longest = FALSE,
    make_decoy = TRUE, decoy_tag = "#REV#", outfile_name = "ProGeo")

Regards, Parthiban

wenbostar commented 2 years ago

Please see https://github.com/wenbostar/PGA/wiki/Use-PGA-docker for details on how to use PGA in Docker.

BenSamy2020 commented 2 years ago

Greetings @wenbostar,

Thank you. On a side note, I would also like to thank you for making other amazing tools like PDV.

Regards, Ben

BenSamy2020 commented 2 years ago

Greetings @wenbostar,

I have successfully loaded the PGA library, but I am not able to open my Trinity output FASTA file. I checked the file location using RStudio Server installed on my WSL2 (https://support.rstudio.com/hc/en-us/articles/360049776974-Using-RStudio-Server-in-Windows-WSL2#rstudio-server-setup); there the file is found and can be opened.

Could you advise me whether I have run the R script incorrectly? I have provided my complete R session below for your reference.

Regards, Ben

R version 3.6.1 (2019-07-05) -- "Action of the Toes"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R.

Bioconductor version 3.10 (BiocManager 1.30.10), ?BiocManager::install for help
Bioconductor version '3.10' is out-of-date; the current release version '3.14' is available with R version '4.1'; see https://bioconductor.org/install

library("PGA")
Loading required package: IRanges
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

anyDuplicated, append, as.data.frame, basename, cbind, colnames,
dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
union, unique, unsplit, which, which.max, which.min

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:base’:

expand.grid

Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: Biostrings
Loading required package: XVector

Attaching package: ‘Biostrings’

The following object is masked from ‘package:base’:

strsplit

Loading required package: data.table
data.table 1.12.8 using 12 threads (see ?getDTthreads). Latest news: r-datatable.com

Attaching package: ‘data.table’

The following object is masked from ‘package:GenomicRanges’:

shift

The following object is masked from ‘package:IRanges’:

shift

The following objects are masked from ‘package:S4Vectors’:

first, second

Loading required package: rTANDEM
Loading required package: XML
Loading required package: Rcpp

createProDB4DenovoRNASeq(infa = "/mnt/f/Trinity/MDAMB231/Trinity_Files/MDAMB231_Trinity_Processed_Sorted.fasta", bool_use_3frame = FALSE,
    outmtab = "/mnt/f/Trinity/MDAMB231/Trinity_Files/MDAMB231_Novel_Transcripts_ntx.tab",
    outfa = "/mnt/f/Trinity/MDAMB231/Trinity_Files/MDAMB231_denovo_Proteogeomics_Database.fasta", bool_get_longest = FALSE,
    make_decoy = TRUE, decoy_tag = "#REV#", outfile_name = "ProGeo")
Error in .Call2("new_input_filexp", filepath, PACKAGE = "XVector") :
  cannot open file '/mnt/f/Trinity/MDAMB231/Trinity_Files/MDAMB231_Trinity_Processed_Sorted.fasta'
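
The error reports that R could not open the file at that path in the environment where R itself was running. A minimal sketch of a pre-flight check follows, meant to be run in the same shell from which R is started. The function name check_input is hypothetical and is not part of PGA; the demo uses a temporary file only so that the snippet runs anywhere.

```shell
#!/bin/sh
# Hypothetical helper (not part of PGA): report whether a path is a
# readable regular file in the environment where this shell runs.
check_input() {
    if [ -f "$1" ] && [ -r "$1" ]; then
        echo "readable: $1"
    else
        echo "not readable here: $1" >&2
        return 1
    fi
}

# Demo with a temporary file so the sketch is self-contained.
tmp=$(mktemp)
check_input "$tmp"
check_input "$tmp.missing" || echo "fix the path before starting R"
rm -f "$tmp"
```

In practice the argument would be the same string passed as infa. A path that passes this check in one environment (for example WSL2) can still fail in another, so the check is only meaningful where R actually runs.
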

wenbostar commented 2 years ago

You may try this:


## Use -v to mount the host directory that holds your files into the container
docker run -it -v /mnt/f/Trinity/MDAMB231/Trinity_Files/:/opt/ -u $(id -u):$(id -g) proteomics/pga

## Then start R inside the container
R

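It is important to set "-v" correctly in "docker run -it -v ...". The -v option mounts a host directory (/mnt/f/Trinity/MDAMB231/Trinity_Files/) at a directory inside the container (/opt/). R inside the container sees host files only through such a mount, so file paths must start with /opt/ rather than with the host path.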

In R:

createProDB4DenovoRNASeq(infa = "/opt/MDAMB231_Trinity_Processed_Sorted.fasta", bool_use_3frame = FALSE,
    outmtab = "/opt/MDAMB231_Novel_Transcripts_ntx.tab",
    outfa = "/opt/MDAMB231_denovo_Proteogeomics_Database.fasta", bool_get_longest = FALSE,
    make_decoy = TRUE, decoy_tag = "#REV#", outfile_name = "ProGeo")

Once R is running in the container, all files from the host folder "/mnt/f/Trinity/MDAMB231/Trinity_Files/" are available under /opt.
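
The mapping between host paths and container paths can be sketched in shell as follows. The function to_container_path is a hypothetical illustration, not a Docker or PGA command; it assumes a single -v host_dir:container_dir pair like the one in this thread.

```shell
#!/bin/sh
# Hypothetical helper: translate a host path into the path the container
# sees, given one "-v host_dir:container_dir" mapping.
to_container_path() {
    host_dir=${1%/}          # strip one trailing slash, if present
    container_dir=${2%/}
    host_path=$3
    case "$host_path" in
        "$host_dir"/*)
            # Drop the host prefix and prepend the container directory.
            rest=${host_path#"$host_dir"}
            printf '%s\n' "$container_dir$rest"
            ;;
        *)
            echo "not under $host_dir: $host_path" >&2
            return 1
            ;;
    esac
}

to_container_path /mnt/f/Trinity/MDAMB231/Trinity_Files/ /opt/ \
    /mnt/f/Trinity/MDAMB231/Trinity_Files/MDAMB231_Trinity_Processed_Sorted.fasta
```

The call at the end prints /opt/MDAMB231_Trinity_Processed_Sorted.fasta, which is the value to pass as infa inside the container. A path outside the mounted directory is rejected, since the container has no view of it.
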

BenSamy2020 commented 2 years ago

Greetings @wenbostar,

Thank you. After mounting my system directory, I was able to execute the command successfully. For the convenience of new users like me, I have posted the full sequence of commands below for running PGA in a Docker environment.

Regards, Parthiban

Using Docker with R to run PGA

docker run -it -v /mnt/f/Trinity/MDAMB231/Trinity_Files/:/opt/ -u $(id -u):$(id -g) proteomics/pga
R
library("PGA")
createProDB4DenovoRNASeq(infa = "/opt/MDAMB231_Trinity_Processed_Sorted.fasta", bool_use_3frame = FALSE,
    outmtab = "/opt/MDAMB231_Novel_Transcripts_ntx.tab",
    outfa = "/opt/MDAMB231_denovo_Proteogeomics_Database.fasta", bool_get_longest = FALSE,
    make_decoy = TRUE, decoy_tag = "#REV#", outfile_name = "ProGeo")
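
For repeated runs, the same steps could in principle be scripted instead of typed interactively. The sketch below writes the R commands to a file in the directory that gets mounted at /opt and then prints, without running, a matching docker command. The file name run_pga.R and the DATA_DIR variable are hypothetical. It is also an assumption, not confirmed in this thread, that the proteomics/pga image provides Rscript and accepts a command after the image name.

```shell
#!/bin/sh
# DATA_DIR is a hypothetical variable: point it at the host directory that
# holds the Trinity files. A temporary directory is used if it is unset.
data_dir=${DATA_DIR:-$(mktemp -d)}

# Write the R commands from this thread into a script inside that directory.
cat > "$data_dir/run_pga.R" <<'EOF'
library("PGA")
createProDB4DenovoRNASeq(infa = "/opt/MDAMB231_Trinity_Processed_Sorted.fasta", bool_use_3frame = FALSE,
    outmtab = "/opt/MDAMB231_Novel_Transcripts_ntx.tab",
    outfa = "/opt/MDAMB231_denovo_Proteogeomics_Database.fasta", bool_get_longest = FALSE,
    make_decoy = TRUE, decoy_tag = "#REV#", outfile_name = "ProGeo")
EOF

# Print (do not run) the docker command for review.
# Assumption: Rscript exists in the image and can be given as the command.
echo "docker run --rm -v $data_dir:/opt/ -u $(id -u):$(id -g) proteomics/pga Rscript /opt/run_pga.R"
```

Printing the command first makes it easy to verify the mount before anything is executed. If the image does not support this form, the interactive steps above remain the confirmed route.
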