transcript / samsa2

SAMSA pipeline, version 2.0. An open-source metatranscriptomics pipeline for analyzing microbiome data, built around DIAMOND and customizable reference databases.
GNU General Public License v3.0

Error with R script: DESeq_stats. #70

Open pauvic opened 2 years ago

pauvic commented 2 years ago

Hello, I got this error when I tried to run DESeq_stats.

/run_DESeq_stats.R: line 4: syntax error near unexpected token {' ./run_DESeq_stats.R: line 4:suppressPackageStartupMessages({'

I could not figure out what went wrong. Could you guide me through this?

Thanks! Paula

transcript commented 2 years ago

Hey Paula, that's pretty much the first line in the R script. Are you hitting this as part of the master_script.sh shell script, or are you trying to run the run_DESeq_stats.R script on its own?

Can you check your R version on your machine?

If you have RStudio, you could also open this in that program and run it line by line interactively.

Best, Sam

pauvic commented 2 years ago

Hello, thanks for your answer.
I am trying to run the run_DESeq_stats.R script on its own.
The R version on my machine is 4.1.2 (2021-11-01). Do you think there is a problem with my R version? Best, Paula

transcript commented 2 years ago

Thanks Paula! I just checked with R version 4.1.2 (latest), and didn't run into any issues.

Next question: can you verify that the optparse and DESeq2 packages are installed on your machine? You can check this fairly easily in command-line R by running:

R
library("optparse")
library("DESeq2")

If either of these gives you a warning that it's not installed, you can install them easily in R:

Optparse:

install.packages("optparse")

DESeq2:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("DESeq2")

Can you confirm both these packages are installed?

pauvic commented 2 years ago

Sam, thanks for your help. I already checked that both packages are installed, but unfortunately I still get the same error. What else could I do? Thanks again for your time.

transcript commented 2 years ago

Hi Paula, interesting, this is a tricky problem! A few more things to test, and to help me:

  1. What operating system are you using? Mac, Windows, Linux?

  2. In command line R (which you can start by just typing "R" on the terminal), could you try pasting in the following lines:

suppressPackageStartupMessages({
  library(optparse)
})

option_list = list(
  make_option(c("-I", "--input"), type="character", default="./",
              help="Input directory", metavar="character"),
  make_option(c("-O", "--out"), type="character", default="DESeq_results.tab", 
              help="output file name [default= %default]", metavar="character"),
  make_option(c("-R", "--raw_counts"), type="character", default=NULL,
              help="raw (total) read counts for this starting file", metavar="character")
)

All of this is just the first lines of the script, and I can run it (R version 4.1.2, on a Mac) copy-pasting it into command-line R without any issues or warnings.

Let me know if this works successfully or still gives you the same error.

transcript commented 2 years ago

Additionally, one thing more to check: can you copy/paste the error in code blocks, AKA between back-ticks like ` ?

It looks like there might be an extra mark in the message you pasted in, which could be responsible for this, but I can't tell if it's just Github trying to format the pasted content.

pauvic commented 2 years ago

Sam, thanks for your answer and sorry for the late reply!!

  1. I am using Windows.

  2. If I run the first line of the script in R, I don't get any errors or warnings. I also get the same error when I run other R scripts.

  3. line 4: syntax error near unexpected token {' ./run_DESeq_stats.R: line 4: suppressPackageStartupMessages({'

pauvic commented 2 years ago

Sam, sorry for bothering you again. I ran the script line by line interactively in RStudio, as you first suggested, but I got these errors. I am sorry, but I still can't make it work. Thanks for your time!

`# DESeq statistical calculations
completeCondition <- data.frame(condition=factor(c(

dds <- DESeqDataSetFromMatrix(complete_table, completeCondition2, ~condition)
Error in DESeqDataSetFromMatrix(complete_table, completeCondition2, ~condition) :
  ncol(countData) == nrow(colData) is not TRUE
dds <- DESeq(dds)
Error in is(object, "DESeqDataSet") : object 'dds' not found

# This step creates the summary results output
res <- results(dds)
Error in is(object, "DESeqDataSet") : object 'dds' not found
org_results <- data.frame(res)
Error in data.frame(res) : object 'res' not found

# these next steps won't work if there's only 1 control sample (no replicates)
if (y > 1) {

sorted_org_results <- org_results[order(-org_results$baseMean),]
Error: object 'org_results' not found
colnames(sorted_org_results)[1] <- "Organism Name"
Error in colnames(sorted_org_results)[1] <- "Organism Name" :
  object 'sorted_org_results' not found

# saving and finishing up
cat ("\nSuccess!\nSaving results file as ", save_filename, "\n")

Success!
Saving results file as DESeq_results.tab
write.table(sorted_org_results, file = save_filename, append = FALSE, quote = FALSE, sep = "\t", row.names = FALSE, col.names = TRUE)
Error in is.data.frame(x) : object 'sorted_org_results' not found`

transcript commented 2 years ago

Hi Paula, okay, it's frustrating that you're on Windows (I have Mac and Linux systems), but I can still try and help troubleshoot.

First, for the issues with trying to run the R script from the command line: a bit of searching suggests that the issue is that your interpreter is trying to run this in Bash, and not switching to use R as the shell. You can see a bit more about this here: https://unix.stackexchange.com/questions/408355/running-r-script-via-shell-script-syntax-error-near-unexpected-token .

To test this, you could try running this script from the command line explicitly in the R shell by calling:

Rscript ./run_DESeq_stats.R -i <input_directory>
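As a rough illustration of the point above: when a script is launched as ./script, the first line (the shebang) decides which interpreter runs it, and Bash may fall back to interpreting the file itself when there is no usable shebang. The sketch below only inspects that first line. The helper name shebang_of is made up for this example and is not part of SAMSA2, and what run_DESeq_stats.R actually contains on line 1 is not shown in this thread.

```shell
# shebang_of: hypothetical helper (not part of SAMSA2).
# Prints the interpreter requested on the first line of a script,
# or "none" when the first line is not a shebang.
shebang_of() {
  first_line=$(head -n 1 "$1")
  case "$first_line" in
    '#!'*) printf '%s\n' "$first_line" | cut -c 3- ;;
    *)     printf 'none\n' ;;
  esac
}

# Example usage (commented out so the sketch runs without the repo or R):
#   shebang_of ./run_DESeq_stats.R
# Naming the interpreter explicitly sidesteps the question entirely;
# --input is the long form of the input option in the script's option list:
#   Rscript ./run_DESeq_stats.R --input <input_directory>
```

If the helper prints "none", or a path that does not exist on the machine, running the file through Rscript explicitly is probably the simplest workaround.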

But now, when you get to running it in interactive mode, it looks like you're hitting a different problem.

It looks like the error first occurs when you're trying to assign the names to the columns of the complete_table; the error seems to be stating that the complete_table didn't get built properly from the merge of the experimental_table and the control_table, so there are more sample names than actual samples.

Could you give me some idea of what samples you're running this on? How many input files are you providing, and what are their names?

If that's not enough to help me troubleshoot, I might see if you could send the inputs (or truncated versions of them) to me by email so I could test.

pauvic commented 2 years ago

Sam, thanks again for your answer. Actually, I use Linux, but I use Windows for RStudio. Sorry for the misunderstanding!

I tried running Rscript and, after resolving some errors, I got this:

Warning message:
package ‘optparse’ was built under R version 3.6.3
Error in getopt_options(object, args) :
  Error in getopt(spec = spec, opt = args) : short flag "i" is invalid
Calls: parse_args -> parse_options -> getopt_options
Execution halted

I've attached an example of my control and experimental files. I currently have 3 control files and 5 treatment files.

Separately, I tried to run the DIAMOND_specific_organism_retriever.py script. I am using a conda environment for Python 2, but when I run the script I get this error:

(py2) @.***:~/paula_Proyectos2/samsa2/python_scripts$ python --version
Python 2.7.15

(py2) @.***:~/paula_Proyectos2/samsa2/python_scripts$ ./DIAMOND_specific_organism_retriever.py
-bash: ./DIAMOND_specific_organism_retriever.py: /usr/lib/python2.7: bad interpreter: Permission denied
(py2)

Once again, thanks for your time.

best

paula

control_T2_5.merged.subsys_annotated.receipt https://drive.google.com/file/d/1gaisOwMy4Vct6RYHDGESmwCT08y65eB6/view?usp=drive_web
experimental_T2_14.merged.subsys_annotated.receipt https://drive.google.com/file/d/1ScPHM_7Zf1R-ic9pk3udR6q4qZkN93MK/view?usp=drive_web

transcript commented 2 years ago

Hi Paula,

Okay, progress! First off, I made an error in my previous comment; it needs to be the -I flag (uppercase I, not lowercase) to work. So I'd try:

Rscript ./run_DESeq_stats.R -I <input_directory> -O output_file.tsv

Capital I, capital O.

That should allow optparse to find the input files directory.

I grabbed the control_T2_5.merged.RefSeq_annot_organism.tsv and experimental_T2_14.merged.RefSeq_annot_organism.tsv files, and it worked with them, after ONE change that you should be aware of: the script tries to parse on underscores ("_"), and so when you had these named as "T2_5" and "T2_14", the script read them both in as T2 and complained that they were duplicates.

I will note this as an item for me to fix, but the easy solution for right now is just to replace those second underscores with dashes:

control_T2-5.merged.RefSeq_annot_organism.tsv
experimental_T2-14.merged.RefSeq_annot_organism.tsv
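With several control and treatment files, the rename above could be scripted instead of done by hand. This is only a sketch: fix_sample_id is a hypothetical helper, not part of SAMSA2, and the pattern assumes file names shaped like the two examples in this thread (a group prefix, a two-part sample ID, then ".merged."). The loop only prints the mv commands so the result can be checked before anything is renamed.

```shell
# fix_sample_id: hypothetical helper (not part of SAMSA2).
# Turns the underscore inside the sample ID into a dash, e.g.
# control_T2_5.merged.* -> control_T2-5.merged.*, so every file keeps a
# distinct ID when its name is split on underscores.
# Names that do not match the assumed pattern are printed unchanged.
fix_sample_id() {
  printf '%s\n' "$1" |
    sed -E 's/^([^_]+_[^_]+)_([^_.]+)\.merged\./\1-\2.merged./'
}

# Preview the renames in the current directory; remove "echo" to apply them.
for f in control_*.merged.* experimental_*.merged.*; do
  [ -e "$f" ] || continue            # skip patterns that matched no file
  new=$(fix_sample_id "$f")
  [ "$f" = "$new" ] || echo mv -- "$f" "$new"
done
```

Files that already use a dash in the ID are left alone, so the loop can be run again safely.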

The *.receipt files, by the way, are not used by R for any of the analysis, so you don't need to share those. I know they're big, but they're just reporting on line-by-line progress and can be ignored for downstream analysis. They're mainly if you want to go back and check on a specific read.

Can you try running the Rscript command again, this time with the uppercase letter flags?


Second, regarding the python "bad interpreter" error, my suspicion is that, with Conda, Python is in a different location. You could probably just delete the shebang from line 1 of the script (remove #!/usr/lib/python2.7). I've also updated these to be callable with Python3 as well, so that could be an easier approach.
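Two ways to get around the shebang, sketched under assumptions. The simplest is to name the interpreter on the command line (python ./DIAMOND_specific_organism_retriever.py), which ignores the shebang and uses whichever python the active conda environment provides. Note that a script with no shebang at all generally has to be launched this way. The alternative below writes a copy whose first line is "#!/usr/bin/env python", so ./script keeps working. The helper name use_env_python is hypothetical and not part of SAMSA2.

```shell
# use_env_python: hypothetical helper (not part of SAMSA2).
# Writes a copy of script $1 to $2 whose first line is
# "#!/usr/bin/env python", so the first python on PATH (for example the
# active conda environment's) is used instead of a hard-coded path.
use_env_python() {
  src=$1
  dst=$2
  {
    printf '%s\n' '#!/usr/bin/env python'
    # Drop the old first line only when it is a shebang.
    if head -n 1 "$src" | grep -q '^#!'; then
      tail -n +2 "$src"
    else
      cat "$src"
    fi
  } > "$dst"
  chmod +x "$dst"
}

# Example usage (commented out; the output file name is made up):
#   use_env_python DIAMOND_specific_organism_retriever.py retriever_env.py
#   ./retriever_env.py
```

Writing to a new file rather than editing in place keeps the original script untouched in case the repository is updated later.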

Sam