reactome / ReactomeGSA

R client for the REACTOME Analysis Service for comparative multi-omics gene set analysis

How to automatically download table displayed in reactome link #22

Closed lireo closed 3 years ago

lireo commented 3 years ago

Hello

I recently discovered ReactomeGSA and I think it is very useful for my analysis, but I have a question.

When I run perform_reactome_analysis in R, I can't find the table corresponding to the one displayed interactively in my browser via the Reactome link. I follow the link provided and manually go to the download section to download the CSV table, but this is not reproducible. I am particularly interested in the "submitted entities found" column. Is there a way to do this directly in my R script?

Thank you Aurelie

lireo commented 3 years ago

Hello again

I explored the information provided by the link and found discrepancies. For the same perform_reactome_analysis run, the FDR differs between get_result(type = "pathways") in R and the FDR displayed on the Reactome website reached through reactome_links. I see this only with the PADOG method, not with Camera. I tried both my own RNA-seq data and your training data set griss_melanoma_rnaseq. The FDRs are so different that in one case (get_result) some pathways are significant, while in the other (the Reactome website) no pathway is significant (FDR > 0.1), even though the arrows in the sample column indicate that pathways are significant.

I'm lost and I don't understand why. Which one should I believe?

By the way, I don't understand how this FDR is calculated: is it computed by PADOG or by ReactomeGSA? In my data I get the same FDR for all significant pathways (more than 50).

To reproduce, here is my short R script using your training data:

```r
library(ReactomeGSA)
library(ReactomeContentService4R)
library(ReactomeGSA.data)

available_methods <- get_reactome_methods(print_methods = FALSE, return_result = TRUE)

# only show the names of the available methods
available_methods$name

# example: https://bioconductor.org/packages/release/bioc/vignettes/ReactomeGSA/inst/doc/using-reactomegsa.html

# PADOG request
my_request <- ReactomeAnalysisRequest(method = "PADOG")
my_request <- set_parameters(request = my_request, max_missing_values = 0.5)

# dataset: keep only genes with at least 100 reads in total
data("griss_melanoma_rnaseq")
total_reads <- rowSums(griss_melanoma_rnaseq$counts)
griss_melanoma_rnaseq <- griss_melanoma_rnaseq[total_reads >= 100, ]

my_request <- add_dataset(request = my_request,
                          expression_values = griss_melanoma_rnaseq,
                          name = "RNA-seq",
                          type = "rnaseq_counts",
                          comparison_factor = "treatment",
                          comparison_group_1 = "MOCK",
                          comparison_group_2 = "MCM",
                          additional_factors = c("cell_type", "patient"),
                          # This adds the dataset-level parameter 'discrete_norm_function' to the request
                          discrete_norm_function = "TMM")

my_request

# analysis
result <- perform_reactome_analysis(request = my_request, compress = FALSE)

# pathways
get_result(result, type = "pathways", name = "RNA-seq")
head(pathways(result), n = 20L)
current_pathways <- pathways(result)

# link
link <- reactome_links(result, return_result = TRUE)
```

And the link to my run: https://reactome.org/PathwayBrowser/#/DTAB=AN&ANALYSIS=MjAyMTA2MDkwODA3NTlfNzM4OTI%253D

Thanks for your help

jgriss commented 3 years ago

Hi @lireo

Thanks a lot for your interest in ReactomeGSA!

This is a bug on our side and will be fixed in our next release. Thanks a lot for making us aware of it!

The reason: The PathwayBrowser (the tool behind the web interface) also displays the results of the classical over-representation analysis. The CSV file you see is that result file, which of course doesn't make sense for ReactomeGSA-based analyses.

When you click "Analysis" > "Download" (see screenshot below) you get multiple files:

[Screenshot: the "Analysis" > "Download" section of the PathwayBrowser]

The correct ones are the R script, the PDF report and the MS Excel file.

The R script is the recommended way to directly load this data into your R session.

By the way, while the FDR and p-values (and all statistics-related columns) in that CSV file are not correct, the descriptive columns such as the number of entities found are correct and do match your input.

I'll additionally add a feature request to make the mapping results (i.e. features found and features per pathway) available in the R report (#23).

Kind regards, Johannes
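
For a reproducible export of the statistics table without going through the browser, one option is to write the data frame returned by pathways(result) to a CSV file from R. The following is only a minimal sketch: export_pathways is a hypothetical helper, and the stand-in data frame (its pathway IDs and column names) is an assumption used so that the example runs without the ReactomeGSA service. It does not cover the "submitted entities found" column, which is the subject of the feature request (#23).

```r
# In a real session the table would come from the analysis result, e.g.
#   pathway_table <- pathways(result)
# The data frame below is a hypothetical stand-in; the IDs and column
# names are assumptions, not the package's guaranteed output.
pathway_table <- data.frame(
  Name = c("Pathway A", "Pathway B", "Pathway C"),
  FDR  = c(0.01, 0.20, 0.04),
  row.names = c("PATHWAY_1", "PATHWAY_2", "PATHWAY_3")
)

# Hypothetical helper: keep rows whose FDR is at most max_fdr and write
# them to a CSV file. Returns the filtered table invisibly.
export_pathways <- function(pathway_table, file,
                            fdr_column = "FDR", max_fdr = 1) {
  keep <- pathway_table[[fdr_column]] <= max_fdr
  filtered <- pathway_table[keep, , drop = FALSE]
  write.csv(filtered, file = file, row.names = TRUE)
  invisible(filtered)
}

out_file <- file.path(tempdir(), "reactomegsa_pathways.csv")
significant <- export_pathways(pathway_table, out_file, max_fdr = 0.05)
```

With a real result, fdr_column would need to be set to whatever FDR column name pathways(result) actually returns for the dataset.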

lireo commented 3 years ago

Thanks for the quick reply.

I found another strange behavior with my PADOG analysis. It only does 10/19 permutations with my data, which is few. Maybe my identical FDRs are linked to this limitation. I noticed that when I run the script with the training data set, there are 1000 permutations, matching PADOG's default value (NI) in the documentation.

Is it a bug with my data? I provide a raw count data frame and compare only 3 samples against 3 samples at a time. It would be nice if we could force the number of PADOG iterations to test whether that improves things.

Best Aurelie

jgriss commented 3 years ago

Hi @lireo

Even if the status message shows 10/19 permutations, it does perform all 19. This just happens when the next status message arrives before the permutation count is updated.

PADOG cannot perform the default 1000 permutations for every dataset. When it cannot, it automatically selects a smaller number. That's implemented directly in PADOG.

Kind regards, Johannes
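
The count of 19 is at least consistent with a 3 vs 3 comparison, if one assumes that each permutation is a distinct relabelling of the six samples and that the observed labelling is excluded. That is an assumption about PADOG's internals, not something confirmed in this thread. The base R sketch below works through the arithmetic and illustrates how so few permutations could produce many identical FDR values, further assuming an empirical p-value of the form (b + 1) / (permutations + 1) and a Benjamini-Hochberg adjustment. The p-values are made up for illustration.

```r
# Assumption: a permutation assigns 3 of the 6 samples to group 1, and
# the observed labelling itself is not counted as a permutation.
n_group_1 <- 3
n_group_2 <- 3
n_labellings   <- choose(n_group_1 + n_group_2, n_group_1)  # 20
n_permutations <- n_labellings - 1                          # 19

# Assumption: empirical p-value = (b + 1) / (n_permutations + 1), where b
# is the number of permuted scores at least as extreme as the observed one.
# The smallest reachable p-value is then 1 / 20.
smallest_p <- 1 / (n_permutations + 1)

# Hypothetical p-values: 50 pathways tied at the smallest reachable value,
# 150 further pathways spread between 0.1 and 1.
p_values <- c(rep(smallest_p, 50), seq(0.1, 1, length.out = 150))

# Assumption: Benjamini-Hochberg adjustment. Tied p-values end up with
# the same adjusted value.
fdr <- p.adjust(p_values, method = "BH")
n_distinct_fdr <- length(unique(fdr[1:50]))
```

Under these assumptions the 50 tied pathways all receive one and the same adjusted value, which would match the observation of an identical FDR across all significant pathways.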

lireo commented 3 years ago

Thank you for the clarification.