Open daniellembecker opened 1 week ago
@lmika
Hi @daniellembecker , I'm not sure what this bug refers to. This doesn't look like something that goseq generally supports (it's a tool for generating sequence diagrams).
Hi @lmika thank you for the response. The issue is inconsistency with the nullp and goseq functions in the goseq package. When used on different computers the package is not computing the same values using the same data, code, and functions from the goseq package
Hello,
We are trying to find the root issue of why we are having differing final output results from using the same datasets, parameters, functions, R (4.4.1), RStudio ("2024.04.2+764"), and goseq (1.56.0) package versions. Using the code below, I have tried to replicate the results with the exact same cloned repository on multiple computers and operating systems while checking that all steps are identical when running. However, with this code I am receiving 31 significantly enriched terms, while when someone else clones the repository or the code is used on different operating systems we see there are no significantly enriched terms. This is clearly a major issue and we have tried multiple steps to solve the issue as updating all versions to be the same, trying on multiple systems, cloning and recloning the repo, updating the repo and confirming it has been pushed to Github correctly but we have run out of troubleshooting options. Any ideas or help would be much appreciated!
Generate vector with names of all genes detected in our dataset
ALL.vector <- c(filtered_Pverr.annot$gene_id)
Generate length vector for all genes
LENGTH.vector <- as.integer(filtered_Pverr.annot$length)
Generate vector with names in just the contrast we are analyzing
ID.vector.down <- DEG.res %>% filter(direction=="Downregulated") %>% pull(gene_id)
Get a list of GO Terms for all genes detected in our dataset
Format to have one goterm per row with gene ID repeated
Construct list of genes with 1 for genes in contrast and 0 for genes not in the contrast
weight gene vector by bias for length of gene: load in the gene.vector(a list of all genes with 1 indicating it is a DEG in the group of interest and 0 meaning it is not a DEG) and bias.data (list of lengths for all genes)
pwf <- nullp(genes, 'hg19', 'ensGene')
run goseq using Wallenius method for all categories of GO terms: load in the pwf object (gene name, a 1 or 0 to indicate DEG, the length, and the pwf function; generated from pwf above), and the list of GO terms for all genes
Output of data from MacOS Monterey version 12.6.6:
Output of data from macOS Sonoma 14.5:
We also both ran a test dataset:
data(genes) pwf <- nullp(genes, ‘hg19’, ‘ensGene’)
Which gave us both this same output:![Screenshot 2024-06-21 at 11 58 56](https://github.com/lmika/goseq/assets/40968353/4907e23f-a79c-408c-b1a4-3b03e4da29e2)