ocbe-uio / DIscBIO

A user-friendly R pipeline for biomarker discovery in single-cell transcriptomics

Adapt package to different types of organisms #28

Closed wleoncio closed 3 years ago

wleoncio commented 3 years ago

DIscBIO was developed based on two datasets using human and mouse genes. It would be great if it could be adapted to work on other organisms.

Adapted details from @SystemsBiologist

What to change

Conquer is a collection of analysis-ready public scRNA-seq data sets, and we would like to add it to our manuscript. It has about 40 datasets from three organisms: human, zebrafish and mouse. When I wrote DIscBIO I was focusing on humans, but now we want to make it applicable to any organism with a taxonomy ID. To do so, we need to change lines 157-159 of DIscBIO-classes.R from

https://github.com/ocbe-uio/DIscBIO/blob/0c908992a28bf91efaf1079bb910cb877a00ff99/R/DIscBIO-classes.R#L157-L159

to

shortNames <- substr(rownames(tmpExpdataAll), 1, 3)
geneTypes <- factor(
    c(ENS = "ENS", ERC = "ERC")[shortNames]
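
To illustrate why a three-character prefix is enough, here is a minimal sketch. It assumes that the current code matches on the four-character prefixes "ENSG" and "ERCC" (the linked lines are not reproduced here, so this is an assumption), and it uses a hypothetical helper `classify_ids()` with illustrative row names. Ensembl gene IDs for human, mouse and zebrafish all start with "ENS", and ERCC spike-ins start with "ERC".

```r
# Illustrative row names: human, mouse, zebrafish, an ERCC spike-in, a gene symbol
ids <- c("ENSG00000141510", "ENSMUSG00000059552",
         "ENSDARG00000035559", "ERCC-00002", "TP53")

# Hypothetical helper (not part of DIscBIO): classify IDs by their prefix.
# width = 4 mimics the assumed current behaviour, width = 3 the proposed one.
classify_ids <- function(ids, width) {
  keys <- if (width == 4) {
    c(ENSG = "ENSG", ERCC = "ERCC")
  } else {
    c(ENS = "ENS", ERC = "ERC")
  }
  # Indexing by a name that is not in `keys` yields NA
  unname(keys[substr(ids, 1, width)])
}

classify_ids(ids, 4) # only the human ID and the spike-in are recognised
classify_ids(ids, 3) # human, mouse and zebrafish IDs are all recognised
```

With the shorter prefix, only the row name that is not an Ensembl or ERCC ID stays unclassified (NA).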

I did not change the code because dev is not working and I was worried about making the situation worse. Could you change the code after you bring back dev to work?

Expected behavior

Testing code

library(MultiAssayExperiment)
library(DIscBIO)
GSE41265 <- readRDS("~/GSE41265.rds")
Dataset <- assays(experiments(GSE41265)[["gene"]])[["count"]]
# Drop the version suffix from the gene IDs (everything from the first dot onward)
rownames(Dataset) <- sub("\\..*$", "", rownames(Dataset))
sc <- DISCBIO(Dataset)
sc <- Clustexp(sc, cln = 2, quiet = FALSE, clustnr = 6, rseed = 17000)
# Differential expression analysis between two clusters
Cdiff <- DEGanalysis2clust(sc, Clustering = "K-means", K = 2, fdr = 0.05, name = "M", export = TRUE, quiet = FALSE)
# Differential expression analysis between all clusters
Cdiff <- DEGanalysis(sc, Clustering = "K-means", K = 2, fdr = 0.05, name = "All", export = TRUE, quiet = FALSE)
CdiffBinomial <- ClustDiffGenes(sc, K = 2, export = TRUE, fdr = .01, quiet = FALSE)
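
The row-name cleanup in the testing code can be tried in isolation. The sketch below uses a hypothetical helper `strip_version()` and illustrative IDs. Unlike the testing code, which drops everything after the first dot, it removes only a trailing numeric version suffix, so IDs without a version are left untouched.

```r
# Hypothetical helper (not part of DIscBIO): remove a trailing ".<digits>"
# version suffix from gene IDs
strip_version <- function(ids) {
  sub("\\.[0-9]+$", "", ids)
}

# Illustrative versioned IDs plus a spike-in without a version
raw_ids <- c("ENSMUSG00000000001.4", "ENSG00000141510.16", "ERCC-00002")
clean_ids <- strip_version(raw_ids)
clean_ids

# Stripping versions could in principle create duplicate row names;
# anyDuplicated() returns 0 when there are none
anyDuplicated(clean_ids)
```

Checking for duplicates before assigning the cleaned IDs back to `rownames(Dataset)` is a cheap safeguard.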

For now, it would be great if DEGanalysis and DEGanalysis2clust could work even without gene names, as ClustDiffGenes does.

wleoncio commented 3 years ago

@SystemsBiologist

Could you change the code after you bring back dev to work?

Sure thing, but what do you mean by dev not working?

Edit: pasting e-mail reply:

Dev is working fine; the problem came from Binder. Now everything works for all organisms except for two functions: DEGanalysis2clust() and DEGanalysis(). You can see that in the "DIscBIO-CONQUER Notebook": https://github.com/ocbe-uio/DIscBIO/blob/dev/notebook/DIscBIO-CONQUER%20Notebook.ipynb ClustDiffGenes() works, although its output does not show the gene symbols, but that is fine. It would be great if, in the future, we could do the same for DEGanalysis2clust() and DEGanalysis().
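
One possible reading of "working without gene symbols" is to fall back to the original gene ID whenever no symbol is available. The sketch below is only an illustration of that idea, not how DIscBIO implements it; `label_genes()` is a hypothetical helper and the IDs and symbols are toy values.

```r
# Hypothetical helper (not part of DIscBIO): use the gene symbol when one is
# available, otherwise keep the original ID
label_genes <- function(ids, symbols) {
  missing <- is.na(symbols) | symbols == ""
  ifelse(missing, ids, symbols)
}

# Toy values: one ID with a symbol, two without
ids <- c("GENE_ID_1", "GENE_ID_2", "GENE_ID_3")
symbols <- c("sym1", NA, "")
labels <- label_genes(ids, symbols)
labels
```

With such a fallback, downstream tables would always have a label per gene, even for organisms where the symbol lookup returns nothing.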

wleoncio commented 3 years ago

@SystemsBiologist, it looks like commit c9313b5c6763ce37942eb99665205f797970bd7c has already implemented the changes in the OP, should this issue be closed then? What about the problems posted, i.e.:

The problem will be in 3 functions (DEGanalysis2clust, DEGanalysis and ClustDiffGenes). The outcome of ClustDiffGenes() is not perfect, but it is OK. The main problem is in DEGanalysis2clust and DEGanalysis: they are not working at all.

SystemsBiologist commented 3 years ago

You can close this one, since you have created a "To do list".