pachterlab / sleuth

Differential analysis of RNA-Seq
http://pachterlab.github.io/sleuth
GNU General Public License v3.0
300 stars 95 forks source link

Pathway analysis using Sleuth results #91

Open gk-bioin4m8x opened 7 years ago

gk-bioin4m8x commented 7 years ago

Hello,

Did anyone try pathway analysis with GAGE and Pathview using Sleuth results (results_table.csv where column names are: target_id, test_stat, pval, qval, rss, sigma_sq, tech_var, mean_obs, var_obs, sigma_sq_max, smooth_sigma_sq, final_sigma_sq, degree_free, ens_gene, ext_gene)? I am following this tutorial https://www.r-bloggers.com/tutorial-rna-seq-differential-expression-pathway-analysis-with-sailfish-deseq2-gage-and-pathview/ but I have three samples with replicates (and more in future). I am able to get the pathways but not clear to which sample they belong to and also unable to get the figures showing up and down-regulated genes in the pathway. I think I need to troubleshoot the input for many samples with replicates. My code:

# Retrieving rows with qval < 0.05 from results_table. >lrt_results.e <- dplyr::filter(results_table, qval < 0.05)

# Adding entrez column to Sleuth results for pathway analysis (where Rrr= species (using hypothetical name here)). >lrt_results.e$entrez <- mapIds(org.Rrr.eg.db, keys=lrt_results.e$target_id, column="ENTREZID", keytype="ENSEMBLTRANS", multiVals="first")

# Retrieving rows where value of entrez is not NA
>lrt_results.e.noNA <- subset(lrt_results.e, !is.na(entrez)) >qval.e <- lrt_results.e.noNA$qval >names(qval.e) <- lrt_results.e.noNA$entrez

# KEGG pathway gene sets for Rrr >kegg_Rrr <- kegg.gsets("Rrr")

# sigmet.idx is an index of numbers of signaling and metabolic pathways in kegg.gsets.
>kegg_Rrr_sigmet <- kegg_Rrr$kg.sets[kegg_Rrr$sigmet.idx]

# Using gage package' >keggres.e <- gage(qval.e, gsets=kegg_Rrr_sigmet, same.dir=TRUE) >lapply(keggres.e, head)`

Output: Output_github.txt

>keggrespathways <- data.frame(id=rownames(keggres.e$greater), keggres.e$greater) %>% tbl_df() %>% filter(row_number()<=3) %>% .$id %>% as.character() >keggrespathways

Output: [1] "Rrr01100 Metabolic pathways" "Rrr04020 Calcium signaling pathway"
[3] "Rrr04810 Regulation of actin cytoskeleton"

>keggresids <- substr(keggrespathways, start=1, stop=8)

>keggresids

Output: [1] "Rrr01100" "Rrr04020" "Rrr04810"

>plot_pathway <- function(pid) pathview(gene.data=qval.e, pathway.id=pid, species="Rrr", new.signature=FALSE) >pmp <- sapply(keggresids, function(pid) pathview(gene.data=qval.e, pathway.id=pid, species="Rrr"))

Info: Downloading xml files for Rrr01100, 1/1 pathways..
Info: Downloading png files for Rrr01100, 1/1 pathways..
Info: Working in directory D:/Programs/kallisto/Rrr/quant_Sleuth
Info: Writing image file Rrr01100.pathview.png
Info: Downloading xml files for Rrr04020, 1/1 pathways..
Info: Downloading png files for Rrr04020, 1/1 pathways..
Hide Traceback              
Rerun with Debug
Error in UseMethod("select_") : 
no applicable method for 'select_' applied to an object of class "c('OrgDb', 'AnnotationDb', 'envRefClass', '.environment', 'refClass', 'environment', 'refObject', 'AssayData')" 
11.
select_(.data, .dots = lazyeval::lazy_dots(...)) 
10.
select(db.obj, keys = in.ids, keytype = in.type, columns = c(in.type, 
    out.type)) 
9.
withCallingHandlers(expr, warning = function(w) invokeRestart("muffleWarning")) 
8.
suppressWarnings(select(db.obj, keys = in.ids, keytype = in.type, 
    columns = c(in.type, out.type))) 
7.
geneannot.map(in.ids = eg, in.type = "entrez", out.type = category, 
    org = org, pkg.name = pkg.name, ...) 
6.
eg2id(as.character(plot.data.gene$kegg.names), category = "SYMBOL", 
    pkg.name = gene.annotpkg) 
5.
pathview(gene.data = qval.e, pathway.id = pid, species = "Rrr") 
4.
FUN(X[[i]], ...) 
3.
lapply(X = X, FUN = FUN, ...) 
2.
sapply(keggresids, function(pid) pathview(gene.data = qval.e, 
    pathway.id = pid, species = "Rrr")) 
1.
sapply(keggresids, function(pid) pathview(gene.data = qval.e, 
    pathway.id = pid, species = "Rrr")) 

Changed gene.data=qval.e to gene.data=lrt_results.e.noNA

>pmp. <- sapply(keggresids, function(pid) pathview(gene.data=lrt_results.e.noNA, pathway.id=pid, species="Rrr"))

Info: Working in directory D:/Programs/kallisto/Dr/quant_Sleuth
Info: Writing image file Rrr01100.pathview.png
Note: None of the genes or compounds mapped to the pathway!
Argument gene.idtype or cpd.idtype may be wrong.
'select()' returned 1:1 mapping between keys and columns
Info: Working in directory D:/Programs/kallisto/Dr/quant_Sleuth
Info: Writing image file Rrr04020.pathview.png
Note: None of the genes or compounds mapped to the pathway!
Argument gene.idtype or cpd.idtype may be wrong.
'select()' returned 1:1 mapping between keys and columns
Info: Working in directory D:/Programs/kallisto/Dr/quant_Sleuth
Info: Writing image file Rrr04810.pathview.png
Warning messages:
1: In colnames(plot.data)[c(1, 3, 9:ncs)] = c("kegg.names", "all.mapped",  :
number of items to replace is not a multiple of replacement length
2: In colnames(plot.data)[c(1, 3, 9:ncs)] = c("kegg.names", "all.mapped",  :
number of items to replace is not a multiple of replacement length

Figures: rrr01100 pathview rrr01100 rrr04020 pathview rrr04020 rrr04810 pathview rrr04810

Kindly guide regarding the input.

Thanks!

gk-bioin4m8x commented 7 years ago

Hello, It would be very helpful if anyone has done pathway analysis with GAGE and Pathview using Sleuth results.

Thanks!

ttriche commented 7 years ago

ENST-to-Reactome mappings (transcript level) are rather sparse (usually only the canonical isoform is annotated) so it may actually be hurting you to operate at the tx level. We found this out the direct way by running both. If your gene sets are very sparse, most all methods are going to suffer due to lack of structure.

--t

On Mon, Jan 23, 2017 at 3:44 AM, GK notifications@github.com wrote:

Hello, It would be very helpful if anyone has done pathway analysis with GAGE and Pathview using Sleuth results.

Thanks!

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/pachterlab/sleuth/issues/91#issuecomment-274467601, or mute the thread https://github.com/notifications/unsubscribe-auth/AAARIp5MsRVYfDYrV5KJFBO2GOdVgSNfks5rVJKQgaJpZM4LYI1s .

mictadlo commented 6 years ago

Hi, Any updates on it or do anyone have a workflow how to do Pathway analysis after sleuth or sleuth_live?

I did the below steps to get transcript mapping to gene ids which were based on this tutorial

diamond makedb --in nr.faa -d nr -p 8
diamond blastx -d nr -q Trinity.fasta  -o nr.blastx.out  -p 8 -k 1 --outfmt 6

echo -e "target_id\tens_gene" > t2g.txt 
cut -f 1-2 nr.blastx.out  >> t2g.txt

The below R code was based on:

#!/usr/bin/env Rscript

    library("sleuth")

    sample_id <- dir("kallisto")
    sample_id

    kal_dirs <- sapply(sample_id, function(id) file.path("kallisto", id))
    kal_dirs

    s2c <- read.table("!{exp_file}", header = TRUE, stringsAsFactors=FALSE)
    s2c <- dplyr::select(s2c, sample = ids, Tissue_type, condition)
    s2c <- s2c[order(s2c$sample), ]
    s2c <- dplyr::mutate(s2c, path = kal_dirs)

    print(s2c)

    t2g <- read.table("!{t2g_file}", header = TRUE, stringsAsFactors=FALSE)
    t2g

    # Load the kallisto processed data into the object
    so <- sleuth_prep(s2c, ~ condition, target_mapping = t2g, extra_bootstrap_summary = TRUE)

    # Estimate parameters for the sleuth response error measurement (full) model 
    so <- sleuth_fit(so)

    #Next, we fit the reduced model. In this case, the reduced model is the intercept-only model
    so <- sleuth_fit(so, ~1, 'reduced')
    print("Done sletu_fit")

    #and finally perform the test:
    so <- sleuth_lrt(so, 'reduced', 'full')
    print("DONE sleut_lrt")
    models(so)

    save(so, file=paste("sleuth_object.so"))

Can I use sleuth_live or do I have to add in the above code the following line?

so <- sleuth_wt(so,  'conditionscramble')
gene_table <- sleuth_gene_table(so, test = "conditionscramble", test_type = "wt")
write.table(gene_table, paste("gene_table_results.txt"), sep="\t")

How do I know which test should I use?

Thank you in advance,

Michal