markrobinsonuzh / cytofWorkflow

MIT License
14 stars 3 forks source link

differential_abundance_wrapper #20

Closed antoine4ucsd closed 3 years ago

antoine4ucsd commented 3 years ago

Hello First I wanted to thank you for this great workflow! I am applying it to a dataset of 43 samples and 54 clusters identified - everything works fine but the differential_abundance_wrapper. I do have a data frame input with cell counts (each row is a population, each column is a sample):

head(counts) Sample10 Sample11 Sample12 Sample13 Sample15 Sample16 Sample17 Sample18 Sample19 Sample20 Sample21 Sample22 Sample23 Sample24 Sample25 Cluster_1 3258 2254 3110 3457 4688 3269 4109 2784 4909 3451 4518 4998 4589 2121 4431 Cluster_10 830 1123 1313 1081 605 551 437 560 375 399 345 302 1097 1316 1767 Cluster_11 42 32 35 86 15 42 147 120 338 258 94 36 204 179 27 Cluster_12 1783 509 602 412 2542 299 615 691 579 603 579 653 860 1407 2117 Cluster_13 442 1032 947 1345 1689 2640 1392 2739 670 1711 565 1083 277 966 318 Cluster_14 1380 2398 2674 2172 4314 2935 4818 2066 3732 2046 2946 3756 4422 634 1392 Sample27 Sample28 Sample29 Sample30 Sample31 Sample32 Sample33 Sample34 Sample35 Sample36 Sample37 Sample38 Sample39 Sample40 Sample41 Cluster_1 3740 3728 2335 4417 4658 3619 3725 4209 2235 6088 3172 2959 6970 4172 4744 Cluster_10 387 793 1042 374 157 516 1058 629 595 452 790 1016 414 395 212 Cluster_11 97 168 84 40 164 187 31 180 8 58 103 216 90 27 37 Cluster_12 899 925 874 366 809 338 902 2449 920 653 662 703 316 479 256 Cluster_13 1081 1613 500 490 2073 2650 2438 1993 2222 804 1399 1728 594 3227 3688 Cluster_14 1618 1556 1516 4257 947 2038 2074 1131 967 3155 2081 1375 4778 1633 2053 Sample42 Sample43 Cluster_1 3975 2625 Cluster_10 523 238 Cluster_11 28 119 Cluster_12 910 652 Cluster_13 1028 2236 Cluster_14 1860 1517

I also have a metadata md with sample ID

md$sample_id [1] "Sample37" "Sample26" "Sample41" "Sample31" "Sample39" "Sample40" "Sample15" "Sample16" "Sample29" "Sample30" "Sample11" "Sample10" "Sample25" [14] "Sample24" "Sample23" "Sample38" "Sample33" "Sample27" "Sample43" "Sample21" "Sample36" "Sample17" "Sample19" "Sample20" "Sample22" "Sample28" [27] "Sample18" "Sample35" "Sample13" "Sample32" "Sample34" "Sample12" "Sample42"

K=54 =number of clusters but I still get this error

da_out1 <- differential_abundance_wrapper(counts, md = md, formula = formula_glmer_binomial1, K = K) Error in [.data.frame(counts, i, md$sample_id) : undefined columns selected Called from: [.data.frame(counts, i, md$sample_id)

is there a problem with the indexing the md$sample_id in the formula

data_tmp <- data.frame(y = as.numeric(counts[i, md$sample_id]), total = ntot[md$sample_id], md)

for example if i=1 data_tmp <- data.frame(y = as.numeric(counts[1, md$sample_id]), total = ntot[md$sample_id], md)

Error in [.data.frame(counts, 1, md$sample_id) : undefined columns selected

Hopefully I am missing something obvious let me know if you need anything else to debug thank you in advance for your help!

markrobinsonuzh commented 3 years ago

Dear @antoine4ucsd,

The differential_abundance_wrapper() function is from quite an old version of the workflow .. from about November 2017 or so.

It also depends what version of R and Bioconductor you have, but I would highly suggest using BioC 3.11 or BioC 3.12 (just released last week), because a LOT has changed:

https://www.bioconductor.org/packages/3.11/workflows/vignettes/cytofWorkflow/inst/doc/cytofWorkflow.html https://www.bioconductor.org/packages/3.12/workflows/vignettes/cytofWorkflow/inst/doc/cytofWorkflow.html

I will close this issue and you can open another one if you have problems.

Best, Mark

antoine4ucsd commented 3 years ago

Thank you Mark This new release looks really promising.

I would like to start from phonograph output though… I was able to create and exp object from the previous release and create the plot below using the following p1<-ggplot(ggdf, aes(x = expression, color = condition, group = sample_id)) + geom_density() + facet_wrap(~ antigen, nrow = 4, scales = "free") + theme_bw() + theme(axis.text.x = element_text(angle = 90, hjust = 1), strip.text = element_text(size = 7), axis.text = element_text(size = 5)) + guides(color = guide_legend(ncol = 1)) + scale_color_manual(values = color_conditions)

briefly, ggdf was obtained from a large csv compiling all the phenograph output . I have this large csv (matrix) with all merged results (csv exported from phonograph) CD25 CD3 CD27 Ki67 FoxP3 CD45RA CXCR5 CD8 CD4 output_10_fct_006_Live_134058 3.461247 6.945573 6.657199 6.622825 2.678033 5.244894 2.0681636 4.137802 7.937762 output_10_fct_006_Live_124022 4.156580 5.656681 3.639100 6.697673 2.753143 3.434491 1.3978777 4.561760 7.840607 output_10_fct_006_Live_53241 3.293127 6.475993 5.492834 6.830183 3.542321 1.324333 2.4657699 3.292760 7.921481 output_10_fct_006_Live_707016 4.079490 6.115073 4.446999 6.561850 2.651721 3.034348 0.7767398 3.758221 7.725501 output_10_fct_006_Live_370482 4.047352 6.645523 5.190899 6.245829 3.033671 3.750919 2.2852625 4.081980 7.988378 output_10_fct_006_Live_533614 3.884992 6.127685 5.911404 6.767333 2.665573 2.917647 2.6262258 3.998380 7.836012

and I attach the sample ids to each row ggdf <- data.frame(sample_id = sample_ids, expr)

head(ggdf) sample_id CD25 CD3 CD27 Ki67 FoxP3 CD45RA CXCR5 CD8 CD4 1 Sample10 3.461247 6.945573 6.657199 6.622825 2.678033 5.244894 2.0681636 4.137802 7.937762 2 Sample10 4.156580 5.656681 3.639100 6.697673 2.753143 3.434491 1.3978777 4.561760 7.840607 3 Sample10 3.293127 6.475993 5.492834 6.830183 3.542321 1.324333 2.4657699 3.292760 7.921481 4 Sample10 4.079490 6.115073 4.446999 6.561850 2.651721 3.034348 0.7767398 3.758221 7.725501 5 Sample10 4.047352 6.645523 5.190899 6.245829 3.033671 3.750919 2.2852625 4.081980 7.988378 6 Sample10 3.884992 6.127685 5.911404 6.767333 2.665573 2.917647 2.6262258 3.998380 7.836012

After melting it gives me this

ggdf <- melt(ggdf, id.var = "sample_id", value.name = "expression", variable.name = "antigen") head(ggdf)

head(ggdf) sample_id antigen expression 1 Sample10 CD25 3.461247 2 Sample10 CD25 4.156580 3 Sample10 CD25 3.293127 4 Sample10 CD25 4.079490 5 Sample10 CD25 4.047352 6 Sample10 CD25 3.884992

I also have a md file (in which I include the fcs filename but not using it) head(md) file_name sample_id condition patient_id 1 output_10_fct_006_Live.fcs Sample10 Men Patient18931 2 output_11_fct_010_Live.fcs Sample11 Men Patient18921 3 output_12_fct_014_Live.fcs Sample12 Men Patient19461 4 output_13_fct_018_Live.fcs Sample13 Men Patient19213 5 output_15_fct_005_Live.fcs Sample15 Men Patient18600 6 output_16_fct_009_Live.fcs Sample16 Women Patient18763

And I created an ‘artificial’ panel file since I already relabeled the markers in the ggdf file head(panel) fcs_colname antigen 1 CD25 CD25 2 CD3 CD3 3 CD27 CD27 4 Ki67 Ki67 5 FoxP3 FoxP3 6 CD45RA CD45RA

Is it possible to create a sce object from these data ???

I’d rather keep the phenograph output. I had the feeling that is was possible with the former version? I was also able to create heatmaps with pheatmap rng <- colQuantiles(expr, probs = c(0.01, 0.99)) expr01 <- t((t(expr) - rng[, 1]) / (rng[, 2] - rng[, 1])) expr01[expr01 < 0] <- 0 expr01[expr01 > 1] <- 1

expr_median_sample_tbl <- data.frame(sample_id = sample_ids, expr) %>% group_by(sample_id) %>% summarize_all(funs(median))

expr_median_sample <- t(expr_median_sample_tbl[, -1]) colnames(expr_median_sample) <- expr_median_sample_tbl$sample_id

expr_median <- data.frame(expr, cell_clustering = m2$clusterID) %>% group_by(cell_clustering) %>% summarize_all(funs(median)) expr01_median <- data.frame(expr01, cell_clustering = m2$clusterID) %>% group_by(cell_clustering) %>% summarize_all(funs(median)) expr_heat <- as.matrix(expr01_median[, colnames(expr01)]) rownames(expr_heat) <- expr01_median$cell_clustering

I hope I am clear enough…do not hesitate if you need some clarification. I can also share files if it is easier

Thank you so much!

-- a [Chart Description automatically generated]

[Chart, treemap chart Description automatically generated]

From: markrobinsonuzh notifications@github.com Date: Tuesday, November 3, 2020 at 10:47 PM To: markrobinsonuzh/cytofWorkflow cytofWorkflow@noreply.github.com Cc: antoine antoine.chaillon@gmail.com, Mention mention@noreply.github.com Subject: Re: [markrobinsonuzh/cytofWorkflow] differential_abundance_wrapper (#20)

Dear @antoine4ucsdhttps://github.com/antoine4ucsd,

The differential_abundance_wrapper() function is from quite an old version of the workflow .. from about November 2017 or so.

It also depends what version of R and Bioconductor you have, but I would highly suggest using BioC 3.11 or BioC 3.12 (just released last week), because a LOT has changed:

https://www.bioconductor.org/packages/3.11/workflows/vignettes/cytofWorkflow/inst/doc/cytofWorkflow.html https://www.bioconductor.org/packages/3.12/workflows/vignettes/cytofWorkflow/inst/doc/cytofWorkflow.html

I will close this issue and you can open another one if you have problems.

Best, Mark

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/markrobinsonuzh/cytofWorkflow/issues/20#issuecomment-721545103, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AENFHZ4NSQXZGGCOSKIHWX3SOD2IFANCNFSM4TJLRXDQ.