sellerslab / gemini

GEMINI: A variational Bayesian approach to identify genetic interactions from combinatorial CRISPR screens
BSD 3-Clause "New" or "Revised" License

gemini_create_input Error in dplyr::filter(., colSums(!is.na(Input$counts)) != 0) #3

Closed by neal-walton-suh 1 year ago

neal-walton-suh commented 1 year ago

Hello,

I am trying to use GEMINI's gemini_create_input() function with my own counts matrix, sample.replicate.annotation data frame, and guide.annotation data frame. My counts matrix is 9696 by 13, sample.replicate.annotation is 13 by 3, and guide.annotation is 9696 by 7. The function prints "Merging sample annotations with colnames of counts.matrix..." and "Merging guide annotations with rownames()...", then stops with this error and traceback:

Input <- gemini_create_input(counts.matrix = counts, 
                             sample.replicate.annotation = sample.replicate.annotation, 
                             guide.annotation = guide.annotation, 
                             samplesAreColumns = T, 
                             sample.column.name = "samplename", 
                             gene.column.names = c("L.gene", "R.gene"), 
                             ETP.column = c(1, 2, 3), 
                             LTP.column = (1:ncol(counts))[-c(1, 2, 3)], 
                             verbose = T)

Error in dplyr::filter(., colSums(!is.na(Input$counts)) != 0) :
Caused by error:
! `..1` must be of size 23 or 1, not size 13.

19. stop(fallback)
18. signal_abort(cnd, .file)
17. abort(message, class = error_class, parent = parent, call = error_call)
16. (function (cnd)
{
local_error_context(dots, i = frame[[i_sym]], mask = mask)
if (inherits(cnd, "dplyr:::internal_error")) { ...
15. signalCondition(cnd)
14. signal_abort(cnd, .file)
13. abort(class = c(class, "dplyr:::internal_error"), dplyr_error_data = data)
12. dplyr_internal_error("dplyr:::filter_incompatible_size", list(
index = 1L, size = 13L, expected_size = 23L))
11. eval()
10. mask$eval_all_filter(dots, env_filter)
9. withCallingHandlers(mask$eval_all_filter(dots, env_filter), error = dplyr_error_handler(dots = dots,
mask = mask, bullets = filter_bullets, error_call = error_call),
warning = function(cnd) {
local_error_context(dots, i, mask) ...
8. filter_eval(dots, mask = mask, error_call = error_call, user_env = user_env)
7. filter_rows(.data, dots, by)
6. filter.data.frame(., colSums(!is.na(Input$counts)) != 0)
5. dplyr::filter(., colSums(!is.na(Input$counts)) != 0)
4. dplyr::select(., c("colname", sample.col.name, "TP"))
3. Input$replicate.map %>% dplyr::filter(colSums(!is.na(Input$counts)) !=
0) %>% dplyr::select(c("colname", sample.col.name, "TP"))
2. gemini_prepare_input(Output, gene.columns = gene.column.names)
1. gemini_create_input(counts.matrix = counts, sample.replicate.annotation = sample.replicate.annotation,
guide.annotation = guide.annotation, samplesAreColumns = T,
sample.column.name = "samplename", gene.column.names = c("L.gene",
"R.gene"), ETP.column = c(1, 2, 3), LTP.column = (1:ncol(counts))[-c(1, ...

So far, I have only figured out that this error is occurring at the final step of gemini_create_input() where it uses the internal function gemini_prepare_input(). Could you help me with this error?

I am currently using R version 4.2.0 (2022-04-22) and gemini_1.12.0.

Thank you, Neal

ssj1739 commented 1 year ago

Hi Neal,

Thanks for your message. If you'd be willing to send snippets of your counts, sample.replicate.annotation and guide.annotation data, or a subset that replicates the error you see above, I'd definitely be able to help more comprehensively.

But aside from that, could I ask if your data has any missing values? gemini_prepare_input filters out replicates with all missing values. It looks like there's a discrepancy between the number of samples/replicates in your sample.replicate.annotation and your counts matrix resulting from filtering out NA values, but it's hard to tell without seeing the actual data.
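Both points can be checked in a few lines of base R. The sketch below uses small toy objects (the values and the names ETP, LTP.A, and LTP.B are hypothetical, not taken from the issue) and assumes the annotation keeps its column names in a column called colname, as in the traceback above. It is only a diagnostic outline, not what gemini runs internally.

```r
# Toy stand-ins for the real objects (hypothetical values, not from the issue).
counts <- matrix(c(10, 12, NA,
                   7,  9, NA),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("g1", "g2"), c("ETP", "LTP.A", "LTP.B")))
sample.replicate.annotation <- data.frame(
  colname    = c("ETP", "LTP.A", "LTP.B"),
  samplename = c("ETP", "LTP", "LTP"),
  replicate  = c("A", "A", "B")
)

# Columns of the counts matrix in which every value is missing.
all_na_cols <- colnames(counts)[colSums(!is.na(counts)) == 0]
print(all_na_cols)

# The annotation should have one row per column of the counts matrix.
sizes_match <- ncol(counts) == nrow(sample.replicate.annotation)
print(sizes_match)

# Column names without an annotation row, and annotation rows without a column.
print(setdiff(colnames(counts), sample.replicate.annotation$colname))
print(setdiff(sample.replicate.annotation$colname, colnames(counts)))
```

If all_na_cols is empty and both setdiff() calls return nothing, missing values and unmatched names are unlikely to explain a size mismatch.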

Thanks, Sidharth

neal-walton-suh commented 1 year ago

Hi Sidharth,

Of course. Here are snippets of the counts matrix, sample.replicate.annotation, and guide.annotation, respectively. None of the datasets has any NA values, but some replicate names are duplicated, as seen in the sample.replicate.annotation data frame; this was my other guess as to why the function may be raising the error. The duplicated replicate columns in the counts matrix contain different counts, as you can see below for B16F10.RepA.

[Images: snippets of the counts matrix, sample.replicate.annotation, and guide.annotation]
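To illustrate why duplicated names could produce a size mismatch like the one in the error, here is a minimal base R sketch. It assumes the annotation is attached to the column names by a merge-like join (the function's progress messages mention merging, but the exact internal call is not shown in this thread), and all object values and the ETP and LTP names are hypothetical.

```r
# Hypothetical column names: one name is used twice.
col_names <- c("ETP", "LTP.RepA", "LTP.RepA", "LTP.RepB")
counts <- matrix(1:8, nrow = 2,
                 dimnames = list(c("g1", "g2"), col_names))

annotation <- data.frame(
  colname    = col_names,
  samplename = c("ETP", "LTP", "LTP", "LTP"),
  replicate  = c("A", "A", "A", "B")
)

# Names that occur more than once.
dup_names <- unique(col_names[duplicated(col_names)])
print(dup_names)

# Joining on a key duplicated on both sides returns every pairing,
# so the name used twice contributes 2 x 2 = 4 rows instead of 2.
merged <- merge(data.frame(colname = colnames(counts)), annotation,
                by = "colname")
print(nrow(merged))  # 6 rows
print(ncol(counts))  # 4 columns

# A per-column logical vector no longer matches the merged table's row count,
# which is the kind of mismatch a row filter would reject.
keep <- colSums(!is.na(counts)) != 0
print(length(keep) == nrow(merged))
```

In this toy case the joined table has 6 rows for 4 columns, so any filter built from the columns of counts has the wrong length for the table it is applied to.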

The only things that I do not have in the guide.annotation data frame are the wells columns. Are those required?

Thanks, Neal

ssj1739 commented 1 year ago

Hi Neal,

Aha! Yes, duplicated column names aren't compatible with most "tidy" R programming practices. I would recommend changing the duplicated column names in the counts matrix (and the matching entries in the colname column of sample.replicate.annotation) to something unique, like B16F10.RepA.1.
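One possible way to do the renaming is base R's make.unique(), which appends .1, .2, and so on to repeated names. The sketch below uses toy data (the counts and the ETP column are hypothetical) and assumes the annotation rows are in the same order as the columns of the counts matrix; if they are not, the annotation would need to be reordered or renamed by hand.

```r
# Toy data with one duplicated column name (values are hypothetical).
col_names <- c("ETP", "B16F10.RepA", "B16F10.RepA", "B16F10.RepB")
counts <- matrix(1:8, nrow = 2,
                 dimnames = list(c("g1", "g2"), col_names))
sample.replicate.annotation <- data.frame(
  colname    = col_names,
  samplename = c("ETP", "B16F10", "B16F10", "B16F10"),
  replicate  = c("A", "A", "A", "B")
)

# Assumption: annotation rows follow the column order of the counts matrix.
stopifnot(identical(colnames(counts), sample.replicate.annotation$colname))

# Make the names unique and apply the same names in both places.
new_names <- make.unique(colnames(counts))
colnames(counts) <- new_names
sample.replicate.annotation$colname <- new_names
print(new_names)

# Confirm that no duplicates remain.
stopifnot(anyDuplicated(colnames(counts)) == 0)
```

Renaming both objects from the same vector keeps the counts columns and the colname entries in sync.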

To your last point, no, the wells columns aren't required. That data was included in the Najm et al. 2018 supplemental data, so we also included it in our example dataset, but those columns aren't used by gemini.

Please let me know if this resolved your issue!

Best, Sidharth

neal-walton-suh commented 1 year ago

Hi Sidharth,

Awesome, it worked! Thank you so much for your help, I appreciate it!

Best, Neal