Closed bschilder closed 1 week ago
Originally reported by @Al-Murphy. Potentially related to:
As an update: I also tried different versions of the Human Cell Landscape CTD, using the GitHub tags 'v0.1.10' and 'v0.0.1', but this didn't help either.
One thing I'm noticing is that the error only occurs with specific combinations of CTD level and test type.
Specifically, CTD level 2 with the linear tests is the only one that's failing.
We can see the celltype names aren't duplicated in the original CTD:
colnames(HCL$level_2$specificity_quantiles)[duplicated(colnames(HCL$level_2$specificity_quantiles))]
> character(0)
This remains true even after restandardising the CTD:
HCL2 <- EWCE::standardise_ctd(HCL, force_standardise = TRUE)
colnames(HCL2$level_2$specificity_quantiles)[duplicated(colnames(HCL2$level_2$specificity_quantiles))]
> character(0)
So something is happening further downstream of this step.
OK, I think I've pinpointed the reason.
At level 2 the CTD contains the cell types "Fetal_Neuron" and "Fetal_neuron". I think this is simply an inconsistency with how the original HCL authors annotated their cell types (I've noticed this a lot in that dataset). You can see this by reading in the gene covariate file referenced in the error message.
gcf <- data.table::fread("/var/folders/hd/jm8lzp7s4dl_wlkykzhz66x80000gn/T//RtmpWR0d37/file1c2f6515b4df")
cols <- grep("fetal_neuron", names(gcf), ignore.case = TRUE, value = TRUE)
cols
> "Fetal_neuron" "Fetal_Neuron"
gcf[, cols, with = FALSE]
       Fetal_neuron Fetal_Neuron
              <int>        <int>
    1:           19           13
    2:           19           26
    3:            6           11
    4:            0            0
    5:            4            0
   ---
17956:            0            0
17957:           27            0
17958:            0            0
17959:            0            0
17960:           35            9
R treats these as distinct names because its comparisons are case-sensitive, but MAGMA appears to ignore case internally, so it sees them as duplicates and throws the error. This happens at the following step: https://github.com/neurogenomics/MAGMA_Celltyping/blob/0941d8c2a3b652112f21083e474fe2d56e4f9021/R/calculate_celltype_associations.r#L114
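To illustrate why the earlier `duplicated()` checks came back empty, here is a minimal sketch on a toy vector of names. `find_case_dups` is a hypothetical helper (not part of EWCE or MAGMA.Celltyping), and the idea that MAGMA compares names case-insensitively is inferred from the error rather than confirmed.

```r
# Hypothetical helper (not part of EWCE or MAGMA.Celltyping):
# return every name that collides with another name once case is ignored.
find_case_dups <- function(x) {
  lower <- tolower(x)
  # Flag both the first and the later occurrence of each collision
  is_dup <- duplicated(lower) | duplicated(lower, fromLast = TRUE)
  x[is_dup]
}

# Toy names standing in for colnames(HCL$level_2$specificity_quantiles)
celltypes <- c("Fetal_neuron", "Astrocyte", "Fetal_Neuron")

celltypes[duplicated(celltypes)]  # case-sensitive check finds nothing
find_case_dups(celltypes)         # case-insensitive check flags both names
```

Running the same helper on the real column names should surface any other case-only collisions in the CTD, if there are any.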
I could add a step that drops columns that are duplicated when case is ignored, but the real solution is to regenerate the CTD after correcting the cell type annotations, because correcting them will alter the expression and specificity scores.
I've made some updates in MAGMA.Celltyping 2.0.14 (now pushed to GitHub) so that it automatically drops duplicate celltypes and gives users a more informative message about which ones are being dropped and why. It also recommends that they reprocess the CTD accordingly.
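For anyone on an older version who wants a stopgap, the general idea can be sketched as below. This is not the code that went into MAGMA.Celltyping; `drop_case_dup_cols` is a hypothetical helper, and the toy matrix only stands in for one level of a CTD. It keeps the first column for each case-insensitive name and reports what it dropped.

```r
# Hypothetical helper (not the MAGMA.Celltyping implementation):
# keep the first column for each case-insensitive name and report the rest.
drop_case_dup_cols <- function(mat) {
  dup <- duplicated(tolower(colnames(mat)))
  if (any(dup)) {
    message(
      "Dropping ", sum(dup), " celltype(s) duplicated when ignoring case: ",
      paste(colnames(mat)[dup], collapse = ", "),
      ". Consider correcting the annotations and regenerating the CTD."
    )
  }
  mat[, !dup, drop = FALSE]
}

# Toy matrix standing in for a CTD level's specificity_quantiles
toy <- matrix(
  c(19, 19, 6, 13, 26, 11, 1, 2, 3),
  nrow = 3,
  dimnames = list(c("g1", "g2", "g3"),
                  c("Fetal_neuron", "Fetal_Neuron", "Astrocyte"))
)

cleaned <- drop_case_dup_cols(toy)
colnames(cleaned)
```

Note that simply dropping one of the two columns discards data; as said above, regenerating the CTD with corrected annotations is the proper fix.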
Checklist
Affected version
2.0.13. I'm guessing @Al-Murphy is using the latest version.
Steps to reproduce the bug
Actual behavior
Expected behavior
Returns enrichment results.
Session info