Thanks a lot for reporting this, yes they should be the same. Fortunately the error was only in the reporting, and shouldn't have affected the doublet scores.
The error should be fixed now in the GitHub version (I would be happy if you could confirm with your dataset), and I'll push it to Bioc devel once the checks have passed.
Hi @Yunuuuu , could you confirm that this solved your problem? Will close the issue if there's no answer. Pierre-Luc
Hi, I downloaded the latest plger/scDblFinder using pak::pkg_install and restarted R, but the problem remains:
I checked the source code of the scDblFinder function, which indicates that it has been modified:
I tried to understand the code, but I'm not familiar with the internal functions. When samples is not NULL and returnType is "sce" or "full", the following code in the scDblFinder function won't run:
if (returnType == "counts") {
for (s in names(d)) d[[s]]$sample <- s
return(do.call(cbind, d))
}
You're absolutely right, I did this too quickly... should hopefully be fixed for real in the latest push :)
@Yunuuuu , hopefully everything is as expected now?
I'll try this again @plger
It remains here:
the package GithubSHA1 is here:
Thanks for developing this package @plger. I'll do more tests this weekend; I can't find what's wrong right now.
Hi @Yunuuuu , okay now I don't get why you're having this problem, as I can't reproduce it with my toy data. Could you share a minimal example, e.g. SCE with only count matrix and sample id, only 2-300 genes, perhaps subsampling the cells? (you can rename genes & remove other cell metadata if you're worried about the data)
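A minimal sketch of how such a reduced object could be prepared, assuming the counts are available as a plain matrix and the sample ids as a vector. The helper name make_minimal_example and the toy data are hypothetical and only illustrate the idea; cell (column) names are deliberately left untouched so the shared object behaves like the original.

```r
# Hypothetical helper: shrink a count matrix and its sample labels
# to a small object that is easy to share.
make_minimal_example <- function(counts, sample_id,
                                 n_genes = 300L, n_cells = 2000L) {
  stopifnot(ncol(counts) == length(sample_id))
  # keep the first n_genes genes and a random subset of cells
  genes <- seq_len(min(n_genes, nrow(counts)))
  cells <- sort(sample.int(ncol(counts), min(n_cells, ncol(counts))))
  small <- counts[genes, cells, drop = FALSE]
  # anonymise gene names; column names are kept as they are
  rownames(small) <- paste0("gene_", seq_len(nrow(small)))
  list(counts = small, Sample = sample_id[cells])
}

# Toy data standing in for the real dataset (an assumption for illustration)
set.seed(1)
toy_counts <- matrix(rpois(50 * 40, lambda = 2), nrow = 50,
                     dimnames = list(NULL, paste0("bc", 1:40)))
toy_samples <- rep(c("sample1", "sample2"), each = 20)

mini <- make_minimal_example(toy_counts, toy_samples,
                             n_genes = 10L, n_cells = 15L)
dim(mini$counts)
# saveRDS(mini, "minimal_example.rds")  # file name is a placeholder
```

The resulting list can be saved with saveRDS and turned back into an SCE on the receiving side.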
Is there any way to share the rds data?
You can email it to pierre-luc.germain@hest.ethz.ch if it's <20mb, otherwise if you don't have a platform for sharing of larger files you can write me an email and I'll send you some details.
Thanks!
Hi, I have uploaded it to Google Drive, and the link has been emailed to pierre-luc.germain@hest.ethz.ch. I can confirm that this data reproduces the problem. Thanks!
[R]> set.seed(221113L)
[R]> anyDuplicated(colnames(test_data))
[1] 3466
[R]> sce_qc <- scDblFinder::scDblFinder(
test_data,
clusters = TRUE, dims = 50L,
nfeatures = 2000L,
samples = "Sample",
multiSampleMode = "split",
returnType = "sce"
)
There were 26 warnings (use warnings() to see them)
[R]> data.frame(colData(sce_qc)) %>%
dplyr::select(Sample, scDblFinder.sample) %>%
dplyr::filter(Sample != scDblFinder.sample) %>%
head()
Sample scDblFinder.sample
TTTCCTCTCAACTCTT-1 sample3 sample2
GTCAAACTCCACGAAT-1 sample3 sample1
GGTTAACCAGCGCTTG-1 sample3 sample2
AGCATCATCGGCTTGG-1.1 sample3 sample1
TGGAACTGTGACAGCA-1.1 sample3 sample1
It seems the column (cell) names matter, since I have some duplicated column names. After replacing the colnames with colnames(test_data) <- paste0("cell_", seq_len(ncol(test_data))), the problem goes away.
[R]> colnames(test_data) <- paste0("cell_", seq_len(ncol(test_data)))
[R]> anyDuplicated(colnames(test_data))
[1] 0
[R]> set.seed(221113L)
[R]> sce_qc <- scDblFinder::scDblFinder(
test_data,
clusters = TRUE, dims = 50L,
nfeatures = 2000L,
samples = "Sample",
multiSampleMode = "split",
returnType = "sce"
)
There were 28 warnings (use warnings() to see them)
[R]> # logNormCounts
data.frame(colData(sce_qc)) %>%
dplyr::select(Sample, scDblFinder.sample) %>%
dplyr::filter(Sample != scDblFinder.sample) %>%
head()
[1] Sample scDblFinder.sample
<0 rows> (or 0-length row.names)
Ok, thanks @Yunuuuu , that explains a lot. I'm afraid I'm going to have to throw an error msg on duplicated colnames, because I need to match the cells with the original object (to provide the full original object with added slots).
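To illustrate why duplicated cell names make this matching ambiguous, here is a minimal base-R sketch. It is not scDblFinder's internal code, only a toy showing what any name-based lookup does when the same barcode occurs in two samples; the barcodes and sample labels are made up.

```r
# Toy barcodes: "AAA-1" occurs in two different samples.
original <- data.frame(
  barcode = c("AAA-1", "CCC-1", "AAA-1", "GGG-1"),
  Sample  = c("sample1", "sample1", "sample2", "sample2"),
  stringsAsFactors = FALSE
)

# A name-based lookup: match() always returns the FIRST position of a name,
# so the second "AAA-1" is mapped back to the cell from sample1.
idx <- match(original$barcode, original$barcode)
matched_sample <- original$Sample[idx]

# Rows where the looked-up sample disagrees with the true one
data.frame(original, matched_sample,
           agrees = original$Sample == matched_sample)
```

In this toy, the third cell is assigned to sample1 although it belongs to sample2, which is the same kind of mismatch as the Sample vs. scDblFinder.sample disagreement reported in this thread.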
@plger Thanks a lot, enforcing unique colnames has already solved this.
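For readers who want to keep the barcodes recognisable rather than renaming cells to cell_1, cell_2, ..., two base-R options are sketched below on made-up barcodes. The variable names are illustrative; in practice the vectors would be colnames(test_data) and the Sample column.

```r
barcodes <- c("AAA-1", "CCC-1", "AAA-1", "GGG-1")
samples  <- c("sample1", "sample1", "sample2", "sample2")

# Position of the first duplicated name, or 0 if all names are unique
anyDuplicated(barcodes)

# Option 1: let base R append suffixes (.1, .2, ...) to repeated names
unique_barcodes <- make.unique(barcodes)
unique_barcodes

# Option 2: prefix each barcode with its sample id; this is only unique
# if barcodes are already unique within each sample
prefixed <- paste(samples, barcodes, sep = "_")
prefixed

# Guard to run before doublet detection
stopifnot(anyDuplicated(unique_barcodes) == 0L,
          anyDuplicated(prefixed) == 0L)
```

Either vector can then be assigned back with colnames(test_data) <- ... before calling scDblFinder.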
Hi, I have run scDblFinder in "split" sample mode to detect doublets with the following code (since the data is large, I only provide the code):
When I check the results, the scDblFinder.sample column seems strange: I don't know why it differs from my sample column when I used "split" mode. According to the help page of scDblFinder, "split" mode runs the whole process separately for each sample, so I think they should be the same. Is that right?