ludvigla / semla

SubsetSTData and MergeSTData #18

Closed: EddieLv closed this issue 2 months ago

EddieLv commented 8 months ago

In STUtility, neither function (SubsetSTData nor MergeSTData) changed the barcode names, but semla now automatically appends -[sampleID] to each barcode, which I don't think is necessary. It also gets confusing when I try to match barcodes between two objects. Example:

sratA: 2 samples
sratB: 5 samples

sratC = MergeSTData(sratA, sratB)  # 7 samples
sratC$celltypeC = NA
sratC$celltypeC[colnames(sratA)] = as.character(sratA$celltypeA)  # works, because the barcode-sampleID names match
sratC$celltypeC[colnames(sratB)] = as.character(sratB$celltypeB)  # does not work, because MergeSTData changed the barcodes of sratB

I'd appreciate a solution :)

lfranzen commented 7 months ago

Hi EddieLv, and thank you for your question,

The current behavior is required to avoid conflicting spot IDs when merging several objects. In your sratA object, the appended sample IDs will be "-1" and "-2", and since that's the first object you specify when merging, those IDs will stay. However, your sratB object originally has sample IDs 1-5, so keeping them when merging would create a conflict for IDs 1-2, since they overlap with sratA.
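To make the renumbering concrete, here is a minimal sketch of a hypothetical base R helper (`shift_barcode_suffix` is not a semla function) that rewrites the suffix of sratB barcodes. It assumes barcodes have the form "<barcode>-<sampleID>" and that the second object's sample IDs are shifted by the number of samples in the first object (2 for sratA), so sratB's samples 1-5 would become 3-7. Check this against `colnames(sratC)` before relying on it.

```r
# Hypothetical helper, NOT part of semla: shift the trailing "-<sampleID>"
# suffix of each barcode by a fixed offset.
# Assumption: barcodes look like "<barcode>-<integer>".
shift_barcode_suffix <- function(barcodes, offset) {
  # Everything before the last "-<digits>" is the original barcode
  prefix <- sub("-[0-9]+$", "", barcodes)
  # The digits after the last "-" are the sample ID
  sample_id <- as.integer(sub("^.*-", "", barcodes))
  paste0(prefix, "-", sample_id + offset)
}

# sratA has 2 samples, so (assumed) sratB's sample IDs are offset by 2
barcodes_B <- c("AAACAAGTATCTCCCA-1", "AAACACCAATAACTGC-5")
shift_barcode_suffix(barcodes_B, offset = 2)
# "AAACAAGTATCTCCCA-3" "AAACACCAATAACTGC-7"
```

This still depends on the naming scheme used by MergeSTData, so matching on metadata columns instead of barcodes is the more robust option.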

What you're trying to achieve can be done in other ways as well, and I think it would be more robust to do it without relying on the row names of other objects.

In this case, both the celltypeA and celltypeB columns will be carried over into the metadata of your new merged object sratC, and rows with no value for a column (because those spots come from the other original data set) will be filled with NA. Therefore, one simple way to do it would be to run:

sratC = MergeSTData(sratA, sratB)  # Merge objects
sratC$celltypeC = NA  # Create new empty column

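# fill it with celltypeA where available, otherwise celltypeB, using ifelse()
# (as.character() guards against ifelse() returning integer codes if the columns are factors)
sratC$celltypeC <- ifelse(test = !is.na(sratC$celltypeA), yes = as.character(sratC$celltypeA), no = as.character(sratC$celltypeB))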
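To check the fallback logic without a Seurat object, here is a minimal sketch on a plain data frame standing in for the merged metadata; the barcodes and cell type labels are made up. It stores the columns as factors to show why wrapping them in `as.character()` is a safe choice: `ifelse()` returns the underlying integer codes when given factors. Whether your columns are factors or characters after merging is an assumption worth checking.

```r
# Toy stand-in for the merged metadata (made-up values):
# spots from sratA only have celltypeA, spots from sratB only have celltypeB
meta <- data.frame(
  celltypeA = factor(c("T cell", "B cell", NA, NA)),
  celltypeB = factor(c(NA, NA, "Fibroblast", "T cell")),
  row.names = c("AAAC-1", "AAAG-2", "AAAC-3", "AAAT-7")
)

# Take celltypeA where it is present, otherwise fall back to celltypeB
meta$celltypeC <- ifelse(
  test = !is.na(meta$celltypeA),
  yes = as.character(meta$celltypeA),
  no = as.character(meta$celltypeB)
)
meta$celltypeC
# "T cell" "B cell" "Fibroblast" "T cell"

# Without as.character(), the same call yields factor codes, not labels
ifelse(!is.na(meta$celltypeA), meta$celltypeA, meta$celltypeB)
```

Because the fallback only looks at which column is NA for each row, it works regardless of how the barcodes were renamed during merging.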

Hope that answers your question and helps you progress with your analyses.