ncborcherding / scRepertoire

A toolkit for single-cell immune profiling
https://www.borch.dev/uploads/screpertoire/
MIT License
311 stars 54 forks source link

Add simple T/BCR doublet detector, use markdown for roxygen, and slight refactoring of combineExpression #417

Closed Qile0317 closed 4 weeks ago

Qile0317 commented 1 month ago

This partially resolves #388. See NEWS.md for the changes added. Most importantly please let me know whether the doublet detection change as of right now makes any sense. (see the R/getContigDoublets.R file) If so I'll proceed to add additional features related to doublet detection.

The vast majority of the files changed are from the automatic conversion of regular roxygen to roxygenmd. If you don't mind this change in convention (which I personally prefer a lot more over regular roxygen while being compatible) you can pretty much ignore the vast majority of the files here as I don't believe the actual Rd documents changed much if at all. I may be wrong but I don't think so.

Qile0317 commented 1 month ago

@ncborcherding also configured PRs to dev to run R-CMD-check

Qile0317 commented 1 month ago

@ncborcherding I see that you've marked this ready for review - just wanted to reinforce that I'm actually not completely sure the code is actually fully correct and will be adding more tests and manually verifying things and I'll ping again when I'm confident.

ncborcherding commented 1 month ago

Yep - that is why I am holding for now. I had to mark it to be able to see some testing.

Qile0317 commented 1 month ago

@ncborcherding Ok, so after locally testing on a real life dataset with TCRs and BCRs (let me know if you want the notebook) I am pretty confident with the doublet detector (afterall the core idea is to just filter for rows where the barcode is in the intersection of TCR and BCR lists) but I had slight trouble coming up with what exactly to do after in a way that is generalizable to a function.

I have two ideas to make this something practical for users:

  1. add a vignette showcasing ways to look at whether these doublets are of concern (i.e. if they are confounded with certain variables in the metadata by using glm(), overlaying them on the UMAP, how to remove them, etc.)
  2. Some function to do pairwise glm regression of other variables in the metadata all at once in some aggregate view to look at potential confounding.

Please let me know what you think. And also lemme know if the output of getContigDoublets is awkward - I'm afraid some users may feel so.

Qile0317 commented 1 month ago

@ncborcherding if you want to leave the previous comment potentially for another future release/PR and dont see a need for changes - this is ready to merge now.

The function is marked with the "experimental" badge so that we can feel better about making larger changes on it or just keeping it as an internal function / removing it, etc.