grst / single_cell_data_integration

cleaning of individual datasets. #5

Closed grst closed 5 years ago

grst commented 6 years ago

Datasets need to be cleaned and normalized before Scanorama integration (#2). Based on these tutorials, I identified the following steps:

before merging

after merging
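To make the "cleaned and normalized" requirement above a bit more concrete, here is a minimal sketch in plain Python. It is an assumption for illustration only, not the step list from this issue or from the tutorials: it drops cells with few total counts, scales each remaining cell to a common library size, and applies log1p. The function name `filter_and_normalize` and the defaults `min_counts` and `target_sum` are hypothetical.

```python
import math


def filter_and_normalize(counts, min_counts=500, target_sum=10_000):
    """Filter low-count cells, then library-size normalize and log-transform.

    counts: list of per-cell lists of raw gene counts (cells x genes).
    Returns (kept_indices, normalized_matrix).
    NOTE: thresholds are illustrative assumptions, not values from the issue.
    """
    kept = []
    normalized = []
    for cell_idx, cell in enumerate(counts):
        total = sum(cell)
        # Skip cells below the count threshold (and empty cells, to avoid
        # dividing by zero when min_counts is set to 0).
        if total < min_counts or total == 0:
            continue
        scale = target_sum / total
        # Scale to the target library size, then apply log(1 + x).
        normalized.append([math.log1p(c * scale) for c in cell])
        kept.append(cell_idx)
    return kept, normalized


# Three toy cells with two genes each; the second cell is nearly empty.
raw = [[10, 90], [1, 0], [500, 500]]
kept, norm = filter_and_normalize(raw, min_counts=50)
```

In practice this would more likely be done per dataset with a dedicated single-cell toolkit; the sketch only shows the order of operations.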

grst commented 6 years ago

@Hoohm, any other steps you would consider?

Hoohm commented 6 years ago

I would not recommend imputation, as it always depends on the quality of the clustering and rarely helps much.

As first steps, that seems fine to me. What do you want to do after that?

grst commented 6 years ago

The next step would be to feed everything into Scanorama to remove batch effects.

Hoohm commented 6 years ago

I forgot about defining a method (or methods) to compare clustering "quality". Off the top of my head, I know of silhouette plots.
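For reference, the silhouette value behind such plots can be sketched in a few lines of plain Python. This is a minimal illustration that assumes Euclidean distance on low-dimensional coordinates; the function name `silhouette_scores` is hypothetical, and a real analysis would more likely use an existing library implementation. For each cell, `a` is the mean distance to the other cells in its own cluster, `b` is the smallest mean distance to any other cluster, and the score is `(b - a) / max(a, b)`.

```python
import math


def silhouette_scores(points, labels):
    """Return one silhouette value per point (Euclidean distance)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Common convention: a singleton cluster gets a score of 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Two well-separated toy clusters, scored with a good and a bad labelling.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette_scores(points, [0, 0, 1, 1])
bad = silhouette_scores(points, [0, 1, 0, 1])
```

Values close to 1 indicate compact, well-separated clusters; values near or below 0 indicate cells that sit closer to another cluster than to their own.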

mlist commented 6 years ago

The silhouette value is a good start. If you know the cell type of individual cells, you can use the ontology score we proposed:

https://dx.doi.org/10.1093%2Fbioinformatics%2Fbty553

It might not work so well with cancer cells, though. Also see the other methods referenced in the paper, in particular kBET from the Theis group:

https://doi.org/10.1101/200345

Best, Markus

On Wed, 31 Oct 2018 at 22:44, Patrick Roelli notifications@github.com wrote:

Forgot about defining a method (or methods) to compare clustering "quality". From the top of my head I know about Silhouette plots

grst commented 6 years ago

Split this up into before-merge and after-merge steps (see https://github.com/grst/single_cell_data_integration/issues/3#issuecomment-439397642).