wvictor14 / planet

For inferring ancestry, gestational age, and cell type proportions from placental DNA methylation array data

Ethnicity labels #21

Open wvictor14 opened 11 months ago

wvictor14 commented 11 months ago
One more thing: it was discussed at lab meeting that the labels `predictEthnicity` outputs should say `Other` instead of `Ambiguous`. The tool can only calculate 3 ancestries, but there are other ancestries out there (and ancestry is a continuum), so in reality samples called ambiguous may just be mixed or from an ancestry other than African/Asian/European. Let me know what you think, and if you agree I'm happy to change that myself too!

Originally posted by @iciarfernandez in https://github.com/wvictor14/planet/issues/19#issuecomment-1671927657

wvictor14 commented 11 months ago

Hey Iciar, so the "ambiguous" class is for samples with uncertain predictions, where "uncertain" is defined by a probability cutoff (75% by default). In my paper I show that samples below this threshold correlate well with mixed genetic ancestry of the three reference ancestries. Because of this, I think "ambiguous" would be better changed to something like "mixed".

The ethnicity predictor has no way of telling whether a queried sample comes from an ancestry outside the 3 used in the training data, so I think calling it "other" would be too presumptive and sometimes just wrong.
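
To make the cutoff logic above concrete, here is a minimal sketch of how a probability threshold could turn three class probabilities into a label. It is not the planet implementation: the function name `label_sample`, the `low_confidence_label` parameter, and the example probabilities are all hypothetical, and it assumes the predictor returns one probability per reference ancestry with the three summing to 1.

```python
# Hypothetical sketch of threshold-based labelling; not code from planet.
ANCESTRIES = ("African", "Asian", "European")


def label_sample(probs, threshold=0.75, low_confidence_label="Mixed"):
    """Return the most probable ancestry, or the low-confidence label.

    probs: dict mapping each reference ancestry to its predicted probability
           (assumed to sum to 1 across the three classes).
    """
    best = max(ANCESTRIES, key=lambda ancestry: probs[ancestry])
    # Samples whose highest probability falls below the cutoff are not
    # assigned to any single reference ancestry.
    if probs[best] < threshold:
        return low_confidence_label
    return best


# Made-up probabilities, for illustration only.
confident = {"African": 0.05, "Asian": 0.90, "European": 0.05}
uncertain = {"African": 0.40, "Asian": 0.15, "European": 0.45}

print(label_sample(confident))  # Asian
print(label_sample(uncertain))  # Mixed
```

Under that assumption, a sample from a fourth ancestry would still have its probability spread across the three reference classes, so a low maximum probability alone cannot distinguish "mixed" from "other".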

iciarfernandez commented 11 months ago

That makes sense to me! I think "mixed" is an improvement over "ambiguous" anyway.