ppsp-team / PyNM

Lightweight Python implementation of Normative Modelling
BSD 3-Clause "New" or "Revised" License

How to deal with categorical variables with Gaussian process regression? #37

Closed by ruiyangge 1 year ago

ruiyangge commented 1 year ago

Hi there, I'm curious about how you handle categorical variables, such as 'site,' when using Gaussian process regression. Do you employ a specific method tailored for GPR in this context? I'm asking because conventional GPR assumes continuous input variables. Thanks.

deep-introspection commented 1 year ago

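Hi! You can still use a one-hot encoding scheme: each category (e.g. each site) becomes its own binary input column, which a standard kernel can take alongside the continuous inputs. For more details, see this paper: https://arxiv.org/abs/1805.03463 or this scikit-learn tutorial: https://scikit-learn.org/stable/auto_examples/gaussian_process/plot_gpr_on_structured_data.html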
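
To make the one-hot suggestion concrete, here is a minimal sketch using plain scikit-learn. It is not PyNM's internal code. The data are synthetic, and the variable names (`age`, `site`, `site_offset`), the helper `predict_with_site`, and the kernel choice (RBF with one length scale per column, plus a white-noise term) are all assumptions made for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic example data (assumption): one continuous covariate and one
# categorical covariate with three levels, assigned in a fixed rotation.
rng = np.random.default_rng(0)
n = 60
age = rng.uniform(20, 80, size=n)
site = np.array(["A", "B", "C"])[np.arange(n) % 3]
site_offset = {"A": 0.0, "B": 1.0, "C": -1.0}  # hypothetical site effects
y = 0.05 * age + np.array([site_offset[s] for s in site]) + rng.normal(0, 0.1, n)

# One-hot encode the categorical column; standardize the continuous one.
encoder = OneHotEncoder(handle_unknown="ignore")
site_onehot = encoder.fit_transform(site.reshape(-1, 1)).toarray()
scaler = StandardScaler()
age_scaled = scaler.fit_transform(age.reshape(-1, 1))

# Design matrix: [scaled age, site_A, site_B, site_C]
X = np.hstack([age_scaled, site_onehot])

# Anisotropic RBF: one length scale per input column, so the model can
# weight the site indicators differently from age.
kernel = RBF(length_scale=np.ones(X.shape[1])) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
gpr.fit(X, y)


def predict_with_site(ages, sites):
    """Hypothetical helper: encode new inputs the same way, then predict."""
    a = scaler.transform(np.asarray(ages, dtype=float).reshape(-1, 1))
    s = encoder.transform(np.asarray(sites).reshape(-1, 1)).toarray()
    return gpr.predict(np.hstack([a, s]), return_std=True)


mean, std = predict_with_site([50.0, 50.0], ["B", "C"])
print(mean, std)
```

With one-hot columns, two subjects from the same site have identical indicator vectors and two subjects from different sites always differ in exactly two columns, so the kernel effectively treats all sites as equally dissimilar. If that assumption is too strong for your data, a dedicated categorical kernel may be worth exploring.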