hoxo-m / densratio_py

A Python Package for Density Ratio Estimation
https://github.com/hoxo-m/densratio_py

Choosing kernel centers from test data #12

Open krooner opened 3 years ago

krooner commented 3 years ago

Hello. I'm trying to understand density-ratio estimation, including RuLSIF, in order to implement transition detection on smart home data. Thank you for making such a useful module.

Following the references listed in RuLSIF.py, I read 'A Least-squares Approach to Direct Importance Estimation', which covers LOOCV, to understand how sigma and lambda are determined.

That reference says the kernel centers are chosen randomly from the test data "without replacement". However, line 48 of RuLSIF.py reads:

    centers = x[randint(nx, size=kernel_num)]

When this code runs, randint draws the indices with replacement, so the selected centers can contain duplicated data points.

So I think the code should be changed to this:

    from numpy.random import choice
    centers = x[choice(nx, kernel_num, replace=False)]
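The difference between the two calls can be checked with a small sketch. The data, the seed, and the sizes below are made up for illustration and are not taken from RuLSIF.py; the `min(kernel_num, nx)` guard is an added assumption, since `choice(..., replace=False)` raises a ValueError when asked for more samples than there are points.

```python
import numpy as np
from numpy.random import choice, randint, seed

seed(0)  # fixed seed, only so the sketch is reproducible

# Illustrative sizes and data (assumptions, not values from the package).
nx, kernel_num = 200, 100
x = np.random.normal(size=(nx, 1))

# Assumed guard: sampling without replacement needs kernel_num <= nx.
k = min(kernel_num, nx)

# Behaviour quoted from the issue: indices drawn WITH replacement.
idx_with = randint(nx, size=k)

# Proposed behaviour: indices drawn WITHOUT replacement, so all are distinct.
idx_without = choice(nx, k, replace=False)

centers_with = x[idx_with]
centers_without = x[idx_without]

n_unique_with = len(np.unique(idx_with))
n_unique_without = len(np.unique(idx_without))
print("unique indices with replacement:   ", n_unique_with)
print("unique indices without replacement:", n_unique_without)
```

With replacement, the number of unique indices can fall below `k`; without replacement it always equals `k`.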

Please check whether this is right and let me know what you think! Thank you.

mierzejk commented 3 years ago

Hi @krooner,

you might be interested in my branch, in which numpy.random.choice (without replacement, so the selected centers are unique) has superseded numpy.random.randint for kernel center selection. In addition, numpy.percentile is applied so that the choice is stratified with respect to the (possibly multivariate) x values. Please refer to the semi_stratified_sample function.
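For readers unfamiliar with the idea, here is a rough sketch of percentile-based stratified selection. It is not the code of semi_stratified_sample in the branch: the helper name `stratified_indices_1d` is hypothetical, it handles only one-dimensional x, and it assumes x is non-empty.

```python
import numpy as np

def stratified_indices_1d(x, kernel_num, rng=None):
    """Hypothetical helper: pick distinct indices of 1-D x, one per quantile bin."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float).ravel()
    k = min(kernel_num, x.size)
    # Bin edges at evenly spaced percentiles, so bins hold roughly equal counts.
    edges = np.percentile(x, np.linspace(0, 100, k + 1))
    # Map every point to a bin in 0..k-1 (the maximum is clipped into the last bin).
    bins = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, k - 1)
    chosen = []
    for b in range(k):
        members = np.flatnonzero(bins == b)
        if members.size:  # heavy ties in x can leave a bin empty
            chosen.append(rng.choice(members))
    # Bins are disjoint, so the chosen indices cannot repeat.
    return np.array(chosen, dtype=int)

# Usage with made-up data.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
idx = stratified_indices_1d(x, 50, rng)
centers = x[idx]
```

Because each index comes from a different bin, the centers are unique and spread across the range of x instead of clustering where the data are densest.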

Best regards, Chris