rouyang2017 / SISSO

A data-driven method combining symbolic regression and compressed sensing for accurate & interpretable models.
Apache License 2.0
245 stars 80 forks

How does SIS choose features in Sid #43

Closed by IsaacDiane 2 years ago

IsaacDiane commented 3 years ago

Hello, Dr. Ouyang! I am using SISSO to find the top 1000 features correlated with the target. However, I get different results in the following two situations:

1) I set the parameters in SISSO.in to subs_sis=1000 and dim=1. I then get the top 1000 features in feature_space/space_001d.name and feature_space/space_001d_p001.dat.

2) I set the parameters in SISSO.in to subs_sis=400 and dim=3. I then get 3*400 features in feature_space/space_001(002,003)d.name and feature_space/space_001(002,003)d_p001.dat.

However, the 1st feature in feature_space/space_002d.name in case 2 is not the 401st one in feature_space/space_001d.name in case 1. What happens when SIS runs in different epochs?

rouyang2017 commented 3 years ago

Hi, as shown in the SISSO paper, a different subspace is selected at each dimension. The first subspace contains the features with the highest correlation with your original property Y. The second subspace contains the features with the highest correlation with the 1st residual vector (the part of Y not explained by the best 1D model), and so on. Thus, the 1st feature in the second subspace of your case 2 (the 401st feature selected overall) will generally be different from the 401st feature in the first subspace of case 1.
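
To make the difference concrete, here is a minimal NumPy sketch of the idea, not SISSO's actual code. The helper names (`abs_corr`, `sis`, `best_1d_residual`) and the toy data are assumptions made up for illustration, and a subspace size of 2 stands in for subs_sis=400. Features 1 and 2 are near-copies of feature 0, so they rank high against Y but carry nothing new once feature 0 has been fitted.

```python
import numpy as np

def abs_corr(features, target):
    """Absolute Pearson correlation of each feature column with target."""
    f = features - features.mean(axis=0)
    t = target - target.mean()
    return np.abs(f.T @ t) / (np.linalg.norm(f, axis=0) * np.linalg.norm(t))

def sis(features, target, k, exclude=()):
    """Indices of the k columns most correlated with target, skipping exclude."""
    score = abs_corr(features, target)
    score[np.asarray(list(exclude), dtype=int)] = -np.inf
    return [int(i) for i in np.argsort(-score)[:k]]

def best_1d_residual(features, target, subspace):
    """Residual of the best single-feature least-squares fit (with intercept)."""
    best = None
    for j in subspace:
        A = np.column_stack([np.ones(len(target)), features[:, j]])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        res = target - A @ coef
        if best is None or np.linalg.norm(res) < np.linalg.norm(best):
            best = res
    return best

# Toy data: columns 1..7 of an 8x8 Hadamard matrix are zero-mean and orthogonal.
H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
H = np.kron(np.kron(H2, H2), H2)
h = [H[:, i] for i in range(8)]

y = 4 * h[1] + 2 * h[2] + 1 * h[3]
X = np.column_stack([
    h[1],               # feature 0: dominant term of y
    h[1] + 0.5 * h[4],  # feature 1: near-copy of feature 0
    h[1] + 1.0 * h[5],  # feature 2: noisier copy of feature 0
    h[2],               # feature 3: second term of y
    h[3],               # feature 4: third term of y
])

# "Case 1": one large subspace, everything ranked against y.
case1 = sis(X, y, k=4)

# "Case 2": smaller subspaces, the second one ranked against the residual.
sub1 = sis(X, y, k=2)
residual = best_1d_residual(X, y, sub1)
sub2 = sis(X, residual, k=2, exclude=sub1)

print("case 1 ranking:      ", case1)
print("case 2, 1st subspace:", sub1)
print("case 2, 2nd subspace:", sub2)
```

In this toy example the third feature in the single large subspace is feature 2, whereas the first feature of the second subspace is feature 3: once feature 0 has been fitted, its near-copies no longer correlate with what is left of Y.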