lyghter closed this 5 years ago
I do not know which task you want to solve, but the num_feat parameter is not the number of features in the original input matrix X. It is the number of features to select (typically 50 to 200).
That is:
hsic_lasso.regression(discrete_x=True, num_feat=200, B=30, M=1, n_jobs=1)
Also, if you use the regression function for a classification problem, the performance will not be great. If this is a classification problem, I would suggest using the hsic_lasso.classification function instead.
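To make the regression/classification choice explicit, here is a minimal sketch of a dispatch helper. The name `run_selection` and the `task` argument are hypothetical, and it assumes that `classification` accepts the same keyword arguments as the `regression` call shown above (num_feat, B, M, discrete_x, n_jobs); please check that against the library before relying on it.

```python
def run_selection(model, task, num_feat=200, **kwargs):
    """Run feature selection with the method that matches the task.

    `model` is assumed to be an HSICLasso-like object that already holds the
    input data and exposes regression(), classification() and get_features().
    Extra keyword arguments (e.g. B, M, discrete_x, n_jobs) are passed through
    unchanged; that both methods accept them is an assumption of this sketch.
    """
    if task == "classification":
        model.classification(num_feat=num_feat, **kwargs)
    elif task == "regression":
        model.regression(num_feat=num_feat, **kwargs)
    else:
        raise ValueError(f"unknown task: {task!r}")
    # get_features() is the accessor used later in this thread.
    return model.get_features()
```

With a fitted-input object this would be called as `run_selection(hsic_lasso, "classification", num_feat=200, B=30, M=1, discrete_x=True, n_jobs=1)`.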
Thanks for the reply. I expected:
len(hsic_lasso.get_features()) == num_feat
But in my case this is false.
Got it. Actually, the algorithm can return fewer than num_feat features if it meets a stopping criterion first. So this is the expected behavior of the algorithm.
To handle this case, it might be good to add L2 regularization to the algorithm, but we have not implemented that yet.
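In practice this means num_feat is better treated as an upper bound than as an exact count. A minimal sketch of such a check follows; the helper name `check_selection` is hypothetical and not part of the library.

```python
def check_selection(selected, num_feat):
    """Return (n_selected, shortfall), treating num_feat as an upper bound.

    `selected` is the list of selected features, for example the result of
    get_features(). Fewer than num_feat entries is accepted as normal;
    more than num_feat is treated as an error.
    """
    n_selected = len(selected)
    if n_selected > num_feat:
        raise ValueError(
            f"got {n_selected} features, more than num_feat={num_feat}"
        )
    return n_selected, num_feat - n_selected
```

For example, `check_selection(hsic_lasso.get_features(), 200)` would replace the strict `len(...) == num_feat` expectation with `len(...) <= num_feat` and report how many features short the selection is.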
Accuracy decreased.
dataset: https://www.kaggle.com/artyomsalnikov/dataset-3
code: https://yadi.sk/d/xAsaL-TPGZe09A