[Closed] B-Seif closed this issue 2 years ago.
Have you looked at mnist_reconstruction.py? https://github.com/lasso-net/lassonet/blob/master/examples/mnist_reconstruction.py
It corresponds to what I'm looking for, but unfortunately I can't reproduce the results because I run into this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [374], in <cell line: 1>()
----> 1 path = model.path(X_train, X_train)
File ~/anaconda3/envs/ieee_vit/lib/python3.9/site-packages/lassonet/interfaces.py:468, in BaseLassoNet.path(self, X, y, X_val, y_val, lambda_seq, lambda_max, return_state_dicts, callback)
465 is_dense = False
466 if current_lambda / lambda_start < 2:
467 warnings.warn(
--> 468 f"lambda_start={self.lambda_start:.3f} "
469 "might be too large.\n"
470 f"Features start to disappear at {current_lambda=:.3f}."
471 )
473 hist.append(last)
474 if callback is not None:
ValueError: Unknown format code 'f' for object of type 'str'
Do you have an idea? I'm using Python 3.9.12, and the code that generated the error is:
model = LassoNetRegressor(M=30, n_iters=(300,500), path_multiplier=1.05)
path = model.path(X_train, X_train)
Thanks
My last fix for #18 was not working properly. Can you try again after updating lassonet to the latest version?
How can I get the latest version of lassonet? Maybe with pip install lassonet? I did that and the error persists.
Do you have an idea how to overcome this problem?
pip install -U lassonet
-U is for upgrade :)
I still have the same error :(
Can you paste the error? I literally changed the code: https://github.com/lasso-net/lassonet/commit/d76a32641421e01952529dead88836dfd5ae58bd
OK, I get this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [419], in <cell line: 2>()
1 model = LassoNetRegressor(M=30, n_iters=(300,500), path_multiplier=1.05)
----> 2 path = model.path(X_train, X_train)
File ~/anaconda3/envs/ieee_vit/lib/python3.9/site-packages/lassonet/interfaces.py:468, in BaseLassoNet.path(self, X, y, X_val, y_val, lambda_seq, lambda_max, return_state_dicts, callback)
465 is_dense = False
466 if current_lambda / lambda_start < 2:
467 warnings.warn(
--> 468 f"lambda_start={self.lambda_start:.3f} "
469 "might be too large.\n"
470 f"Features start to disappear at {current_lambda=:.3f}."
471 )
473 hist.append(last)
474 if callback is not None:
ValueError: Unknown format code 'f' for object of type 'str'
You clearly don't have the latest version. Maybe uninstall lassonet and reinstall?
Maybe pip install "lassonet>=0.0.12"?
I have the latest version, 0.0.12.
Maybe you have a problem with your environments. I checked, and the latest version looks like this.
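(Editor's aside, not part of the thread.) A common cause of "I upgraded but the old error persists" is that the interpreter imports a stale copy of the package from another location, such as `~/.local`, instead of the upgraded one. A stdlib-only way to check which file Python actually loads (`module_location` is an illustrative helper name; pass `"lassonet"` in practice):

```python
import importlib
import inspect

def module_location(name):
    """Return the file path a module is actually imported from."""
    mod = importlib.import_module(name)
    return inspect.getfile(mod)

# stdlib example; replace "warnings" with "lassonet" to debug the env
print(module_location("warnings"))
```

If the printed path is not inside the environment you upgraded, that explains why the fix never showed up.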
For the record, here is a minimal example for auto encoders:
from sklearn.datasets import fetch_california_housing
from sklearn.preprocessing import StandardScaler
from lassonet import LassoNetRegressor
X, _ = fetch_california_housing(return_X_y=True)
X = StandardScaler().fit_transform(X)
model = LassoNetRegressor(verbose=2)
path = model.path(X, X)
Hi, it works for me. I have another question regarding the use of LassoNet in the supervised paradigm: could we use or adapt this algorithm in the multilabel setting?
Sure. You would need to inherit from the base LassoNet class and change the loss function as well as some casting functions.
Look at interfaces.py and how we implemented the classifiers.
I would be happy to review a PR implementing multilabel classification.
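(Editor's note.) The key change such a subclass would make is the loss: multilabel classification treats each label as an independent binary target, so element-wise binary cross-entropy with logits replaces softmax cross-entropy. A minimal NumPy sketch of that loss, using the numerically stable formulation (the function name is illustrative, not lassonet's actual internals):

```python
import numpy as np

def multilabel_bce_with_logits(logits, targets):
    """Element-wise binary cross-entropy with logits, averaged over
    samples and labels. Stable form: max(x,0) - x*t + log(1 + e^(-|x|))."""
    logits = np.asarray(logits, dtype=float)
    targets = np.asarray(targets, dtype=float)
    loss = (np.maximum(logits, 0) - logits * targets
            + np.log1p(np.exp(-np.abs(logits))))
    return loss.mean()

# Each row is one sample; each column is one independent binary label.
logits = np.array([[2.0, -1.5, 0.3],
                   [-0.7, 3.1, -2.2]])
targets = np.array([[1, 0, 1],
                    [0, 1, 0]])
print(multilabel_bce_with_logits(logits, targets))  # ~0.239
```

In a subclass of the base LassoNet class, this loss (in its PyTorch form, `BCEWithLogitsLoss`) would replace the classifier's cross-entropy, and the casting functions would need to accept a 2-D binary target matrix instead of a 1-D label vector.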
Hi, I ran LassoNet to reconstruct some relatively large data (17000 x 8000). It runs for hours, and then my terminal shows that the process was killed. I also often get this warning message:
.local/lib/python3.9/site-packages/lassonet/interfaces.py:467: UserWarning: lambda_start=429496.730 (selected automatically) might be too large.
Features start to disappear at current_lambda=429496.730.
warnings.warn(
Killed
Do you have an idea what explains this? Is there a hyperparameter that influences it? Below is my code:
import time
import numpy as np
from lassonet import LassoNetRegressor

# randomly sampled hyperparameters
M = np.random.uniform(0, 10_000)
path_multiplier = np.random.uniform(1.01, 1.5)
hidden1 = np.random.randint(10, 100)
hidden2 = np.random.randint(100, 200)

start_time = time.time()
model = LassoNetRegressor(M=M, path_multiplier=path_multiplier, hidden_dims=(hidden1, hidden2))
path = model.path(X_train, X_train)
elapsed = time.time() - start_time
You should normalize your features first.
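(Editor's note.) "Normalize" here means zero-mean, unit-variance scaling per column, which is what sklearn's StandardScaler does. A plain-NumPy equivalent, for illustration (`standardize` is a hypothetical helper, not thread code):

```python
import numpy as np

def standardize(X, eps=1e-12):
    """Zero-mean, unit-variance scaling per column, like
    StandardScaler().fit_transform(X)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / np.maximum(std, eps)  # eps guards constant columns

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
Xs = standardize(X)
print(Xs.mean(axis=0))  # ~[0, 0]
print(Xs.std(axis=0))   # ~[1, 1]
```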
The data is already normalized!
Can I share the code and data with you so you can take a look?
Sure, my email is on my personal page!
Ok, thanks
import pandas as pd
from lassonet import LassoNetRegressor
from lassonet import plot_path
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
# read data
X_train = pd.read_csv("eurlex-ev-fold1-train.arff.csv").iloc[:, : 8993 - 3993].values
X_train = StandardScaler().fit_transform(X_train)
# reconstruction
model = LassoNetRegressor(
    lambda_start=5e-1,
    path_multiplier=1.05,
    n_iters=(20, 5),
    verbose=2,
)
path = model.path(X_train, X_train)
plot_path(model, path, X_train, X_train)
plt.show()
I ran LassoNet successfully using those parameters. I plotted the path on the training dataset, so the performance is best with all the features.
In [12]: [(p.selected.sum().item(), p.val_loss)for p in path]
Out[12]:
[(5000, 0.7403443455696106),
(5000, 0.7388657331466675),
(5000, 0.7372983694076538),
(5000, 0.7356559634208679),
(5000, 0.7339461445808411),
(5000, 0.732173502445221),
(5000, 0.7303400635719299),
(5000, 0.728447437286377),
(5000, 0.7264959216117859),
(5000, 0.724486231803894),
(5000, 0.7224189639091492),
(5000, 0.7202948331832886),
(5000, 0.7181147336959839),
(5000, 0.715880274772644),
(5000, 0.7135931849479675),
(5000, 0.7112559676170349),
(5000, 0.7088716626167297),
(5000, 0.7064440846443176),
(5000, 0.7039777636528015),
(5000, 0.7014783620834351),
(5000, 0.6989524960517883),
(5000, 0.6964077949523926),
(5000, 0.6938537359237671),
(5000, 0.6913006901741028),
(5000, 0.6887612342834473),
(5000, 0.6862493753433228),
(5000, 0.6837817430496216),
(5000, 0.6813769936561584),
(5000, 0.6790563464164734),
(5000, 0.6768444776535034),
(5000, 0.6747689843177795),
(5000, 0.6728613376617432),
(5000, 0.6711570024490356),
(5000, 0.6696965098381042),
(5000, 0.6685250997543335),
(5000, 0.6676939725875854),
(5000, 0.6672609448432922),
(5000, 0.6672911047935486),
(5000, 0.6678575277328491),
(5000, 0.6690409779548645),
(5000, 0.6709340214729309),
(5000, 0.6736408472061157),
(5000, 0.6772762537002563),
(5000, 0.6819731593132019),
(5000, 0.687872588634491),
(5000, 0.6951332092285156),
(5000, 0.7039183974266052),
(5000, 0.714412271976471),
(5000, 0.7267968654632568),
(5000, 0.7412514090538025),
(5000, 0.7578516602516174),
(5000, 0.7759532332420349),
(5000, 0.7900420427322388),
(5000, 0.7914730906486511),
(5000, 0.7910680770874023),
(5000, 0.7905250787734985),
(5000, 0.7899476289749146),
(5000, 0.7894152402877808),
(5000, 0.7889501452445984),
(5000, 0.7885493636131287),
(5000, 0.7882020473480225),
(5000, 0.7878993153572083),
(5000, 0.7876302003860474),
(5000, 0.7873923182487488),
(5000, 0.7871778607368469),
(5000, 0.7869775295257568),
(5000, 0.7867907285690308),
(5000, 0.7866113185882568),
(5000, 0.7864405512809753),
(5000, 0.7862793803215027),
(5000, 0.7861262559890747),
(5000, 0.785978376865387),
(5000, 0.7858325839042664),
(5000, 0.785689651966095),
(5000, 0.7855460047721863),
(5000, 0.7854057550430298),
(5000, 0.7852645516395569),
(5000, 0.7851291298866272),
(5000, 0.7849962115287781),
(5000, 0.7848662734031677),
(5000, 0.7847418189048767),
(5000, 0.7846243381500244),
(5000, 0.7845121622085571),
(5000, 0.7844087481498718),
(5000, 0.7843157052993774),
(5000, 0.7842352986335754),
(5000, 0.7841715812683105),
(5000, 0.7841252684593201),
(5000, 0.7840957045555115),
(5000, 0.7840867042541504),
(5000, 0.7841042876243591),
(4999, 0.7841560244560242),
(4998, 0.7842445373535156),
(4988, 0.7843688130378723),
(4967, 0.7845369577407837),
(4916, 0.7847509384155273),
(4835, 0.7850145101547241),
(4691, 0.7853205800056458),
(4458, 0.7856589555740356),
(4089, 0.7860158681869507),
(3566, 0.7863755822181702),
(2730, 0.7866988182067871),
(1757, 0.7869724631309509),
(817, 0.7871779203414919),
(266, 0.7872892022132872),
(53, 0.7873258590698242),
(11, 0.7873356938362122),
(1, 0.7873363494873047),
(0, 0.7873362302780151)]
Maybe you could run the code above but call plot_path on a different fold. That way you will know whether the model was simply overfitting. The validation loss (computed on a random subset of the data) indicates that the performance is poor.
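(Editor's note.) A generic holdout split for that check might look like the sketch below: fit the path on one fold, then evaluate on the held-out rows. The helper name and the 80/20 split are illustrative choices, not thread code:

```python
import numpy as np

def holdout_split(X, frac=0.8, seed=0):
    """Shuffle rows and split into train/validation folds so the path
    can be fitted on one fold and evaluated on the other."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = int(frac * len(X))
    return X[idx[:cut]], X[idx[cut:]]

X = np.arange(20).reshape(10, 2).astype(float)
X_tr, X_val = holdout_split(X)
print(X_tr.shape, X_val.shape)  # (8, 2) (2, 2)
```

One would then run model.path(X_tr, X_tr) and plot_path(model, path, X_val, X_val): if the validation reconstruction loss is much worse than the training loss, the model was overfitting.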
Hi, I would like to use LassoNet as an unsupervised feature selection algorithm, but I can't find an example that shows how to do this in a simple way. The only script that shows a reconstruction example is mnist_ae.py, but it doesn't work (I get an error: LassoNetAutoEncoder does not exist!). My use case: I have an input matrix without labels, and I want a new reduced matrix with only 30% of the most important features.
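(Editor's sketch for the 30%-of-features use case.) The thread's `[(p.selected.sum().item(), p.val_loss) for p in path]` snippet shows that each path checkpoint records how many features survive, so hitting a target fraction is just a matter of picking the first checkpoint at or below budget. A pure-Python illustration (`pick_checkpoint` is a hypothetical helper name):

```python
def pick_checkpoint(selected_counts, n_features, target_frac=0.3):
    """Index of the first path checkpoint whose number of surviving
    features is at or below target_frac * n_features."""
    budget = target_frac * n_features
    for i, k in enumerate(selected_counts):
        if k <= budget:
            return i
    return len(selected_counts) - 1  # path never got that sparse

# Feature counts along a shrinking path (shape as in the thread's output):
counts = [5000, 4999, 4988, 3566, 1757, 817, 266, 53, 11, 1, 0]
i = pick_checkpoint(counts, n_features=5000, target_frac=0.3)
print(i, counts[i])  # 5 817
```

With a real path, one would build `counts` from `[p.selected.sum().item() for p in path]` and then use the chosen checkpoint's selection mask to keep only those columns of X.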