Using metrics.py (mespadoto/proj-quant-eval, issue #2)

mkkkkk5 commented 3 years ago

Greetings,

I would like to use a fragment of your code from metrics.py.

I saved the X and y datasets locally by hand, and to load them in my code I use pandas.read_csv(). I wanted to use the quality measure functions and was wondering whether I could directly use the X.csv and y.csv files from this link: https://mespadoto.github.io/proj-quant-eval/post/datasets/ (I did not use get_datasets.py to fetch them from that link.)

When I simply tried this:

import pandas as pd

# metric_dc_neighborhood_hit_k_03 is the function copied from metrics.py
bank_X = pd.read_csv('datasets/bank/X.csv')
bank_y = pd.read_csv('datasets/bank/y.csv')

test = metric_dc_neighborhood_hit_k_03(bank_X, bank_y)

It gave me this error:

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sklearn/neighbors/_classification.py:179: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().
  return self._fit(X, y)
Traceback (most recent call last):
  File "/Users/--/Desktop/Hilbert_Projection_Copy/from_git.py", line 203, in <module>
    test = metric_dc_neighborhood_hit_k_03(bank_X,bank_y)
  File "/Users/--/Desktop/Hilbert_Projection_Copy/from_git.py", line 137, in metric_dc_neighborhood_hit_k_03
    return metric_neighborhood_hit(X, y, 3)
  File "/Users/--/Desktop/Hilbert_Projection_Copy/from_git.py", line 53, in metric_neighborhood_hit
    return np.mean(np.mean((y[neighbors] == np.tile(y.reshape((-1, 1)), k)).astype('uint8'), axis=1))
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/frame.py", line 3030, in __getitem__
    indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexing.py", line 1266, in _get_listlike_indexer
    self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexing.py", line 1308, in _validate_read_indexer
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index([ (0, 1221, 1745), (1, 1492, 642), (2, 139, 706),\n (3, 2057, 130), (4, 1219, 1470), (5, 1419, 20),\n (6, 1986, 1758), (7, 42, 1325), (8, 932, 328),\n (9, 759, 523),\n ...\n (2048, 138, 1941), (2049, 1040, 617), (2050, 1262, 1667),\n (2051, 1430, 1600), (2052, 1125, 2013), (2053, 712, 724),\n (2054, 1636, 1624), (2055, 1412, 546), (2056, 749, 169),\n (2057, 1817, 3)],\n dtype='object', length=2058)] are in the [columns]"
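The last frames suggest why this fails: line 53 indexes y with an array of neighbor indices, and when y is a pandas DataFrame, y[...] with an array is a lookup of column labels, not row positions. A minimal reproduction on hypothetical toy data (the names y_df and neighbors are stand-ins, not taken from metrics.py) might look like this; the exact exception type may depend on the pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: a one-column labels table, as pd.read_csv returns for y.csv,
# and an (n_samples, k) array of neighbor indices like the one metric_neighborhood_hit uses.
y_df = pd.DataFrame({"label": [0, 1, 0, 1]})
neighbors = np.array([[1, 2], [0, 3], [3, 0], [2, 1]])

# On a DataFrame, y[...] with an array looks the values up as *column labels*,
# so the integer indices are not found and pandas raises an error
# (pandas 1.2 raised the KeyError shown in the traceback).
try:
    y_df[neighbors]
    error_name = None
except (KeyError, ValueError) as err:
    error_name = type(err).__name__
print(error_name)

# On a 1-D NumPy array, y[neighbors] is positional fancy indexing and returns
# the label of every neighbor, with shape (n_samples, k).
y = y_df.to_numpy().ravel()
print(y[neighbors])
```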

How can I validate my dataset so that I can use it with your functions?
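One way to validate the data before calling a metric is to convert both files to NumPy arrays and check their shapes. The helper below is a hypothetical sketch (load_xy is not part of metrics.py); it assumes the CSV files have no header row, which is worth checking, since with the pandas default header=0 the first data row is silently used as column names.

```python
import io

import numpy as np
import pandas as pd


def load_xy(x_source, y_source, header=None):
    """Hypothetical helper: load X and y as the plain arrays the metric code indexes.

    header=None assumes the CSV files have no header row; pass header=0
    if the first line of your files holds column names.
    """
    X = pd.read_csv(x_source, header=header).to_numpy(dtype=np.float64)
    # y.csv is expected to hold a single column; ravel() turns (n, 1) into (n,).
    y = pd.read_csv(y_source, header=header).to_numpy().ravel()

    # Basic sanity checks before calling any metric.
    if X.ndim != 2:
        raise ValueError(f"X must be 2-D, got shape {X.shape}")
    if y.ndim != 1:
        raise ValueError(f"y must be 1-D, got shape {y.shape}")
    if X.shape[0] != y.shape[0]:
        raise ValueError(f"X has {X.shape[0]} rows but y has {y.shape[0]} labels")
    if not np.isfinite(X).all():
        raise ValueError("X contains NaN or infinite values")
    return X, y


# Usage with in-memory stand-ins for X.csv and y.csv:
X, y = load_xy(io.StringIO("0.1,0.2\n0.3,0.4\n0.5,0.6\n"), io.StringIO("0\n1\n0\n"))
print(X.shape, y.shape)  # (3, 2) (3,)
```

With real files, the call would be load_xy('datasets/bank/X.csv', 'datasets/bank/y.csv'), and the returned arrays would be passed to the metric functions instead of the DataFrames.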

I would also like to know what the id_run variable is.

Thank you in advance.

mespadoto commented 3 years ago

Have you tried doing what the first message (the warning) says? Usually that does the trick:

DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().
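A short sketch of what that advice amounts to, using a hypothetical in-memory table in place of y.csv: converting the single-column DataFrame to NumPy gives the (n, 1) "column-vector" the warning mentions, and ravel() flattens it to (n_samples,).

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv('datasets/bank/y.csv'): a one-column table of labels.
y_df = pd.DataFrame([[0], [1], [1], [0]])

y_col = y_df.to_numpy()  # shape (4, 1): the "column-vector y" from the warning
y = y_col.ravel()        # shape (4,): the (n_samples,) shape the warning asks for

print(y_col.shape, y.shape)  # (4, 1) (4,)
```

Applied to the code in the question, this would mean passing bank_X.to_numpy() and bank_y.to_numpy().ravel() to metric_dc_neighborhood_hit_k_03 rather than the DataFrames, which also turns y[neighbors] into positional indexing.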

mkkkkk5 commented 3 years ago

Thank you for your advice.

I was also wondering what id_run is and how I can produce this parameter for the functions that need it. I am only using the quality metric functions, and to get my datasets I simply use:

bank_X = pd.read_csv('datasets/bank/X.csv')
bank_y = pd.read_csv('datasets/bank/y.csv')

mespadoto commented 3 years ago

These scripts are meant to reproduce the experiments in the survey paper, and id_run is part of that setup. If you want more general implementations of those metrics, you can find them here.
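Going by this reply, id_run belongs to the experiment scripts rather than to the metric itself. As an illustration, here is a minimal sketch of a neighborhood hit computation that needs only points and labels. It is not the code from metrics.py: the name neighborhood_hit is hypothetical, it uses scikit-learn's NearestNeighbors, and it excludes each point from its own neighbor list, which may differ from how the repository counts neighbors.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def neighborhood_hit(points, labels, k=3):
    """Sketch: mean over all points of the fraction of each point's k nearest
    neighbors that share its label.

    Assumes points is (n_samples, n_dims) and labels is (n_samples,) or (n_samples, 1).
    """
    points = np.asarray(points, dtype=np.float64)
    labels = np.asarray(labels).ravel()
    # Ask for k + 1 neighbors because the closest one is normally the point itself
    # (with exact duplicate points, column 0 may be a duplicate instead).
    nn = NearestNeighbors(n_neighbors=k + 1).fit(points)
    indices = nn.kneighbors(points, return_distance=False)[:, 1:]
    # Compare each neighbor's label with the label of the query point.
    same_label = labels[indices] == labels[:, np.newaxis]
    return float(same_label.mean(axis=1).mean())


# Usage: two well-separated groups whose labels match the groups.
pts = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
lbl = np.array([0, 0, 0, 1, 1, 1])
print(neighborhood_hit(pts, lbl, k=2))  # 1.0
```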
