[Feature] Parallel Coordinates plot

rodrigo-arenas commented 3 years ago

Is your feature request related to a problem? Please describe. NA

Describe the solution you'd like: Implement in the sklearn_genetic.plots module a function named plot_parallel_coordinates to inspect the results of the learning process.

Describe alternatives you've considered: The function should take two arguments:

estimator: A fitted estimator from sklearn_genetic.GASearchCV
features: list, default=None. Subset of features to plot; if None, all features are plotted.

The function should return an object that plots parallel coordinates, following the pandas.plotting.parallel_coordinates function.

The data to plot is available in the estimator.logbook object; look at the implementation of the plot_search_space function to see how to convert this data to a pandas data frame.

The function must select only the non-categorical variables. This can be done by inspecting the estimator.space object and comparing against the data types defined in sklearn_genetic.space, i.e. Categorical, Continuous and Integer. The lines should be colored by the "score" column. In the same way, the function must validate the features parameter and raise a warning if a Categorical one is passed.
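The selection and warning logic described above could be sketched roughly as follows. This is only an illustration: the helper name select_numeric_features is hypothetical, the three classes are empty stand-ins for the ones in sklearn_genetic.space, and the search space is assumed to behave like a plain dict mapping parameter names to dimension objects, which may not match how estimator.space is actually structured.

```python
import warnings


# Stand-ins for the classes in sklearn_genetic.space (assumption: only the
# class names come from the issue; the real constructors are not modeled).
class Categorical:
    pass


class Continuous:
    pass


class Integer:
    pass


def select_numeric_features(space, features=None):
    """Return the names that can be plotted, skipping Categorical parameters.

    `space` is assumed to be a dict of {parameter name: dimension object}.
    A warning is emitted only when the caller explicitly asked for a
    Categorical parameter through `features`.
    """
    requested = list(space) if features is None else list(features)
    selected = []
    for name in requested:
        # Names that are not in the space (e.g. "score") are kept as-is.
        if isinstance(space.get(name), Categorical):
            if features is not None:
                warnings.warn(f"'{name}' is Categorical and will not be plotted")
            continue
        selected.append(name)
    return selected
```

With features=None the categorical parameters are dropped silently; with an explicit list, each categorical entry triggers one warning and is skipped.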

Additional context Links of some implementations:

rsvarma95 commented 3 years ago

Can I try this?

rodrigo-arenas commented 3 years ago

@Raul9595 for sure! All the help is welcome

rsvarma95 commented 3 years ago

I just had a few questions.

What would the categorical value be, i.e. the 'Name' in the example at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.plotting.parallel_coordinates.html?
Do we also include the score in the plot?

rodrigo-arenas commented 3 years ago

What I have in mind is the following:

Take the "score" column and, for each row, calculate which quartile it belongs to (Q1 - Q4) using pandas.qcut(.., q=4); call this column "score_quartile"
Now, the "Name" parameter will be this new column
We can keep the score column as the last column, as the parameter "features" will allow the user to remove it if they wish

If you have any questions, let me know. Thanks!
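A minimal sketch of the quartile step, assuming a small made-up results frame (in the real function the data would come from estimator.logbook). The column names learning_rate and max_depth and the helper plot_by_quartile are hypothetical.

```python
import pandas as pd
from pandas.plotting import parallel_coordinates

# Made-up search results, used only to illustrate the steps above.
results = pd.DataFrame(
    {
        "learning_rate": [0.01, 0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60],
        "max_depth": [2, 3, 4, 5, 6, 7, 8, 9],
        "score": [0.61, 0.65, 0.70, 0.72, 0.78, 0.80, 0.85, 0.90],
    }
)

# Assign each row to a score quartile; this column plays the role of "Name".
results["score_quartile"] = pd.qcut(
    results["score"], q=4, labels=["Q1", "Q2", "Q3", "Q4"]
).astype(str)


def plot_by_quartile(frame):
    """Draw one line per row, colored by its score quartile (needs matplotlib)."""
    return parallel_coordinates(frame, "score_quartile", colormap="viridis")
```

Calling plot_by_quartile(results) returns a Matplotlib Axes. The score column stays in the frame as the last numeric column, so it is drawn as the final axis.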

rodrigo-arenas commented 3 years ago

Hi @Raul9595, thanks again for the help! I just realized that pandas doesn't scale each variable independently, so parameters with a large scale squeeze the ones with a small scale. I was wondering if you also want to work on this enhancement, to make a plot that can have an independent scale for each feature?

rsvarma95 commented 3 years ago

Hi @rodrigo-arenas! Yes I can definitely try this out. How do you want to proceed with this -

Normalize everything such that everything shares the same y axis range
Plot them along separate y axis ranges based on their min and max values

rodrigo-arenas commented 3 years ago

Ey, thanks! The second option would be the one to go with; it's less confusing for the users, as the parameters stay on the same scale they defined
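One common way to get per-feature axes with Matplotlib is to min-max scale each column for drawing only, then label every vertical axis with that column's original minimum and maximum. The sketch below follows that idea; it is not the implementation that ended up in the library, and the names scale_columns and plot_independent_axes are hypothetical.

```python
import numpy as np
import pandas as pd


def scale_columns(frame):
    """Min-max scale each column to [0, 1] independently of the others."""
    lows = frame.min()
    # Constant columns have zero range; use 1 to avoid dividing by zero.
    spans = (frame.max() - lows).replace(0, 1)
    return (frame - lows) / spans


def plot_independent_axes(frame, color_by="score"):
    """Parallel coordinates where every column keeps its own value range."""
    import matplotlib.pyplot as plt  # imported here: only plotting needs it

    columns = list(frame.columns)
    scaled = scale_columns(frame)
    positions = np.arange(len(columns))

    fig, ax = plt.subplots()
    colors = plt.cm.viridis(scaled[color_by].to_numpy())
    for row, color in zip(scaled.to_numpy(), colors):
        ax.plot(positions, row, color=color)

    ax.set_xticks(positions)
    ax.set_xticklabels(columns)
    ax.set_yticks([])  # the shared 0-1 axis is meaningless to the user
    for x, name in zip(positions, columns):
        ax.axvline(x, color="grey", linewidth=0.8)
        # Label each vertical axis with the column's original min and max.
        ax.text(x + 0.02, 0, f"{frame[name].min():g}", ha="left", va="bottom")
        ax.text(x + 0.02, 1, f"{frame[name].max():g}", ha="left", va="top")
    return ax
```

The lines are drawn in scaled coordinates, but the labels show the original values, so each parameter reads on the scale the user defined.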

rsvarma95 commented 3 years ago

Ok sounds good. Will work on it

rsvarma95 commented 2 years ago

Sorry for taking a long time. It may be tough to do the above solution using Pandas. Is Matplotlib or Plotly an option?

rodrigo-arenas commented 2 years ago

Ey, don't worry about it. Matplotlib can be a good fit, so we don't add an extra dependency on Plotly. Thanks!

rsvarma95 commented 2 years ago

Hi! I am not getting enough time to work on this. I can take a look at it in the future or you can assign it to someone else. I appreciate all the help and am sorry for not being able to complete it.

Archimedean2 commented 2 years ago

Hi! Is this still up for grabs?

rodrigo-arenas commented 2 years ago

Hi, yes, help is welcome on this

rodrigo-arenas commented 2 years ago

Closed as mentioned in #98

rodrigo-arenas / Sklearn-genetic-opt

[Feature] Parallel Coordinates plot #27