scikit-learn / enhancement_proposals

Enhancement proposals for scikit-learn: structured discussions and rationale for large additions and modifications
https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License

SLEP011: Add Logical Analysis of Data (LAD) classifier #23

Closed GregoryMorse closed 4 years ago

GregoryMorse commented 4 years ago

See the SLEP for details.

GaelVaroquaux commented 4 years ago

I do not think that adding a classifier calls for a SLEP: SLEPs are for modifications that touch many classifiers.

It should rather be discussed in an issue in the scikit-learn repo.

However, I am not sure that the proposed algorithm meets the criteria for inclusion (listed on https://scikit-learn.org/stable/faq.html#what-are-the-inclusion-criteria-for-new-algorithms). Indeed, a Google Scholar search shows only a few dozen citations: https://scholar.google.ca/scholar?q=Accelerated+algorithm+for+pattern+detection+in+logical+analysis+of+data&hl=en&as_sdt=0&as_vis=1&oi=scholart

chkoar commented 4 years ago

A scikit-learn-contrib package maybe.

GregoryMorse commented 4 years ago

Okay, there is definitely good information here. But this algorithm might be worth it. Take, for example, a more influential paper. The paper I cited is particularly useful for finding patterns efficiently, but for the overall process, see: https://scholar.google.ca/scholar?hl=en&as_sdt=0%2C5&as_vis=1&q=An+implementation+of+logical+analysis+of+data&btnG=

"An implementation of logical analysis of data", E Boros, PL Hammer, T Ibaraki, A Kogan. Cited by 430.

At least I should focus on making a supporting argument based on the right paper, along with other requirements from the FAQ. I was also unsure if a SLEP was the right place for the discussion. For now probably a contrib package is the right direction and it can be considered for the main library thereafter. Thanks to both of you for your guidance on this.

GregoryMorse commented 4 years ago

After studying the FAQ, the only real question is wide use. But I would argue that its lack of wide use is partly because public open-source implementations are not available. Research demonstrates very good results with it.

jnothman commented 4 years ago

Which sounds like a good reason for it to be implemented in contrib initially.

adrinjalali commented 4 years ago

Also, it may be easier to have it in sklearn-extra instead of a stand-alone package.