trevorstephens / gplearn

Genetic Programming in Python, with a scikit-learn inspired API
http://gplearn.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
1.56k stars, 274 forks

Add Analytic Quotient and another protected log #242

Open jmmcd opened 2 years ago

jmmcd commented 2 years ago

gplearn already has a protected division: np.where(np.abs(x2) > 0.001, np.divide(x1, x2), 1.) and protected log: np.where(np.abs(x1) > 0.001, np.log(np.abs(x1)), 0.).
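To make the discontinuity concrete, here is a minimal sketch that wraps the two expressions quoted above in plain NumPy functions and evaluates them on either side of the 0.001 threshold. The function names and the `np.errstate` guard are illustrative choices for this sketch and are not taken from the quoted text.

```python
import numpy as np


def protected_division(x1, x2):
    """Koza-style protected division, as quoted in the issue text."""
    # np.where evaluates both branches, so silence the divide-by-zero warnings.
    with np.errstate(divide='ignore', invalid='ignore'):
        return np.where(np.abs(x2) > 0.001, np.divide(x1, x2), 1.)


def protected_log(x1):
    """Koza-style protected log, as quoted in the issue text."""
    with np.errstate(divide='ignore', invalid='ignore'):
        return np.where(np.abs(x1) > 0.001, np.log(np.abs(x1)), 0.)


# Two inputs that straddle the 0.001 threshold.
near_zero = np.array([0.0011, 0.0009])

# Division jumps from about 909 to exactly 1 across the threshold.
div_out = protected_division(np.ones_like(near_zero), near_zero)

# The log jumps from about -6.8 to exactly 0 across the threshold.
log_out = protected_log(near_zero)
```

A change of 0.0002 in the input moves the division result by roughly 908, which is the kind of jump the cited authors object to.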

Both of these are common, but both introduce discontinuities, which can be undesirable (see e.g. Keijzer; Ni, Drieberg and Rockett; Nicolau and Agapitos). Alternatives include:

Of course these would be of interest as alternatives, not replacements. If this addition would be useful I can generate a PR. Would these names be suitable:

trevorstephens commented 2 years ago

Can you provide a link to the paper(s) please?

jmmcd commented 2 years ago

Short summary: they don't like discontinuities.

trevorstephens commented 2 years ago

Thanks! 😄 Hope I'll have a chance to flick through these soon. If it's related to work you wish to do personally, you should be able to use the custom functions to do what you need, but I think it's worth considering putting them in as first-class members of the function set if there's enough research supporting it.
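As a rough illustration of the custom-function route, the sketch below defines the analytic quotient from the issue title, using the form given by Ni, Drieberg and Rockett, AQ(x1, x2) = x1 / sqrt(1 + x2^2), and wraps it with gplearn's `make_function` when gplearn is installed. The name 'aq' is a placeholder chosen for this sketch, not a name agreed in this thread, and the other protected log mentioned in the title is not shown because its definition does not appear in the text above.

```python
import numpy as np


def analytic_quotient(x1, x2):
    """Analytic quotient: x1 / sqrt(1 + x2**2).

    The denominator is always >= 1, so no threshold test is needed and
    the function is smooth for every real x2.
    """
    return x1 / np.sqrt(1.0 + x2 ** 2)


try:
    # Optional: only wrap the function if gplearn is available.
    from gplearn.functions import make_function

    # 'aq' is a hypothetical name picked for this example.
    aq = make_function(function=analytic_quotient, name='aq', arity=2)
except ImportError:
    aq = None
```

If gplearn is available, the wrapped object could then be listed alongside the built-in names, for example `function_set=('add', 'sub', 'mul', aq)` when constructing a `SymbolicRegressor`.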

hwulfmeyer commented 2 years ago

See #131 and #130

trevorstephens commented 2 years ago

I'm thinking of how to move on this after years of user demand. Initial thoughts are to have a parameter (yes, another API bloat addition) where the user can set something like closure_method=('koza'/'smooth'), while still specifying the default log, div in their function sets. But also allow specifying log-koza or similar if they want to mix and match.

jmmcd commented 2 years ago

closure_method is slightly ambiguous, as there could be multiple smooth log operators, for example. It also creates extra complexity, as the code would have to deal with the closure_method setting being overridden by the individual operators. (Plus default values for both.)

I think having default operators and allowing the user to override them one-by-one is fine.

Just my 2 cents.

trevorstephens commented 2 years ago

Yeah could be a bit excessive perhaps. All that parsing would be done up front though so the overhead wouldn't add anything much to processing time, just some more code in the parameter parsing/checking. Was kind of thinking about the printed outputs being clean and readable.

trevorstephens commented 2 years ago

Other ways to address the printed outputs though. But could lead to ambiguity. Are users allowed to use both types of division (for example) in one program? 😬

jmmcd commented 2 years ago

I would be in favour of putting the onus on the user here. If they specify two division operators, give it to them! Would require a lot of new code to prevent that.
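The up-front parsing discussed in this thread might look something like the sketch below. Everything in it is hypothetical: gplearn has no closure_method parameter, and the function name, the 'div-koza' / 'div-smooth' naming scheme and the validation rules are assumptions made only to illustrate the idea of a global default that individual operators can override.

```python
# Hypothetical values for the proposed closure_method parameter.
CLOSURE_METHODS = ('koza', 'smooth')

# Operators that need protection and therefore have variants.
PROTECTED = ('div', 'log')


def resolve_function_set(function_set, closure_method='koza'):
    """Expand bare protected names using the global closure_method.

    Names with an explicit suffix, such as 'log-koza', override the
    global setting; all other names pass through unchanged.
    """
    if closure_method not in CLOSURE_METHODS:
        raise ValueError('unknown closure_method: %r' % (closure_method,))

    resolved = []
    for name in function_set:
        base, sep, variant = name.partition('-')
        if base not in PROTECTED:
            # Not a protected operator, so there is nothing to resolve.
            resolved.append(name)
        elif not sep:
            # Bare 'div' or 'log': apply the global default.
            resolved.append(base + '-' + closure_method)
        elif variant in CLOSURE_METHODS:
            # Explicit per-operator override.
            resolved.append(name)
        else:
            raise ValueError('unknown variant: %r' % (name,))
    return resolved


mixed = resolve_function_set(['add', 'div', 'log-koza'], closure_method='smooth')
both = resolve_function_set(['div-koza', 'div-smooth'])
```

Here `mixed` keeps the explicit 'log-koza' while the bare 'div' picks up the global setting, and `both` shows that this sketch does nothing to stop a user from requesting two division operators at once.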