scikit-learn-contrib / py-earth

A Python implementation of Jerome Friedman's Multivariate Adaptive Regression Splines
http://contrib.scikit-learn.org/py-earth/
BSD 3-Clause "New" or "Revised" License

Export to sympy #119

Closed jcrudy closed 7 years ago

jcrudy commented 8 years ago

Add functionality to export Earth models as sympy expressions. This will allow the use of codegen to generate C, Fortran, or MATLAB code from the sympy expressions. Additional languages (including PMML? See issue #28) could be added by extending codegen.

Open questions:

  1. How mature is codegen functionality in sympy? It looks like it's only in the development version right now.
  2. How hard is it to add new languages to codegen?

mikkookkim commented 8 years ago

An enhancement to the export functionality would be great. It would make py-earth easier to use in custom applications that need Earth regression modeling but where it's impossible or too difficult to create and use the model dynamically.

I'm using py-earth to create models from certain data sets, exporting them, and then manually converting the lambda functions to JavaScript/Node.js. It's not really possible for me to use Python code in my application because it will also run on the iPad. JavaScript works well there, but Python doesn't seem to be available, at least not in a way that would make sense for us.

Converting the Python lambda syntax to JavaScript is fairly difficult manual work, and it gets harder the more terms and degrees the model has. You have to break the lambda IF clauses into units that can be written in general IF syntax, and you have to pay attention to parentheses and coefficients (which might also include variables).

I've been doing this and even wrote some Perl code to try to automate it, but it doesn't work very well when the Earth modeling options change and the model gets a different structure. I still need to do lots of manual work. For example, I now have a model with over 30 lambda functions to convert, most of them complex, with parts of the lambda function being multiplied by other parts.

I think lambda functions (with IF clauses) are fairly unique to Python (sorry if I'm mistaken, I'm not a real software engineer) and cannot easily be used in other languages. If you exported the model in a simple and general syntax, it would be very easy to convert the models into any programming language. I mean that the exported model would be a set of normal IF clauses whose results are stored in variables and added together. That would be very simple to use.

If sympy / codegen could accomplish this, that would be great!

Another solution would be an existing tool that can convert the Python lambda function syntax to a more general IF syntax, since that would by nature be easy to convert manually. I have tried a couple of tools for converting Python to JavaScript, but in practice they haven't understood the lambda functions and haven't worked.

I think your Py-Earth is a great tool and I would think there should be lots of use cases for it, also in other than the Python language.
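The "normal IF clauses, stored in variables and added together" format requested in the comment above might look roughly like the following. This is only a sketch: the `TERMS` structure, the coefficients, the knots, and the function names are all hypothetical and are not py-earth's export format. It assumes each model term is a coefficient times a product of hinge functions max(0, ±(x − knot)), and it emits JavaScript that uses nothing but plain `if` statements.

```python
# Hypothetical, hand-written description of a small MARS-style model.
# Each term is (coefficient, [hinge factors]).
# A factor (var, knot, sign) stands for max(0, sign * (var - knot)).
INTERCEPT = 2.5
TERMS = [
    (1.2, [("x0", 3.0, +1)]),                    # 1.2 * max(0, x0 - 3)
    (-0.7, [("x0", 3.0, -1), ("x1", 1.5, +1)]),  # -0.7 * max(0, 3 - x0) * max(0, x1 - 1.5)
]


def to_javascript(intercept, terms, variables, name="predict"):
    """Emit a JavaScript function that uses only plain if statements."""
    lines = [
        f"function {name}({', '.join(variables)}) {{",
        f"  var y = {intercept!r};",
        "  var t, h;",
    ]
    for coef, factors in terms:
        lines.append(f"  t = {coef!r};")
        for var, knot, sign in factors:
            diff = f"{var} - {knot!r}" if sign > 0 else f"{knot!r} - {var}"
            lines.append(f"  h = {diff};")
            lines.append("  if (h < 0) { h = 0; }")  # the hinge as a plain IF
            lines.append("  t = t * h;")
        lines.append("  y = y + t;")
    lines.append("  return y;")
    lines.append("}")
    return "\n".join(lines)


def evaluate(intercept, terms, values):
    """Evaluate the same model in Python, to check the generated code against."""
    y = intercept
    for coef, factors in terms:
        t = coef
        for var, knot, sign in factors:
            t *= max(0.0, sign * (values[var] - knot))
        y += t
    return y


js_source = to_javascript(INTERCEPT, TERMS, ["x0", "x1"])
print(js_source)
```

Keeping a Python `evaluate` next to the generator gives a reference value to compare the JavaScript output against for a few test inputs.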

jcrudy commented 8 years ago

@mattlewissf is working on this. Will assign to him if I can figure out how.

jcrudy commented 7 years ago

This issue is addressed by PR #129 from @mattlewissf. @mikkookkim, if you still need this functionality you should be able to use sympy's printing system to get the sympy expressions into JavaScript or whatever other form you want (with some amount of work depending on the target language). You can look at the new example and unit test to see some demonstrations of how to do it (although not JavaScript specific). Please feel free to open new issues if you have any problems or confusion.
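As a rough illustration of the printing route mentioned above, the sketch below runs a sympy expression through `sympy.jscode`. The expression is built by hand as a stand-in for an exported model, with hypothetical coefficients and knots; it is not produced by py-earth here, and the exact text sympy prints may vary between sympy versions.

```python
import sympy

x0, x1 = sympy.symbols("x0 x1")

# Hand-built stand-in for an exported model (hypothetical numbers).
# Each hinge function max(0, x - knot) is written as a sympy Max term.
model_expr = (
    2.5
    + 1.2 * sympy.Max(0, x0 - 3.0)
    - 0.7 * sympy.Max(0, 3.0 - x0) * sympy.Max(0, x1 - 1.5)
)

# Print the expression as a JavaScript expression and wrap it in a function.
js_body = sympy.jscode(model_expr)
js_source = "function predict(x0, x1) {\n  return " + js_body + ";\n}"
print(js_source)

# Evaluate the same expression in Python to get a value to compare
# against the JavaScript function's output.
value = float(model_expr.subs({x0: 5.0, x1: 0.0}))
print(value)
```

`sympy.ccode` can be used the same way when the target is C.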

mikkookkim commented 7 years ago

Hi Jason,

Thanks for your reply. It took me a while to check out the new export functionality that uses sympy. I think that form of exported model is much easier for me to convert into JavaScript, so that's a good thing!

I was building another Earth model from a data set last week and noticed that, with an identical data set and identical Earth parameters, I was getting different modeled results on my two computers, both running Ubuntu 14.04. I'm trying to figure out which version of py-earth each one has, but I can't find any version number information in the files. How should I report to you which py-earth version I have, and do you need other version information?

I want my Earth models to fit as closely as possible, and I'm now seeing a difference of a few percent in the error of the modeled value at the known values. This is quite significant for my use case, and of course it would be great to have a version that does both: models accurately and contains the sympy export functionality.

The py-earth version that produces the better model, at least with this data set and these parameters, seems to be the older one, which does not have the sympy-based export functionality; the other machine has it.

I'm attaching a compressed file that has two sets of files: one from my laptop (better modeled values, no sympy export) and one from my virtual Ubuntu machine (worse modeled values, but has sympy export).

There are four files:

I can submit an issue through your GitHub page, but I thought I would ask you first, especially about how to check which py-earth version I have.

Br,

Mikko

jcrudy commented 7 years ago

@mikkookkim The attachments didn't come through. Possibly GitHub removed them before forwarding your message? You could try emailing me directly. My address is on my GitHub profile.

A few thoughts:

  1. Py-earth depends on BLAS, and I have seen differences in performance among different BLAS implementations. It's possible your two environments are not exactly the same in this respect. It would be better if you could reproduce the problem on a single machine (maybe using Conda environments) to be sure it is a change in py-earth and not a difference in environments that is the problem.
  2. Since py-earth hasn't had a release yet, there is no version number. Instead, there is git, which keeps track of all changes in the code. Each commit has a hash which can be used to identify it. If you installed py-earth by cloning a repository, you can use git rev-parse HEAD from within that repository to see the hash for its current state. If you post these in addition to your files it may help me figure out what's going on.
  3. The only recent change I can think of that might affect prediction performance is a recent bug fix. See issue #135. In my tests the fix improved performance on the training set. However, it's possible it's causing overfitting in your case or some other unexpected behavior.
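
For reporting the commit hash from point 2 alongside model results, a small helper along these lines could be run on each machine. It is a minimal sketch: the function name is hypothetical, and `repo_dir` is a placeholder for wherever the py-earth clone actually lives. It simply shells out to git rev-parse HEAD and returns None if git or the repository is not available.

```python
import subprocess


def git_commit_hash(repo_dir="."):
    """Return the commit hash of the checkout at repo_dir, or None if unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, the directory is missing, or it is not a repository.
        return None
    return result.stdout.strip()


# "." is a placeholder; point this at the cloned py-earth directory.
commit = git_commit_hash(".")
print("py-earth commit:", commit)
```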