Fundamental changes: create REP LTS

jonas-eschle commented 7 years ago

Hey developers,

there has not been a lot of activity lately, and it clearly looks as if the repo is being abandoned. Which I find quite sad! I think it took quite some time for people to discover your repository (to be honest, reproducibility was not the best reason to convince people to use it. I usually promote REP under: "save 80% of the time you spend coding") and 400+ stars speak for themselves. On the other hand, a lot of work is needed for the upkeep of the repo (wrappers...), sure, I understand. But I think times and needs have also changed.

Let me first tell you what I like about REP:

reports! great tool (also the metrics)
ML-lego using folding, bagging etc (it's very nice for people who like to "play around" and not just use default algorithms)
useful gridsearch tools
same API for several libraries like XGBoost etc allows for simple replacement of any classifier without altering the code

These things are quite useful and so far I have not found anything similar.
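To illustrate the "same API" point for readers of this thread, here is a minimal sketch of why a shared `fit`/`predict` interface lets you swap classifiers without touching the surrounding code. Everything in it is hypothetical and dependency-free: `Classifier`, `MajorityClassifier`, `ThresholdClassifier` and `accuracy` are toy names made up for this example, not REP's actual wrappers.

```python
from collections import Counter
from typing import List, Protocol, Sequence

Row = Sequence[float]


class Classifier(Protocol):
    """Hypothetical shared interface: anything with fit() and predict()."""

    def fit(self, X: Sequence[Row], y: Sequence[int]) -> "Classifier": ...

    def predict(self, X: Sequence[Row]) -> List[int]: ...


class MajorityClassifier:
    """Toy model: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class ThresholdClassifier:
    """Toy model: cuts on the first feature, halfway between the class means."""

    def fit(self, X, y):
        zeros = [row[0] for row, label in zip(X, y) if label == 0]
        ones = [row[0] for row, label in zip(X, y) if label == 1]
        mean0 = sum(zeros) / len(zeros)
        mean1 = sum(ones) / len(ones)
        self.cut_ = (mean0 + mean1) / 2
        # Label assigned to values above the cut.
        self.high_label_ = 1 if mean1 >= mean0 else 0
        return self

    def predict(self, X):
        return [
            self.high_label_ if row[0] > self.cut_ else 1 - self.high_label_
            for row in X
        ]


def accuracy(clf: Classifier, X_train, y_train, X_test, y_test) -> float:
    """Train and score any classifier; this code never changes per model."""
    predictions = clf.fit(X_train, y_train).predict(X_test)
    hits = sum(p == t for p, t in zip(predictions, y_test))
    return hits / len(y_test)


# Made-up toy data: one feature, two classes.
X_train = [[0.1], [0.2], [0.3], [0.9], [1.0]]
y_train = [0, 0, 0, 1, 1]
X_test = [[0.15], [0.95]]
y_test = [0, 1]

for model in (MajorityClassifier(), ThresholdClassifier()):
    score = accuracy(model, X_train, y_train, X_test, y_test)
    print(type(model).__name__, score)
```

Only the object passed to `accuracy` changes between runs; the training and scoring code stays the same, which is the convenience described above.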

So let me propose this:

Convert REP to an "LTS" version

which means that some crucial things can be removed, drastically reducing the time spent on the upkeep of the repo.

Things to do:

Get rid of all libraries except XGBoost and Scikit-Learn (maybe add a direct keras wrapper?) and improve their wrappers (for example, XGBoost has a lot of additional parameters, I think).
This includes dropping the ROOT bindings. And that's fine. You have helped to show that open-source machine learning does not have to hide in any way from TMVA and is usually the better choice (thanks for that!). Although TMVA has drastically improved its library, I still would not recommend having any bindings (as they are the vast majority of the upkeep, I think). If the ROOT guys really come up with something genius, I would highly suggest that you port this solution, implement it in your own repo and thereby make it available for the whole ML community (think of this: if TMVA has a great tool, no one will install ROOT just to have a great ML algorithm, except existing ROOT users, who will then use TMVA directly anyway). This would benefit everyone (the ML community, you, even TMVA, as their ideas get exported to people who would otherwise never use them).
Although the factory is a nice thing, if it generates a lot of bugs or is time-consuming to maintain, maybe get rid of it as well.
Maybe add some of the requested features (like stratified k-folding, which is waiting in a branch to be merged, stackingClassifier and so on, see issues). I think those are quite small things in general which would complete REP.
Maybe also include your new tool hyperopt as an option for the GridSearch.
Maybe promote REP differently. It is a very convenient wrapper around other libraries and saves a lot of work.
There are probably more things which require a lot of maintenance effort. Let us know about them and discuss things.
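For anyone unsure what the stratified k-folding request in the list above amounts to, here is a minimal, dependency-free sketch. It is not the code from the REP branch: `stratified_kfold` is a hypothetical helper written for this illustration, dealing the indices of each class round-robin into the folds so that every fold keeps roughly the overall class proportions. In practice, scikit-learn already ships this as `StratifiedKFold` in `sklearn.model_selection`.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def stratified_kfold(
    labels: Sequence[int], n_splits: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs with class-balanced folds."""
    # Group sample indices by class label.
    by_class: Dict[int, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # Deal each class's indices round-robin into the folds.
    folds: List[List[int]] = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)

    # Each fold serves once as the test set; the rest form the training set.
    for k in range(n_splits):
        test = sorted(folds[k])
        train = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        yield train, test


# Made-up labels: four samples of class 0, two of class 1.
labels = [0, 0, 0, 0, 1, 1]
splits = list(stratified_kfold(labels, n_splits=2))
for train, test in splits:
    print("train:", train, "test:", test)
```

With these toy labels each test fold contains two samples of class 0 and one of class 1, mirroring the 2:1 ratio of the full set. This sketch does no shuffling; a real implementation would usually offer that as an option.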

I know that this requires some effort. But I also think this will drastically reduce the maintenance (all libraries have more or less stable APIs; XGBoost, sklearn and keras are at a stage where changes are rare, I think) and keep REP alive for the next 5-10 years or so, probably serving a niche of people (although I do not know of similar alternatives). In the current state, I don't think it will last too long, as lack of support (like reactions to issues etc.) makes people abandon the repo. If you are able to reduce the maintenance, I think you will have time for the (still quite rare) issues coming up from time to time.

On the other hand, if you don't like the extra amount of work and decide to abandon the repo, it would be nice to state that in the README, giving reasons and alternatives (-> hyperopt), so that new people do not start using the repo and end up disappointed. This is a reasonable choice, especially if you know that the functionality provided by REP will soon be obsolete anyway (because of similar libraries, or other things to come in the near future...).

Anyway, thanks for the great work you have done!

Cheers, Jonas

anaderi commented 7 years ago

Hey Jonas,

Thanks a lot for the constructive comments! We’ll consider them thoroughly. BTW, would you be interested in the further development/refactoring of the code?

-- Kindest Regards, Andrey Ustyuzhanin


eyadsibai commented 7 years ago

@anaderi I made a similar suggestion a year ago :+1: I hope you consider it one more time ... move the wrappers for the other libraries into a different repo.

jonas-eschle commented 7 years ago

Hey Andrey, thanks for that! And yes I am, this could be interesting. Just keep me up-to-date. Cheers, Jonas

gandreassi commented 7 years ago

Dear developers,

I started watching this repository very recently so I might have missed important discussions here, but let me say a few things:

I completely agree that something should be done to prevent this repository, with the precious work you did, from being forgotten. I find it particularly useful and interesting, and I’m sure that with due advertisement it could reach a much larger audience.

From my personal experience:

I just started using REP for a new analysis project, and as a ROOT user my opinion is that the TMVA bindings should be kept, as they are a huge attraction for a large part of the high energy physics community, making the transition a lot smoother and allowing easy and direct comparison. This, in my case, was a huge plus in favour of trying REP instead of sticking to the routine and the standards of TMVA alone.
Instead of using docker like I did the first time I used REP, this time I had to install it manually to be able to use it with ROOT 6.08.02. This choice was driven by the need to be compatible with the rest of my analysis code, which is all based on this version of ROOT. To do this, I had to apply some minor modifications to part of your code since, as you might know, the syntax of the TMVA Factory has changed in recent versions. For the moment, the changes concern only one file: rep/estimators/_tmvaFactory.py

In conclusion, I don’t know which decision you will take in the end about ROOT, but my modest opinion would be: keep it! It’ll extend your reach to a wider public. And if you like I can make a branch with my simple modifications. It’s not really a lot of work, but it can save you some time, and I would be happy to give my tiny contribution :)

jonas-eschle commented 7 years ago

As an addition to that, which is probably similar to what @eyadsibai suggested: move the ROOT wrapper into a different, small repo (with dependencies on REP). This would make REP usable and low-maintenance, while how to maintain the (seemingly liked but easily breakable) ROOT wrapping becomes a separate, independent question. On top of that, it is probably even possible to convince some ROOTers to help with or take over the maintenance of that repo, as the only work is to propagate changes in ROOT to the wrapper. I think they would have an interest in that as well, and it is surely not too difficult for them to do.

But anyway, this would reduce the maintenance for REP while keeping the possibility for a ROOT wrapper.

jonas-eschle commented 6 years ago

Hey, did you already think about it and come to a conclusion about REP's future?

anaderi commented 6 years ago

Hey Jonas, the current state of thought is to port features from rep to modelgym.

jonas-eschle commented 6 years ago

cc: @marinang Given that modelgym is still under development, I created a small, primitively cleaned REP legacy: removing the unnecessary dependencies and making it run again (for anyone else in this thread also interested in a running version):

and may consider making it (though another fork) compatible with the newest scikit-learn versions.

I guess there was no change of plan regarding rep?

jonas-eschle commented 4 years ago

Hey all, is there any news here, since modelgym seems to have come to a halt? What are the future plans for any of the repositories, or even new ones?

HosseinAfsharnia commented 4 years ago

REP was quite a useful library, and I want to use Uboost in my analysis. I modified some parts of it to be able to use it. Since I want to run Uboost by submitting the job to HTCondor, I do not know whether it would be possible to set up REP there and modify it in the same way (as I did for my environment).

yandex / rep

Fundamental changes: create REP LTS #104