jakevdp / ECML_PKDD_2013

Paper by scikit-learn contributors for ECML/PKDD 2013 conference

Comparison with other software #15

Open mblondel opened 11 years ago

mblondel commented 11 years ago

I was thinking we should take one of the code examples of the paper (say, the logistic regression one) and reimplement it using another toolkit (say, Weka, since it is one of the most famous).

WDYT?

larsmans commented 11 years ago

Are you familiar enough with Weka (or anything else) to write idiomatic, equivalent code? We have to make sure the comparison is fair.

glouppe commented 11 years ago

I like the idea; however, I am not sure Weka would lead to a fair comparison. Most users use Weka through the GUI: there is no code to write! You can of course import Weka packages and write Java programs yourself, but I don't think most users do that.

I think R would make a better comparison. It is quite popular, has tons of machine learning packages, and is comparable to Python as a scientific ecosystem. The user interface is also the same as Python's (either scripts or a shell). One issue, though (which we may want to highlight), is that R algorithms do not share a common interface: it would be quite easy to come up with an unfair example because one of those algorithms has an odd interface. The same goes for Matlab...
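To make the "common interface" point concrete, here is a minimal pure-Python sketch. The two toy classes and the `accuracy` helper are hypothetical names invented for illustration (they are not scikit-learn or R code); they only assume the `fit`/`predict` convention, so the evaluation function never needs to know which algorithm it is given.

```python
from collections import Counter


class MajorityClassifier:
    """Toy estimator (hypothetical): always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class NearestNeighborClassifier:
    """Toy estimator (hypothetical): predicts the label of the closest training point."""

    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        def sq_dist(a, b):
            # Squared Euclidean distance between two points.
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

        return [
            self.y_[min(range(len(self.X_)), key=lambda i: sq_dist(x, self.X_[i]))]
            for x in X
        ]


def accuracy(estimator, X_train, y_train, X_test, y_test):
    """Works for any object exposing fit(X, y) and predict(X)."""
    predictions = estimator.fit(X_train, y_train).predict(X_test)
    return sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)


# Tiny made-up dataset, for illustration only.
X_train = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1], [1.2, 0.8]]
y_train = [0, 0, 1, 1, 1]
X_test = [[0.05, 0.1], [1.1, 0.9]]
y_test = [0, 1]

for est in (MajorityClassifier(), NearestNeighborClassifier()):
    print(type(est).__name__, accuracy(est, X_train, y_train, X_test, y_test))
```

When every algorithm follows the same convention, swapping one for another is a one-line change; without it, each algorithm needs its own glue code, which is where an unfair comparison can creep in.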

larsmans commented 11 years ago

We're almost at the 15-page limit. Is this something we might want to drop?

mblondel commented 11 years ago

> I like the idea; however, I am not sure Weka would lead to a fair comparison. Most users use Weka through the GUI: there is no code to write! You can of course import Weka packages and write Java programs yourself, but I don't think most users do that.

I know a few people who do their research in Java and use the Weka API...

> I think R would make a better comparison.

Comparing scikit-learn to R doesn't make sense to me. R is a language and statistical environment...

> We're almost at the 15-page limit. Is this something we might want to drop?

If we have enough space, I think it's important to compare with what has been done before (just like for any paper)...

mblondel commented 11 years ago

We could compare to (or at least mention) Torch7

http://www.torch.ch/

They have a NIPS workshop paper to cite.

mblondel commented 11 years ago

Another candidate for comparison is milk. This would illustrate the model / estimator separation. The downside is that they don't appear to have a paper to cite.
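A minimal sketch of the two designs, assuming the separation meant here is a learner whose training call returns a distinct model object, as opposed to a single estimator that keeps its fitted state. All class and method names below are hypothetical and are not taken from milk or scikit-learn.

```python
from collections import Counter


class MajorityModel:
    """Fitted result (hypothetical): holds only what is needed to predict."""

    def __init__(self, label):
        self.label = label

    def apply(self, x):
        # Ignores the input: a majority-class model always answers the same label.
        return self.label


class MajorityLearner:
    """Learner (hypothetical): stateless; train() returns a new model object."""

    def train(self, features, labels):
        return MajorityModel(Counter(labels).most_common(1)[0][0])


class MajorityEstimator:
    """Combined design (hypothetical): the same object stores the fitted state."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


# Tiny made-up dataset, for illustration only.
features = [[0.0], [1.0], [2.0]]
labels = ["a", "b", "b"]

# Separated design: the learner is untouched by training.
learner = MajorityLearner()
model = learner.train(features, labels)
print(model.apply([3.0]))

# Combined design: the estimator itself becomes the fitted object.
estimator = MajorityEstimator().fit(features, labels)
print(estimator.predict([[3.0]]))
```

In the separated design a learner can be reused to train several independent models; in the combined design there is one object to pass around, at the cost of mixing hyperparameters and learned state.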

glouppe commented 11 years ago

After some thought, maybe comparing with Weka is the right thing to do after all. It is one of the most popular packages for machine learning, while Torch7, milk and others are clearly less used.

I am also wondering whether the comparison with Gensim should be shortened. It is relevant, but I admit that I had never heard of it before.

Emphasizing scikit-learn's integration in the scientific Python ecosystem (e.g., with NLTK) is also important in my opinion. (As such, we may need to change the section title depending on what we decide to include.)

mblondel commented 11 years ago

If we don't want to make a code comparison, we can also rename the section to "Related software".

larsmans commented 11 years ago

Feel free to trim the Gensim comparison as needed, or remove it if it doesn't fit in anymore.

ogrisel commented 11 years ago

I find the Gensim part relevant, although it could be trimmed to make some room for other projects like R.

ogrisel commented 11 years ago

For R, the main competing packages are, according to my subjective reading of the Kaggle forums:

Probably others but I am no R user myself.

ogrisel commented 11 years ago

> Comparing scikit-learn to R doesn't make sense to me. R is a language and statistical environment...

R is a language but also a very cohesive developer community. The CRAN package system makes it very easy to install and combine several machine learning projects.

To me R + most commonly used CRAN packages is the main competitor for scikit-learn.

larsmans commented 11 years ago

I'm afraid we need an R (ex-)user...

(I never got used to R myself. I did a t-test in it once, made a plot, and decided I hate the language.)

mblondel commented 11 years ago

@pprett might be able to help here.

mblondel commented 11 years ago

Can we get away with renaming the section "Related software" (i.e., without a code comparison)? The deadline is on Friday...

larsmans commented 11 years ago

I guess so. We can move the note on Weka that is in section 2.1 to this section.

arjoly commented 11 years ago

I have time to help in the coming days. However, I have no knowledge of R and too little of Weka.

mblondel commented 11 years ago

I improved the related software section quite a bit.

Regarding the contributions in scikit-learn that made it to SciPy, I don't think the related software section is the right place for it. I guess we could add a paragraph dedicated to that to the conclusion. @jakevdp do you want to take a stab at it?

amueller commented 11 years ago

Without reading all of the above, I think a comparison with caret would make sense, though Weka is probably easier and also sensible ;) Oh, and don't forget Shogun!

arjoly commented 11 years ago

> oh and don't forget shogun!

We could state that scikit-learn has a competitive advantage over machine learning libraries written in statically typed languages, such as Shogun and Torch, because of the possibility of interactive development in the console, of type inference, and of overall reduced development time (fewer lines of code, fewer or no segmentation faults).

amueller commented 11 years ago

+1. I think a more detailed comment would be good, but I guess there is too little time.

larsmans commented 11 years ago

If you don't mind, I'll just copyedit the comparison section tomorrow and let the reviewers decide. I hardly know enough about other packages; I only used LBJ, looked at the Weka APIs and toyed with VW and some CRF command line tools.

jakevdp commented 11 years ago

Sorry, I've been busy with the SciPy conference and just saw this thread. It sounds like we're going to keep things as-is for now?

mblondel commented 11 years ago

If you want, I think you can still make modifications during the review period.

larsmans commented 11 years ago

No more major changes, please.