jekyll / classifier-reborn

A general classifier module to allow Bayesian and other types of classifications. A fork of cardmagic/classifier.
https://jekyll.github.io/classifier-reborn/
GNU Lesser General Public License v2.1

Port classifier-reborn to Crystal lang #100

Closed: bararchy closed this issue 7 years ago

bararchy commented 7 years ago

This is just an idea: would you be OK with my trying to port classifier-reborn to Crystal?
I use the Bayes classifier in a production setup and would really love to have it available in Crystal as well, for heavier lifting and scaling.

Ch4s3 commented 7 years ago

I don't really see a problem with that. @parkr, what kind of association would you like to have with any ports? Also, does a port have to inherit the LGPL license?

bararchy commented 7 years ago

@parkr ?

parkr commented 7 years ago

Hey! I'm not a lawyer but my understanding is that if you are translating a library to another language then it must retain the same license as the original. It's akin to translating a book: a German translation of Shakespeare is still Shakespeare so you have to abide by copyright provisions for the original work.

bararchy commented 7 years ago

I see. I was planning on an MIT-licensed lib, but I guess LGPL is fine too. Just wanted to make sure you guys are fine with that :)

Ch4s3 commented 7 years ago

Yeah, I'm totally open to it, I'll even try to help if you want.

parkr commented 7 years ago

@bararchy The alternative is to do a "clean room rewrite" where you try to forget how this lib works and implement a clean Bayesian Classifier without looking. This is akin to taking a standardized test where you go to a room you have never been to without any information and answer math problems. That's the "safest" approach, in my understanding.

The LGPL is summarized here: http://choosealicense.com/licenses/lgpl-3.0/

bararchy commented 7 years ago

@Ch4s3 thanks :) I would love any help I can get, especially from someone as experienced as you in this field.

@parkr I'll think about it. I really wanted classifier-reborn.cr as an homage to your hard work on this. Even if I create a new lib, though, I'll still credit you guys for the idea and inspiration.

Ch4s3 commented 7 years ago

You're welcome to do a straight port, but the original license that Lucas set up is LGPL, so if you do a straight port, you also have to use LGPL. Obviously @parkr and I aren't going to be upset about it either way, but we don't want you to accidentally run afoul of the license either, so read up a bit and let us know what you decide to do!

ibnesayeed commented 7 years ago

In some cases I am not even sure what is really being licensed. In particular, when we implement a published algorithm or mathematical model, we often don't have many choices. For example, if a bunch of people implement bubble sort independently, how many different ways can they come up with? The overall algorithm will be the same; they might differ in indentation or other code style, inline comments, how they divide sub-tasks into functions, or which variations of loops and conditionals they choose. They might use one data structure over another, but only a few data structures fit a given algorithm, and some will be more efficient than others, which everyone is entitled to use.

In this context, take the example of a Bayesian classifier. Even if I avoid looking at the code for a year and then implement it myself using only the well-known algorithm for how probabilities are calculated in Bayes, I am afraid my implementation will not be very different at the lower level. Additionally, as for the API, I am sure train, untrain, and classify would be the most sensible methods to make publicly available, though internal helper methods and refactoring could differ.
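To illustrate how little room the algorithm leaves, here is a minimal sketch of a multinomial naive Bayes classifier in Python exposing train, untrain, and classify. The class name, the whitespace tokenizer, and the add-one smoothing are assumptions made for this sketch; it is not classifier-reborn's code and makes no claim about how that library works internally.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Hypothetical minimal multinomial naive Bayes (illustration only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> trained documents

    @staticmethod
    def tokenize(text):
        # Assumption: lowercase whitespace splitting, no stemming or stopwords.
        return text.lower().split()

    def train(self, category, text):
        self.doc_counts[category] += 1
        self.word_counts[category].update(self.tokenize(text))

    def untrain(self, category, text):
        # Reverse an earlier train call; ignore categories never trained.
        if self.doc_counts[category] == 0:
            return
        self.doc_counts[category] -= 1
        self.word_counts[category].subtract(self.tokenize(text))
        # Unary plus drops zero and negative counts.
        self.word_counts[category] = +self.word_counts[category]

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        if total_docs == 0:
            raise ValueError("classifier has not been trained")
        vocabulary = {w for counts in self.word_counts.values() for w in counts}
        vocab_size = len(vocabulary) + 1  # +1 reserves mass for unseen words
        words = self.tokenize(text)
        scores = {}
        for category, docs in self.doc_counts.items():
            if docs == 0:
                continue
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            # Log prior plus smoothed log likelihood of each word.
            score = math.log(docs / total_docs)
            for word in words:
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            scores[category] = score
        return max(scores, key=scores.get)


nb = NaiveBayes()
nb.train("spam", "buy cheap pills now")
nb.train("ham", "meeting notes for monday")
print(nb.classify("cheap pills"))  # prints "spam"
```

Almost every line here follows directly from the textbook formula: per-category word counts, a prior from document counts, and a sum of log probabilities. The only real choices are the tokenizer and the smoothing constant.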

Please note that I am by no means against licensing or giving credit. However, in some cases the role of licensing is just too difficult for non-lawyers to digest. :-)

tra38 commented 7 years ago

I am not a lawyer but...

Algorithms can't be copyrighted, but implementations (and derivative works of those implementations) can be. So while the idea of a Bayesian classifier cannot be copyrighted, the specific implementation details can be. But if you were to come up with a new implementation independently, without looking at the source code of the existing implementations, then you own the copyright for that specific implementation (even if that implementation is very similar).

You might be able to copy APIs and small snippets of other people's code by claiming "fair use", but "fair use" is merely a legal defense that can be used during a "copyright infringement" lawsuit. It doesn't actually protect you against lawsuits (then again, anyone can sue anyone for anything).

My personal opinion is to avoid all these problems by having the Crystal port follow the LGPL license. I don't necessarily like it (I prefer MIT), but whoever started work on classifier-reborn loved LGPL, and it seems reasonable to respect his/her wishes by carrying his/her license choice onward...especially when LGPL doesn't really cause that much damage to other people.

ibnesayeed commented 7 years ago

@tra38, thanks for your input. Personally, I also prefer the MIT license for the open source software I release, as it allows my work to be used in commercial software as well.

Ch4s3 commented 7 years ago

Personally, I prefer MIT as well, but neither @parkr nor I were here when Lucas chose the license. Like I said, I'm happy to help with an LGPL port.

ibnesayeed commented 7 years ago

A quick question: can we use this library in an application that is released under the MIT license?

parkr commented 7 years ago

Yes, but the source code for it must be open sourced.

ibnesayeed commented 7 years ago

@parkr wrote: "Yes, but the source code for it must be open sourced."

Software released under the MIT license is open source by definition. However, that software can then be used internally in some closed proprietary software, bringing this library along with it.

tra38 commented 7 years ago

I think the library can be used in an application that is released under MIT, but the library must stay open source. The actual program itself can be MIT/closed-source, though it is dependent on the open-source library. It's meant to be a compromise between MIT (library can become closed-source so long as you provide credit to the original writers) and GPL (entire program becomes licensed under GPL if you use the library).

parkr commented 7 years ago

This is pretty interesting – I don't have a lot of experience with the LGPL.

Quoting from the gnu.org FAQ:

If you statically link against an LGPL'd library, you must also provide your application in an object (not necessarily source) format, so that a user has the opportunity to modify the library and relink the application.

if you yourself convey the executable LGPL'd library along with your application, whether linked with statically or dynamically, you must also convey the library's sources, in one of the ways for which the LGPL provides.

The first means the library must be interchangeable – if I don't want to use the Crystal version of classifier-reborn that you write, I should be able to modify any source required to use a different version. This appears to apply to services, not just code you distribute, though I can't be sure. The second means that if you distribute the library in your code, you must distribute its source as well. Not sure how Crystal works, but keep those in mind.

It seems fairly safe to use with BSD-style licenses.

bararchy commented 7 years ago

Closing, as a project has already begun implementing a classifier in Crystal here: https://github.com/mathieulaporte/machine

If anyone is interested in contributing, it's MIT licensed and accepting PRs :)