Closed chrismattmann closed 8 years ago
Hi @chrismattmann, I think it would be hard to break away from JHU at this point, but I wouldn't say that it's an impossibility. The costs seem clear to me (loss of control); can you help us understand the benefits (and perhaps present a more complete picture of the costs as well)?
CC: @callison-burch
no problem at all @mjpost . The reality is you wouldn't really lose control - take a look at the Apache ICLA (you license your contributions to the ASF). However, if you scope out the ASF project management committee, the ASF is really a home for independent, separately managed PMCs.
PMCs are autonomous entities that share a belief in open source that we call the "Apache Way". These are a loose set of principles that keep us together:
Many science projects are looking at the ASF, especially in the age of dwindling grants, etc. Have a look at http://ctakes.apache.org/ and http://airavata.apache.org which came out of the DHHS (SHARP initiative) and NSF (XSEDE), respectively. http://oodt.apache.org/ is another example from NASA, and there are more coming, like OCW http://climate.apache.org and CMDA (Climate Model Diagnostic Analyzer).
Here are some refs: http://www.apache.org/foundation/how-it-works.html http://www.apache.org/dev/pmc.html http://www.apache.org/dev/new-committers-guide.html http://community.apache.org
I'd also be happy to help.
cc @lewismc
I am very interested in this folks.
Lewis
This sounds good to me too.
@mjpost, in addition to @chrismattmann's comment: you mentioned costs. In terms of the financial aspect of infrastructure (website, CI, CMS, SCM, mailing lists, etc.), that is all facilitated by the foundation. Another advantage of joining the ASF is that Joshua would most likely have more cross-community collaboration with other machine learning folks over in Apache Mahout, Apache Spark, cTAKES, etc., as mentioned by @chrismattmann.
Oh by the way, did I also mention a small insignificant project called Apache Tika? ;) I think it is fair to say that Chris and myself would both very much like to see Joshua come to the ASF and grow. The Hadoop aspect of the Joshua codebase would undoubtedly improve pretty radically as well once Joshua starts releasing and announcing.
BTW, regarding "Oh by the way, did I also mention a small insignificant project called Apache Tika? ;)": that was a joke. We would love to have better integration with Joshua over in Apache Tika. Tika is a very well used library with an excellent, dynamic, bustling community. Joshua would certainly benefit from better engagement.
Okay, this seems pretty appealing. I have a licensing question, though. Joshua contains an LGPL'd library for handling language models (KenLM). There is an alternative (BerkeleyLM), but it is not actively maintained any more and is not quite as good as KenLM in a few key respects. A quick glance at the incubator page suggests that this dependency would keep the project from becoming a full-fledged one. Can you comment on this?
We could also ask Kenneth if he would consider offering it with an Apache license.
@Chris, this is a very important suggestion. An initial path which I've pursued is to ask the entire Apache Incubator community for an alternative to the library Joshua currently consumes [0]. Licensing over time can and does become an issue, as you guys know [1]. I would like to mention that although the library is an issue, this is not a blocker at all for building a stronger community around Joshua. Let's see how the Incubator thread goes. Lewis
[0] http://www.mail-archive.com/general%40incubator.apache.org/msg49043.html [1] http://www.apache.org/licenses/GPL-compatibility.html
Lewis
Any thoughts on this, @kpu?
Hey Guys yeah it's not a total blocker. We've dealt with similar issues e.g., with Apache OpenOffice which had a strict dependency on LGPL dictionaries and so forth and the Apache legal committee granted an exception (not broad, but in that particular situation). We could ask for a similar exception during Incubation and get it sorted out. If @kpu is willing to relicense, of course that would be awesome. In addition the dependency is a runtime dependency, not a static binding one, right, @mjpost ?
ACK
Lewis
KenLM actually contains several components that are important to the Joshua tool chain. We use it for building language models during training (via the lmplz binary, which itself has a boost dependency), the build_binary component to compile the resulting ARPA-formatted text file to a packed trie, and then a library ($JOSHUA/lib/libken.so) to efficiently query that file via a JNI bridge. There are alternatives to all of these, but they are not as good.
But to answer your question, yes, the most important of these (the library) is a dynamic dependency.
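To illustrate what a runtime (dynamic) dependency looks like on the Java side, here is a minimal sketch. It is not Joshua's actual code: the class name `OptionalNativeLm`, the `chooseBackend` helper, and the "fallback" label are hypothetical, and the only detail taken from this thread is the library name `ken` (which `System.loadLibrary` resolves to `libken.so` on Linux). The idea is that the shared library is looked up when the program runs, so it can be installed separately by the user and need not be shipped with the Java code.

```java
public class OptionalNativeLm {

    /**
     * Tries to load a native library by its short name.
     * Returns false instead of throwing when the library is not
     * found on java.library.path or loading is not permitted.
     */
    public static boolean tryLoad(String libName) {
        try {
            System.loadLibrary(libName);
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return false;
        }
    }

    /** Hypothetical helper: picks a language model backend label. */
    public static String chooseBackend(boolean nativeAvailable) {
        return nativeAvailable ? "kenlm" : "fallback";
    }

    public static void main(String[] args) {
        // "ken" maps to libken.so on Linux; the file must be on java.library.path.
        boolean haveKen = tryLoad("ken");
        System.out.println("LM backend: " + chooseBackend(haveKen));
    }
}
```

Run with something like `java -Djava.library.path=$JOSHUA/lib OptionalNativeLm`. If the library is absent, the program still starts and reports the fallback backend, which is the property that makes separate packaging possible.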
Thanks @mjpost that helps to answer it. Because it's a runtime dependency we can likely get an exception to this and deal with it via intelligent packaging and so forth. This isn't a blocker at all.
For purposes of Joshua, it's standalone executables that could just be documented with a pointer, a shared library via JNI, and a bit of Java-side wrapper code. The Java code can just be considered part of Joshua.
The question is whether it's up to me to relicense. I do not have contributor license agreements in place. There are several contributors with varying employers who may or may not have a work-for-hire claim. My guess is most do not care what open-source license is used, but they would likely need to be contacted.
Also, don't forget your other LGPL dependency: https://twitter.com/joshuadecoder/status/563072947586613248 "@mosessmt Also we make heavy use of your pipeline, keep the improvements coming on that :)"
The Moses dependency is a good point — we have occasionally borrowed scripts and other miscellaneous tools, and currently rely on Moses for a portion of the phrase-based model building. However, removing that piece entirely wouldn't be too much work.
There are other important tools, however, such as word aligners (GIZA++ and the Berkeley aligner), which are both GPL licensed. I have packaged a lot of this with Joshua in order to remove external dependencies, and try to make it easier for people to build models from start to finish. Airing all of this would require a close look at the whole pipeline. Much of it could be replaced if there were more hands on deck, or also just left to the user to install.
yep @mjpost - leaving things to the user to install and not packaging them directly with Joshua, but providing intelligent packaging tools, is a common practice in a lot of these situations and we could employ it here. @kpu thanks for the quick reply. If you are able to relicense, something on the permissive end (ALv2, BSD and/or MIT) would be much appreciated. See: http://www.apache.org/legal/resolved.html#category-a
ACK
Lewis
Moving the JNI code over to kenlm (and preferably cleaning it up so it's generically useful, not just for Joshua) would solve most of this.
The following places might have to agree to a license change: CMU, Edinburgh, Adam Mickiewicz University, Stanford, Bloomberg, NAIST, UIUC, Yandex, and SDL.
Thanks @kpu - not a blocker at all and this can be dealt with during incubation. @mjpost let me know what you think about moving forward - maybe we can wait a few days, get some more feedback, and then decide whether to proceed.
I'm considering it positively. Can it wait till early August (i.e., six weeks)? I am involved with a summer workshop which is consuming most of my time at the moment (including keeping me from reading thoroughly through your docs).
I suppose this move would also involve changing hosting? Does Apache support git? Skimming, I only saw notes about using SVN....
(I can likely answer these questions myself and will have more time to do so a week from now, if you want to let them lie)
hey @mjpost thanks. Sure we can revisit in early August, that's totally fine, no rush. Apache does support Git, and it even has writeable git repositories, and mirrors out to Github. So joshua would move to Apache writeable Git at something like https://git-wip-us.apache.org/repos/asf/joshua.git (for a working link, see: https://git-wip-us.apache.org/repos/asf/tajo.git ) and then could be mirrored to Github at http://github.com/apache/joshua.
Talk soon. Thanks!
Okay, great, I've made a note to come back to this then.
Sounds good @mjpost
Should this be left open? It's not really closed yet, @mjpost.
Chris, the thread on general@ started off well, then diverged at an incredible rate. That being said, the discussion up until just after your reply provides us with valuable commentary from the general@ community to progress with the dependency issues. I'll make an effort to set the options out below (it will not be an exhaustive list... There is more than one way to skin a cat)
Lewis
hi @mjpost ready to pick this back up?
@chrismattmann close this off ;)
yep we can close this. Joshua is now an Apache Incubator podling! :) https://issues.apache.org/jira/browse/INFRA-11264
Dynamite
Lewis
Hi @mjpost and others of the Joshua community. Is there any interest in the project coming to the Apache Software Foundation? I brought this up offlist to Matt and there was some interest, but I never followed up so thought I would do so publicly and transparently here.
Apache has a guide for new projects: http://incubator.apache.org/guides/proposal.html
I would be very happy to champion this project in the ASF if there is interest.