Closed patyork closed 7 years ago
This is a good initiative.
How do we keep the chain of commits and contributions/contributors intact if or when we start pushing items into the core Keras from the external repo?
Git allows us to do pretty much anything. In particular we can reapply a chain of commits with arbitrary authors. A contrib repo should live as an independent repo from Keras (not a fork), and we can use custom git scripts to transfer code from contrib to core Keras.
How would we go about determining if something is used/useful enough to move to Keras?
Maybe by vote, as well as Github code searches.
Additionally: I would rather not have to manage a contrib repo myself, since Keras keeps me busy enough already. We could have it under the control of a small number of top Keras contributors.
On 6 January 2017 at 19:36, Pat York notifications@github.com wrote:
@fchollet https://github.com/fchollet Since we're not adding much to the repo at this stage (in terms of layers, loss functions, callbacks, etc.), we've talked quite a bit about an external repo for user/additional contributions, that, based on utility and usage may eventually make it into the core.
I'm curious about how you would want/like to go about this. An external repo is not a big deal, but I'm wondering:
- How do we keep the chain of commits and contributions/contributors intact if or when we start pushing items into the core Keras from the external repo?
- How would we go about determining if something is used/useful enough to move to Keras?
Just looking for thoughts on this topic. I've seen a repo or two that are not forks (completely external) and overwrite certain Keras files then do a setup.py install, but that seems like a poor way to do it. A fork would keep the chain of contributions in tact and would be easy to merge, but having any reference to a fork would just be confusing in my opinion.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/fchollet/keras/issues/4944, or mute the thread https://github.com/notifications/unsubscribe-auth/AArWbyJoeEQr-9R5-FroUODNzUUu_fglks5rPomxgaJpZM4Lc-35 .
I anticipated you not wanting to manage that repo as well, given that you are the gatekeeper of this fairly massive one already. I was looking to get the contrib repo up and running, but I want to be sure it's setup correctly (e.g. in a useful way that allows future merges into Keras core) before starting anything.
I've seen a fair amount of decent code/PRs declined in the last couple of months alone that would fit well as contribs, so I think this would be good to figure out a framework for ASAP.
Perhaps we can do a proof of concept along the lines of a new repo, adding what amounts to come contrib code, and determining what works best in getting it back into Keras (or a temp branch).
This is a great idea. I have a few things I could contribute and would be willing to help moderate.
So would it be a kinda beta release of keras new components?
Cool idea!
I wouldn't call it solely a beta for Keras, although that could definitely be part of it. I think of it as an extensions library as well; that is, some of the less used items (loss functions, layers, etc) will be merged and accessible via installation and importing of the contrib library, but they may not be used enough to make sense in the Keras core library and would never make it in there.
So, an extension library where some of the extension make it back into the Keras core.
I could definitely help moderate.
Would be great to have a contrib repo up and running. There are a lot of trig functions to add to keras backend and it might be cleaner to have them in a separate repo. I've also been wanting to add functionality for multiplayer models (GANs, BiGANs, AAEs, etc.) which would be better to put in a contrib repo. It would be a great place for stable code we just don't want polluting Keras. Not everyone is going to need arccosh or whatever.
Would the repo have separate CI, documentation hosting, etc? Separate Google/slack groups or no?
On that note, it would be nice for the contrib repo to have its own backend module (with the same logic and file structure as keras.backend) for any weird or esoteric backend functions.
Might be nice to hook all this up as an extra so we can do pip install keras[contrib]
and it would let us do something like from keras.contrib.layers import FancyUntestedLayer
. We'd need some mods inside the keras setup for that, though. We can have just a keras_contrib
PyPI package that can be standalone or patched into the main keras if installed as pip install keras[contrib]
.
I think we need separate CI / dox, though.
I think we should try to keep it entirely separate from Keras; simply a new repo that utilizes Keras for some things (backend functions, standard callbacks, etc), but that brings new functionality to the table.
An example would be
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras_contrib.layers import ForgetfulLSTM #not a real type of layer, I don't think
# but with a mirrored directory structure, the extensions should be self explanatory
import keras.backend as K
import keras_contrib.backend as C # perhaps we can call this "K" and if the function is not in the contrib library
# we can just fall back directly to the Keras backend? e.g.
# import keras_contrib.backend as K
C.arccos(data) # extended backend functions
K.mean(data) # standard functions still available
model = Sequential()
model.add(ForgetfulLSTM()) #layers compatible with standard Keras models
model.add(Dense())
I've been playing around with it a bit today; I'm still stuck on wrapping my head around what mechanisms are available for preserving commit history across dissimilar repos. Making a bit of progress, but without being able to get the correct history of the extension back into Keras, if/when the changes are promoted to keras core, the contrib repo is kind of moot - copy/pasting code doesn't appeal to me.
Watch ForgetfulLSTM make it to NIPS 2018.
Something like this might be relevant. Involves a filtered view of one repository then pulling updates into the other.
http://gbayer.com/development/moving-files-from-one-git-repository-to-another-preserving-history/
If contrib is a large hodgepodge of things, maybe it makes sense to divide contrib up into extras.
Especially if we roll some specialized dependencies into contrib. For example, I wrote keras integration with tqdm which makes really nicely formatted progress bars in Jupyter. We could have a contrib extra that depends on progressbar, a contrib extra that depends on tqdm, etc.
We don't want to add any dependencies to keras to keep things lean but there could be contrib extras with various additional dependencies.
I thought a bit about submodules, subtrees, etc, but I think it would be better off to just be a separate repo (such that, when needed, we can merge stuff into Keras core). So far, it seems like just a mirrored structure and a bit of manual reworking and applying of patches will get us to:
I'd image the contrib repo being pretty lax in terms of requirements; just about anything can get merged, as long as it is valid code relating to deep learning, is "useful" to a wide audience (not super specific to a single model/dataset/person) and has tests written for it when called for. The really useful stuff would hopefully eventually move to Keras and be removed from the contrib repo, without much hassle.
Having played around with it a bit more, I think the separate repo is entirely viable. A quick test is below.
Some commits in the external repo (note: the repo is set up with similar structure/files as Keras, but this is not strictly necessary):
The highlighted commit above is put into a git patch via git format-patch
. A little finagling of the patch file makes it compatible with Keras and available to be able to apply; these finagling changes are just to put the commit in the correct spot in the destination file. We take the patch to a new branch in Keras and apply with git am --signoff
which keeps the commit message, original contributor, and has the sign-off note highlighted in blue:
At that point, the signee can reapply more commits in the same way; finally, just review and make any changes to the committed code, in a separate commit. These final fixes would be, perhaps, moving/removing/adding/editing import statements and their usage in the new code (e.g. elephas.layer.Name --> keras.layer.Name).
A lot of the manual stuff above could end up being handled automatically via simple search and replace; the patch finagling could be easier as well with a gui versus editing a diff file directly.
I don't think a contrib repo should mirror the code structure of core Keras, and I don't understand why it should (it you feel strongly about it, please explain what you reasons are). E.g. tf.contrib does not mirror core tf, it's just a bunch of extras.
In the same way, I would see a Keras contrib repo as divided into modules such as callbacks, layers, metrics, etc, with no redundancy between core Keras code and contrib code.
On 9 January 2017 at 00:33, Pat York notifications@github.com wrote:
Having played around with it a bit more, I think the separate repo is entirely viable. A quick test repo here https://github.com/patyork/elephas/commits/master(elephas, meaning ivory https://github.com/fchollet/keras#why-this-name-keras).
Some commits in the external repo (note: the repo is set up with similar structure/files as Keras, but this is not strictly necessary): [image: image] https://cloud.githubusercontent.com/assets/1304633/21754495/82d8fb48-d5b6-11e6-8617-9059e6cfaf3c.png
The highlighted commit above is put into a git patch via git format-patch. A little finagling of the patch file makes it compatible with Keras and available to be able to apply; these finagling changes are just to put the commit in the correct spot in the destination file. We take the patch to a new branch in Keras and apply with git am --signoff which keeps the commit message, original contributor, and has the sign-off note highlighted in blue: [image: image] https://cloud.githubusercontent.com/assets/1304633/21754525/1b8209d4-d5b7-11e6-82fc-f650049e6174.png
At that point, the signee can reapply more commits in the same way; finally, just review and make any changes to the committed code, in a separate commit. These final fixes would be, perhaps, moving/removing/adding/editing import statements and their usage in the new code (e.g. elephas.layer.Name --> keras.layer.Name).
A lot of the manual stuff above could end up being handled automatically via simple search and replace; the patch finagling could be easier as well with a gui versus editing a diff file directly.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/fchollet/keras/issues/4944#issuecomment-271188622, or mute the thread https://github.com/notifications/unsubscribe-auth/AArWb6ut1gDkDaBAqjV2PaPxJOX50XRvks5rQXI2gaJpZM4Lc-35 .
I agree it should be divided up into modules, such as callbacks, metrics, layers, backend additions. The best way to do that is via a directory and file structure that Keras has; e.g.:
The above files, while named the same, contain no Keras code; perhaps they will have a from keras import x
statement, but no copies of Keras code. Following the Keras directory and file structure is simply the neatest way to organize everything. An example:
from keras.layers import Dense
from elephas.layers import ForgetfulLSTM
Import structure follows a reasonable pattern between Keras versus the contrib repo.
@patyork if there are going to be more things being added by a larger contributor base it is going to be easier to manage with separate files. So, maybe metrics.py, regularizers.py, backend, etc. should be upgraded to modules/directories for the contrib repo? Add in an import *
so we don't have conflicting commits in the init.py and everyone can safely works on their own metrics, etc.
More PRs would be adding files instead of editing files, so hopefully fewer conflicts and cleaner histories.
Pat York: yes, that sounds perfectly reasonable.
On 9 January 2017 at 01:16, Ben notifications@github.com wrote:
@patyork https://github.com/patyork if there are going to be more things being added by a larger contributor base it is going to be easier to manage with separate files. So, maybe metrics.py, regularizers.py, backend, etc. should be upgraded to modules/directories for the contrib repo? Add in an import * so we don't have conflicting commits in the init.py and everyone can safely works on their own metrics, etc.
More PRs would be adding files instead of editing files, so hopefully fewer conflicts and cleaner histories.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/fchollet/keras/issues/4944#issuecomment-271192449, or mute the thread https://github.com/notifications/unsubscribe-auth/AArWb9F7mRZD9Qmc5BMONQS3CTbYxjpsks5rQXxigaJpZM4Lc-35 .
It's probably a good idea to choose another name to avoid confusion, Elephas is taken by this keras related project: https://github.com/maxpumperla/elephas
Elephas is already taken indeed, and it's a quite popular project. How about just keras_contrib, or keras_extras?
On 10 January 2017 at 02:00, Michael Oliver notifications@github.com wrote:
It's probably a good idea to choose another name to avoid confusion, Elephas is taken by this keras related project: https://github.com/maxpumperla/elephas
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/fchollet/keras/issues/4944#issuecomment-271456231, or mute the thread https://github.com/notifications/unsubscribe-auth/AArWb8bWvL0freVtLD8AYcErAAlexlDUks5rQthIgaJpZM4Lc-35 .
Ah bummer - I thought that name would have been clever for an extension repo.
For a detailed and mature design of the sort being discussed here look to boost, the obvious tensorflow contrib, homebrew, and Qt. All are mature projects.
Also note splitting up will open up the latest successor to dll hell... versioning hell, as found in brew. OpenCV has done the same thing with their opencv_contrib repository, and as a few google searches can attest it seems to be very painful for users.
What about simply having a contrib folder in keras that is clearly marked experimental where things can be moved to the mainline folder as they are proven as done by tensorflow and some boost modules?
As @fchollet suggested, opening up commit permission to a few trusted top contributors is likely a good plan to reduce his workload. Making them the owner of that contrib directory could be a good way to go about it.
Is it possible to give commit rights specific to a directory on Github? Last time I checked, commit rights were a binary: either you had everything, or nothing.
On 10 January 2017 at 03:33, Andrew Hundt notifications@github.com wrote:
As @fchollet https://github.com/fchollet suggested, opening up commit permission to a few trusted top contributors is likely a good plan to reduce his workload. Making them the owner of that contrib directory could be a good way to go about it.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/fchollet/keras/issues/4944#issuecomment-271469900, or mute the thread https://github.com/notifications/unsubscribe-auth/AArWb90YMFPCquD4bSry4cmbjviPT4ZJks5rQu3kgaJpZM4Lc-35 .
Yeah I guess this could mean trusting the people with access. Git hooks or the github status api could be another option.
I think it's all or nothing - I've heard that gitlab has something like this, but I'm not sure. If we were to have other people as the "owner" of the contrib folder, we could just have @fchollet tag PRs as contrib worthy but not core-ready, and it would have to be an honor system thing (perhaps with guidelines, such as the removal of moderator status if violated) such that moderators do not merge / push directly to core.
as @lukedeo says with the honor system, there's nothing like a lightweight code of conduct for contributors (pull requests) and collaborators (commit access) to keep things in order. If it doesn't work out, revoking collaborator access is just a few clicks.
That seems like a reasonable thing to do. Here's what we would need:
1) a document mapping contributor usernames to lists of folders they are authorized to commit files to. 2) a code review guide to help collaborators perform consistent reviews and vetting of contrib PRs. 3) a notification system were PRs made in contrib (or PRs tagged as belonging in contrib) trigger an email to the contrib collaborators.
I can write 2). Any suggestions as to 3)? Also, anything I am missing?
Additionally: what is the current list of Keras contributors interested in maintaining such a contrib folder?
On 10 January 2017 at 04:13, Andrew Hundt notifications@github.com wrote:
as @lukedeo https://github.com/lukedeo says with the honor system, there's nothing like a lightweight code of conduct https://github.com/Homebrew/brew/blob/master/CODEOFCONDUCT.md for contributors (pull requests) and collaborators (commit access) to keep things in order. If it doesn't work out, revoking collaborator access is just a few clicks.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/fchollet/keras/issues/4944#issuecomment-271475167, or mute the thread https://github.com/notifications/unsubscribe-auth/AArWbzoPk4n8cNpv7jmjIsWAtEOL4QUEks5rQvdQgaJpZM4Lc-35 .
As for 3), I can take a stab at a simple polling system with PyGithub that would cost O($0.01)
to run per month - I can start taking a look tonight. As for list of contributors, we could open an issue and people can 👍 if they want to be considered?
Great, thanks!
Currently I am counting: @patyork, @lukedeo, @the-moliver. Anyone else, please 👍 this post to be considered.
I'm quite interested.
Just thinking ahead a bit, but if possible it would probably be good to have a separate TravisCI build for the contrib folder. Theoretically (and hopefully in practice) contrib items shouldn't break the Keras core, or even touch it, making those 15-20 minute tests redundant. I've never used TravisCI in depth, so I'm not sure if you have that fine of control over whether a certain PR calls test suite A versus suite B.
@patyork as tensorflow does, the pull requests can be compiled by travis before merges are permitted. For that reason I'd suggest merging via pull request with no direct commits be one element of the code of conduct. It gives the opportunity for code review and travis ci to run.
@fchollet I've set up a system using AWS API-GW => AWS Lambda => AWS SES using a web-hook from GitHub that should run at ~0.40$/mo that'll be ready by tomorrow. Only remaining questions are:
I envision that we would have a COLLABORATORS.txt
document (or similar) in the Keras root that would describe a mapping between Github usernames (maybe email addresses) and directories that collaborators can commit to. This would be updated by hand, and could be used as input to automatically update any infrastructure we have around collaboration.
Thinking further about the contrib repo, there are still a couple reasons why we might want to keep it separate from Keras:
from keras import contrib
but this wouldn't work in TF-internal Keras.I hope we can solve the first point. The second point may not be so important, but still something to keep in mind.
@lukedeo you could definitely be in charge of maintaining the notification system (in my experience it's possible to leave a similar setup running and essentially never have to touch it in years). It would just need to have open-source deployment instructions so that anyone could take it over in the future.
When making changes to contrib, we don't want to trigger a Travis build for all of Keras, as Pat York points out. Will there be any way to prevent this if contrib is a Keras subdirectory?
Sorry for missing this, but what is the downside to triggering a travis build for all of keras?
A TF-only version of Keras will be moved to the TF repo in the near future. This version should maintain the same API as the external (multi-backend) version.
Cool! Will it be in sync with the multi backend version? sorry for the digression
Of course it cannot include the contrib repo, so that would induce an API discrepancy: in external Keras you will be able to do from keras import contrib but this wouldn't work in TF-internal Keras.
Will the api be import tf.contrib.keras
vs import keras.contrib
or something similar to that? Would such a difference make for a reasonable indicator of what's available?
On the Travis build, the downsides are fairly minor of kicking off a full build each time: It's long running time and just running unnecessary builds on Travis (which is free for the end users).
On the TF integration/move, I think he just means that it wont include Keras' contrib folder in the TensorFlow repo? Meaning if someone is using from TensorFlow directly, they wont have access to these contributions/extensions without cloning/downloading Keras itself? Whereas a separate repo for the Keras contributions would sit on top both Keras standalone and the TF version (as an optional install). I could be wrong though, this is the first I've read about a TF-Keras integration.
-----Original Message----- From: "Andrew Hundt" notifications@github.com Sent: 1/10/2017 8:18 PM To: "fchollet/keras" keras@noreply.github.com Cc: "Pat York" pat.york@nevada.unr.edu; "Mention" mention@noreply.github.com Subject: Re: [fchollet/keras] Contrib Repo (#4944)
When making changes to contrib, we don't want to trigger a Travis build for all of Keras, as Pat York points out. Will there be any way to prevent this if contrib is a Keras subdirectory? Sorry for missing this, but what is the downside to triggering a travis build for all of keras? A TF-only version of Keras will be moved to the TF repo in the near future. This version should maintain the same API as the external (multi-backend) version. Cool! Will it be in sync with the multi backend version? sorry for the digression Of course it cannot include the contrib repo, so that would induce an API discrepancy: in external Keras you will be able to do from keras import contrib but this wouldn't work in TF-internal Keras. Will the api be import tf.contrib.keras vs import keras.contrib or something similar to that? Would such a difference make for a reasonable indicator of what's available? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.
On the Travis build, the downsides are fairly minor of kicking off a full build each time: It's long running time and just running unnecessary builds on Travis (which is free for the end users).
Travis there is a caching mechanism which could be used for incremental builds, or something like docker could be used to facilitate incremental builds and reduce build time. Even stuff in contrib should likely be verified, right?
On the TF integration/move, I think he just means that it wont include Keras' contrib folder in the TensorFlow repo?
I think he was saying that he has been working on a new TF only implementation of Keras that currently is internal at Google that will eventually be directly integrated into github.com/tensorflow/tensorflow.
I envision that we would have a COLLABORATORS.txt document (or similar) in the Keras root that would describe a mapping between Github usernames (maybe email addresses) and directories that collaborators can commit to.
What does "directories that collaborators can commit to" mean? Does it mean that we have several owners for the keras_contrib repo who are respectful for different directories to merge the commit from other collaborators?
Here is a homebrew link I meant to post which has all the useful info for contributors. I thought it might make for a good example: https://github.com/Homebrew/brew/tree/master/docs
I just realized contrib is going to break saving and loading and a PR I just made for a bugfix is a possible solution. https://github.com/fchollet/keras/pull/5012 (PR probably still needs some additional docs and unit tests).
I added a global dictionary of custom objects. I updated get_from_module
to first check that dictionary. This can be a standard location to register custom objects so they load by name and work with save and load.
The idea for contrib is that I could write a file l3_regularizer.py
class L3Regularizer(Regularizer):
...
get_custom_objects().update({"L3Regularizer": L3Regularizer})
If I then import keras_contrib.regularizers.l3_regularizer
, Dense(...,W_regularizer='L3Regularizer')
will work as expected and saving and loading will work as well. This is a potential standard for how to add custom objects to custom_objects
and have everything save and load correctly.
Any thoughts on how contrib is going to affect saving and loading?
Cheers, Ben
Any movement on this?
This pull for custom objects just got merged yesterday: https://github.com/fchollet/keras/pull/5012
@fchollet suggested a scope operator which makes things much cleaner and helps prevent name collisions. Custom objects can be loaded within a with
statement which prevents leaks.
We still need to enforce some standard for object names which would help reduce chances of collisions further, for example, using the module name in the name returned from get_config. Saving, loading, and instantiating by name will work as long as the string returned from get_config matches the string in custom_objects.
The trade-off is if we use fully qualified names, referencing the object by name gets a little wordy like Dense(512, ..., W_regularizer="l3_regularizer.regularizers.L3Regularizer")
but you could still just create the instance yourself like Dense(512, ..., W_regularizer=L3Regularizer())
.
#keras_contrib.regularizers.l3_regularizer.py
class L3Regularizer(Regularizer):
...
custom_objects = {"regularizers.l3_regularizer.L3Regularizer": L3Regularizer}
#using the module
from keras_contrib.regularizers import l3_regularizer
with custom_object_scope(l3_regularizer.custom_objects):
# custom object will resolve within this scope by name
layer = Dense(512, ..., W_regularizer="regularizers.l3_regularizer.L3Regularizer")
custom_objects
is still a parameter to several functions and works as it did previously, but working through custom_object_scope
prevents name leaking that was the existing functionality.
@bstriner Just to be clear, even in the case of a built in contrib folder (in the Keras repo), the above notation would be required as all contributions would be considered custom objects?
Could we start a contrib repo as an external repo (and potentially include it into Keras later down the road if that sounds like a good strategy?)
On 22 January 2017 at 12:08, Pat York notifications@github.com wrote:
@bstriner https://github.com/bstriner Just to be clear, even in the case of a built in contrib folder (in the Keras repo), the above notation would be required as all contributions would be considered custom objects?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/fchollet/keras/issues/4944#issuecomment-274355399, or mute the thread https://github.com/notifications/unsubscribe-auth/AArWb28yDbmZH3djqV2pMWdoZ6qPDr4Fks5rU7dZgaJpZM4Lc-35 .
I think that's a fine idea - I don't have the reservations against a (well managed) external repo that others have (such as versioning issues and whatnot).
@patyork such notation would be required for any objects you don't want to be part of the default namespace but want to add conditionally. This is also the notation you would use for custom objects in your own code if you want your models to save and load correctly.
If you want something to be included by default, you could add get_custom_objects.update(custom_objects)
to the bottom of the module and make sure it is imported somewhere.
For objects in keras master, an alternative would be a further PR to get_from_module that can handle dotted objects. Then, contrib.L3Regularizer
would resolve to keras.regularizers.contrib.L3Regularizer
if run with module_params=keras.regularizers
, unless overridden by custom_object_scope
. I think this might be the cleanest final solution for handling a separate contrib repo, some included contrib repo, and custom objects in user code in a standardized way.
We could also add some custom handling to get_from_module to recognize relative and absolute paths. This would be my ideal logic for get_from_module:
custom_objects
, which always takes precedence and can be used to override existing code/
, try to import a fully qualified objectThis way:
get_config
and everything should work.get_config
.custom_objects
to provide clean, simple names for your objects.Does that logic for get_from_module make sense? It would be back-compatible with existing code and would make everything nice and flexible. Wouldn't be too complicated to put together.
Cheers
"Kextras"
@fchollet Since we're not adding much to the repo at this stage (in terms of layers, loss functions, callbacks, etc.), we've talked quite a bit about an external repo for user/additional contributions, that, based on utility and usage may eventually make it into the core.
I'm curious about how you would want/like to go about this. An external repo is not a big deal, but I'm wondering:
Just looking for thoughts on this topic. I've seen a repo or two that are not forks (completely external) and overwrite certain Keras files then do a setup.py install, but that seems like a poor way to do it. A fork would keep the chain of contributions in tact and would be easy to merge, but having any reference to a fork would just be confusing in my opinion.