Closed Galileo-Galilei closed 2 years ago
I like this idea! Let me bring it up with the team
I also like this idea but am a little bit unsure about implementation. Thinking out loud here...
As you point out, we need to make it a bit more general than just allowing new starter names to be added, since you will need to accept different _STARTER_REPO
paths as well. Since all of the logic to do with different starters is really just working out the right values of directory
, template_path
and checkout
, let's make this as general as possible so that users can specify their own starters however they like rather than just a single custom repo URL. e.g. you could have one starter with alias my_starter-master
and another with alias my_starter-develop
that points to a different branch or even a different repo.
The default _STARTER_ALIASES
would then become something like this. Maybe there's also space for a description
or docs
that appears when you do kedro starter list
?
_STARTER_ALIASES = {
"astro-iris": {"template_path": "git+https://github.com/quantumblacklabs/kedro-starters.git", "directory": "astro-iris"}
"mini-kedro": {"template_path": "git+https://github.com/quantumblacklabs/kedro-starters.git", "directory": "mini-kedro"}
...
}
So far so good, but what I don't understand and hopefully @Galileo-Galilei can explain is how the user extends the _STARTERS_ALIASES
dictionary through a plugin? Effectively you want to do:
_STARTERS_ALIASES.update({"my_starter": {"template_path": "blahblahblah", directory="blah"}})
... but I don't see how that change propagates through for when you do kedro new
. Would you need to overwrite the whole new
command on the CLI to take in your modified _STARTERS_ALIASES
or is there a nicer way? How are you changing CONF_ROOT
from a plugin? Feel like I'm missing something here!
TBH, I haven't thought in details about the exact implementation. Actually I want to do exactly what you describe above (updating the _STARTER_ALIASES
dictionary, provded you change the actual implementation to a dictionary).
My naive idea was that you might have a kedro.constants
entrypoint:
setup(entry_points={"kedro.constants": ["plugin_name = plugin_name.plugin_constants:KEDRO_CONSTANTS"]})
and in a plugin_constants.py
:
KEDRO_CONSTANTS={
_STARTER_ALIASES: {"my_starter": {"template_path": "blahblahblah", directory="blah"},
CONF_ROOT=[path/to/conf]
}
Which would enable you to access these constants when loading the entrypoints and override some of the constants.
Honestly, this feels quite hardcoded and not very flexible. Maybe dynaconf
has the possibility to override settings.py
(in a way similar to kedro environments) and we could create a settings.py
file in the plugin with these constants? I am not very familiar with the mechanism so I can't quite tell if it is easy to do.
P.S: Specifically for CONF_ROOT, i think that in general it is more flexible to inject it at runtime through the CLI as suggestedin your user research on configuration, something like kedro run --conf-root=/home/ivan/my_new_conf/
My feeling is that things like CONF_ROOT
that are in settings.py probably don't belong in an entrypoint. As I understand it, we are deliberately moving from the system of registration hooks (like register_catalog
) to instead specify library components in settings.py so that you can't modify these things through a pip install
. So I'm not sure it would make sense for a plugin to be able to modify variables like CONF_ROOT
.
_STARTER_ALIASES
is obviously a bit different though, given that it needs to be specified outside a kedro project. For this I don't immediately see another way to achieve it other than via entrypoints like you suggest.
Hi @AntonyMilneQB, just to clarify, I don't think CONF_ROOT
belongs to an entrypoint neither (actually, all project-related constants should be declared in settings.py
: plugins can already access/supercharge them if necessary juste by importing them if they want).
The hacky entrypoint solution is only necessery for global constants (like _STARTER_ALIASES
) which need to be modified before the project is created. To be honest, I don't see any other use case for now apart from updating this starter dictionnary.
P.S: I really like the settings.py
refactoring, I think it is both more secure (not executing random code on install) and more modular (basically, it seems that all the AbstractDataset
s and the ConfigLoader
could be in separated librairies since they are now completely independant from the rest of the framework)
@Galileo-Galilei Cool, thanks very much for your comments. Completely agree with what you say, and pleased you like the new settings.py
system 👍 Let's see if we can make extendible starters happen.
Linking here some alternative thinking on starter discoverability before we get tied up to a single solution/idea: https://github.com/quantumblacklabs/kedro-starters/pull/40#pullrequestreview-712884148
@lorenabalan I've read the discussion and I think that browsing github to find kedro-starters among community-developped examples may be a good idea, but comes with some caveats to keep in mind:
Description
We have at work a few kedro starters templates which enable to create kedro projects with a bunch of internal configuration (pre-commit, gitlab-ci, custom ConfigLoader...). We also have a custom plugin which modifies kedro's internal (especially the CONF_ROOT constant and the CLI).
When we distribute our template insisde the organisation, we have to give the entire VCS path to the project to our user, and it is not very convenient. I wish we were able to decalre a new starter-alias in the plugin, and when the plugin is installed and auto-discovered by kedro, users can user:
kedro new --starter=my_starter
wheremy_starter
VSC path was declared in the plugin, maybe like this:Context
The 2 main benefits I expect are:
Possible Implementation
Add a
_PLUGIN_STARTER_ALIAS
constant in plugins which will be user to extend the https://github.com/quantumblacklabs/kedro/blob/bce62c824a4786e7cf9cc355b31840879ec851bf/kedro/framework/cli/starters.py#L31-L38 set.It likely needs a bit more refactoring to make it a
dict
accepting different _STARTERS_REPO path which is currently a hardcoded path to a single one github repo.