Open OlegPodlipalin opened 3 months ago
Thanks @OlegPodlipalin for opening this issue.
@deepyaman, @bpmeek and others were discussing about something like this a few days ago https://linen-slack.kedro.org/t/19613568/hey-everyone-a-while-ago-i-took-part-in-a-kedro-user-group-a#ceca4bdb-8037-4ec3-80e3-7eee0476bc6b and in fact @bpmeek open sourced https://github.com/bpmeek/Kedro-Universal-Catalog prompted by some past discussions.
Something similar was already proposed in https://github.com/kedro-org/kedro/issues/1239 and probably in other places.
We will add this to our backlog and discuss it shortly.
@datajoely I quickly scanned https://github.com/kedro-org/kedro/issues/3094 and I didn't see any mentions to config, do you have any extra insight? Also @DimedS @ankatiyar do you think this would help with, say, Airflow deployments?
Yes, IIRC we mention how the introduction of --conf-source
is already a good step in the right direction, it was much worse before!
I actually really want this to happen, we actually support this kind of pattern in the micropackaging
piece we're about to remove... so we should just steal the already written implementation.
@datajoely I quickly scanned #3094 and I didn't see any mentions to config, do you have any extra insight? Also @DimedS @ankatiyar do you think this would help with, say, Airflow deployments?
Yes, it will be quite useful for Airflow deployment for the same reason. It won't be necessary to pack the configuration into the same container if you can retrieve it from remote storage. However, I'm trying to understand how this will increase security. Could you please comment on this, @OlegPodlipalin? If someone gains access to the container, wouldn't they be able to download all configurations stored in cloud services?
I'm trying to understand how this will increase security. Could you please comment on this, @OlegPodlipalin?
@DimedS It will increase security while adding an on-prem solution to existing SaaS. I our case we plan to mirror our ECR with all it's images to a client's infrastructure. The intension is not to build another set of images. The configuration for a new on-prem client will be created separately anyways and it will be the only set available for them, whereas all the rest of the configurations (for all the SaaS) will be stored in our s3.
@OlegPodlipalin, thank you for the issue created! We've discussed it and we think is a good idea, so we would like to move forward with it. Do you want to create a PR?
@ElenaKhaustova I'm ready to get started on this. I'll fork the repo and work on the solution. For accessing the cloud with the conf folder, I'm thinking of using environment variables for credentials.
Description
Currently, we deploy Kedro projects as part of our SaaS solutions within Docker containers, managed by a Prefect server on AWS ECS. These containers are configured to handle all possible configurations for various tenants. However, as we transition to an on-prem solution, we face a challenge where exposing all tenant configurations within a single container poses security and relevance issues since the client will have access to configurations that are not pertinent to them.
We're looking for a way to modularize configuration management without exposing unnecessary data. While Kedro supports cloud storage in the data catalog via the "protocol" option with the
fsspec
library, it does not support a similar mechanism for the configuration source. Enhancing this functionality would significantly streamline the deployment process for our on-prem solutions.Context
This change is crucial for us as it would allow for more secure and relevant configurations per tenant, without the risk of exposing data not intended for them. Implementing this feature would also align configuration management with how data sources are currently handled, making our systems more robust and flexible. This feature could benefit other users who require more granular control over configuration management in environments where host file systems are not a viable option.
Possible Implementation
A potential implementation could involve extending the existing functionality of Kedro's OmegaConfLoader to recognize and handle various storage protocols using the
_parse_filepath
function from kedro/io/core.py lines 690-728:integrating this logic into the OmegaConfLoader kedro/config/omegaconf_config.py lines 127-138 under the
else
statement, for example:will enable the use of
fsspec
protocols for configuration files would allow configurations to be fetched dynamically from specified storages, whether they are local filesystems or cloud-based. Credentials in this case can probably be provided via environmental variables, which are set up during the starting of the container.