Closed HumairAK closed 3 years ago
@durandom please let me know if I have missed anything.
Also @erikerlandson @caldeirav @redmikhail
I believe we may want to maintain the current AWS cluster as a "production" cluster, per discussion with @caldeirav.
If so, I suggest either adopting it or re-deploying it from the Operate First ACM instance, and managing the deployment via overlays in the apps repo.
The Allianz workflow requires GPUs to run, so for this to be a viable dev environment, the zero cluster will need GPU resources adequate to this use case. cc @ChristianMeyndt
Good point, and a good opportunity to either remove the dependency on GPUs or add GPUs to the new environment.
Some requirements from a developer/user point of view:
1) A single dashboard with access to all the components / tools (JupyterHub, Superset, Trino, etc.)
2) All access should be governed by GitHub authentication
3) An end-to-end guide covering development using a GitOps approach, i.e. elyra-aidevsecops-tutorial plus management of data lineage
As far as I understand, one of the by-products of the OS-C project was to develop and test a set of tools that could be portable to environments run by current or future members of the OS-C community, and potentially used in on-prem environments. @caldeirav, correct me if I am wrong. If that is still one of the goals, running the development environment as a service and having it managed by ACM may increase the number of dependencies. There are also questions around onboarding and granting access to members of the OS-C community, and we are still evaluating some tools that could become part of the platform but may not yet have been added to the Operate First environment. An AWS-based environment also has the benefit of letting us add and remove resources on demand until we have a clear understanding of the capacity we actually need. The original thought process was to re-use as much as possible of the existing GitOps tools and processes from the Operate First project while keeping the environment isolated and portable. If that is not something we need to worry about, running OS-C in a well-developed and maintained environment makes absolute sense.
Yes, this is why I see Operate First as potentially a "master controller" environment for OS-C. We can drive deployments from ACM on multiple infrastructures, including the environment we need for R&D on the data commons platform and the environment where the ITR / NLP / Physical Risk data science teams are doing development. In the future there could be other streams / organisations running from the same platform architecture and shared code but operating on different private or public clouds. We should have a discussion on this to get into some details, and maybe create a target picture so we can validate this with the members and the TAC. We can also discuss the portability of this setup for a potential on-premises deployment, although it seems that may not be a key use case at this point.
This has been resolved: OSC and the data platform on it are now managed by the operate-first ACM/ArgoCD and are configured via the apps repo, making use of the same configs.
This issue is to identify the tools/services that OS-Climate will need that can be provisioned via Operate-First, instead of redeploying them in a separate cluster.
I initially deployed ODH (Superset/Trino/JH) on the OS cluster, but the same tools are already deployed on the Operate First Zero cluster, and it seems like it may be more beneficial to just leverage that instead.
A huge benefit is that we then only need to manage one set of deployments. The GitOps framework is already configured for operate-first, so we also don't need to do things like redeploy or configure ArgoCD; we can just use the one we currently manage.
There is an AWS cluster for OSC, and I'm not sure what we should do with it. If it's still needed, we can configure the operate-first ACM instance to manage it, so that it too can be brought and maintained under the operate-first umbrella.
One thing that cannot be offered via operate-first is storage; external S3 storage will still need to be provided by OS-C.
What exists today in operate first that can be used by OS-C:
Anything more can be added to our clusters, and if needed we can deploy it to the OSC cluster (after it's managed via ACM?).