This repository is the central location for the demos the ET data science team is developing within the OS-Climate project. This demo shows how to use the tools provided by Open Data Hub (ODH) running on the Operate First cluster to perform ETL and create training and inference pipelines.
Reorganize README to split container infra from pipeline construction #200
I am trying to use the latest documentation as a guide to creating a pipeline for the (still private) PCAF sovereign footprint POC. I appreciate that the AICoE demo is trying to address two audiences: those who are building the actual containers that will run the jobs, and those who are building the notebooks that use those containers but are much more concerned with the calculations and topology of the notebooks than with the underlying infrastructure.
For example, when I select Custom Elyra Notebook or AICoE Demo as a notebook type, how many of the infrastructure decisions can I expect to have already been made by that selection, requiring me only to make simple GUI-based selections within a constrained environment? And how much do I need to grovel in the details of copy-pasting and editing every line of a Dockerfile to get the right sort of "Hello, world" pipeline functionality?
Following along with the demo video (https://www.youtube.com/watch?v=lGeT615YNlM), I do see that users must create both a YAML file and a Dockerfile to define the container image. When the demo shows the construction of pipelines, it does not mention how much additional work is needed behind the scenes to make the demo2 notebooks magically link up with everything the YAML file and Dockerfile imply. For a Jupyter notebook user, it does not even explain how to edit /opt/app-root/src/PCAF-sovereign-footprint/.aicoe-ci.yaml, which is a hidden file that the file browser cannot open.
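One workaround for the hidden-file problem: the JupyterLab file browser hides dotfiles, but a terminal opened from the Launcher can list and edit them. A minimal sketch, run in a throwaway directory with an invented stand-in YAML (in practice you would `cd` into the project checkout and edit the real `.aicoe-ci.yaml` in place):

```shell
# The JupyterLab file browser hides dotfiles, but a Launcher terminal sees them.
# Demonstrated in a temporary directory; in practice you would
#   cd /opt/app-root/src/PCAF-sovereign-footprint
# and edit .aicoe-ci.yaml there. The YAML content below is an invented stand-in.
workdir=$(mktemp -d)
cd "$workdir"
printf 'overlays:\n  - name: demo1\n' > .aicoe-ci.yaml

ls -a                                    # the dotfile is visible from the shell
sed -i 's/demo1/demo2/' .aicoe-ci.yaml   # non-interactive edit (or use vi/nano)
cat .aicoe-ci.yaml
```

An interactive editor (`vi`, `nano`) works just as well here; `sed` is used only so the example runs unattended.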
In the part of the video that shows how runtime images are selected (https://youtu.be/lGeT615YNlM?t=701) there is no mention of how to find the quay.io server, nor any explanation of the relationship between what a project should magically inherit from the AICoE template and any OperateFirst instance values for projects that are part of an Op1st environment (such as os-climate). The requirement that os-climate create a redhat.com account to access quay.io repositories is confusing to an ODH user in a different organization. (The readme does offer the name of a quay.io image that does take me to the right place, but that's buried well past where I run into trouble following the other directions first.)
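For what it's worth, Elyra's runtime-image dropdown is populated from metadata entries, so a quay.io image can be registered by hand rather than inherited from any template. A sketch of such an entry (the image name is a placeholder, and the file location reflects Elyra's usual metadata layout under the Jupyter data directory, e.g. `~/.local/share/jupyter/metadata/runtime-images/` — both are assumptions, not something the demo states):

```json
{
  "display_name": "AICoE OS-Climate demo image",
  "schema_name": "runtime-image",
  "metadata": {
    "image_name": "quay.io/<org>/<image>:latest"
  }
}
```

The same entry can also be created through the Runtime Images panel in the JupyterLab UI, which avoids touching the JSON directly.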
I tried using the default https://ml-pipeline-ui.kubeflow.svc.cluster.local:80/pipeline advertised by the documentation, but that did not work. I interpolated a different endpoint by scraping the browser URL shown in the demo video and changing CL1 to CL2: http://ml-pipeline-ui.kubeflow.apps.odh-cl2.apps.os-climate.org/pipeline. That gave this error message:
Error making request

```
Failed to initialize `kfp.Client()` against: 'http://ml-pipeline-ui.kubeflow.apps.odh-cl2.apps.os-climate.org/pipeline' - Check Kubeflow Pipelines runtime configuration: 'pcaf_kubeflow'
```

Error details:

```
Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.8/site-packages/elyra/pipeline/kfp/processor_kfp.py", line 123, in process
    client = TektonClient(
  File "/opt/app-root/lib64/python3.8/site-packages/kfp/_client.py", line 161, in __init__
    if not self._context_setting['namespace'] and self.get_kfp_healthz().multi_user is True:
  File "/opt/app-root/lib64/python3.8/site-packages/kfp/_client.py", line 363, in get_kfp_healthz
    raise TimeoutError('Failed getting healthz endpoint after {} attempts.'.format(max_attempts))
TimeoutError: Failed getting healthz endpoint after 5 attempts.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.8/site-packages/tornado/web.py", line 1704, in _execute
    result = await result
  File "/opt/app-root/lib64/python3.8/site-packages/elyra/pipeline/handlers.py", line 120, in post
    response = await PipelineProcessorManager.instance().process(pipeline)
  File "/opt/app-root/lib64/python3.8/site-packages/elyra/pipeline/processor.py", line 134, in process
    res = await asyncio.get_event_loop().run_in_executor(None, processor.process, pipeline)
  File "/usr/lib64/python3.8/asyncio/futures.py", line 260, in __await__
    yield self  # This tells Task to wait for completion.
  File "/usr/lib64/python3.8/asyncio/tasks.py", line 349, in __wakeup
    future.result()
  File "/usr/lib64/python3.8/asyncio/futures.py", line 178, in result
    raise self._exception
  File "/usr/lib64/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/app-root/lib64/python3.8/site-packages/elyra/pipeline/kfp/processor_kfp.py", line 148, in process
    raise RuntimeError(
RuntimeError: Failed to initialize `kfp.Client()` against: 'http://ml-pipeline-ui.kubeflow.apps.odh-cl2.apps.os-climate.org/pipeline' - Check Kubeflow Pipelines runtime configuration: 'pcaf_kubeflow'
```

Check the JupyterLab log for more details at 2022-08-26 09:39:48
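The `TimeoutError` in the traceback comes from the kfp client failing to reach the API's healthz endpoint. One way to check whether a candidate `api_endpoint` is reachable at all, before involving Elyra, is to hit that endpoint directly. A sketch (the `/apis/v1beta1/healthz` path follows the KFP v1beta1 REST API; the host argument is whatever URL is guessed or configured, so it may itself be wrong):

```python
# Probe a Kubeflow Pipelines endpoint the same way kfp.Client() does on
# startup: by fetching its healthz resource. A connection failure here means
# the URL in the Elyra runtime configuration is unreachable from the notebook.
import json
import urllib.request


def check_kfp_healthz(host: str, timeout: float = 5.0) -> dict:
    """Return the parsed healthz response, or raise URLError on failure."""
    url = host.rstrip("/") + "/apis/v1beta1/healthz"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Example (raises if the endpoint is unreachable, which is exactly what the
# Elyra TimeoutError above is reporting):
# check_kfp_healthz(
#     "http://ml-pipeline-ui.kubeflow.apps.odh-cl2.apps.os-climate.org/pipeline"
# )
```

If this call succeeds but Elyra still fails, the problem is more likely in the runtime configuration (`pcaf_kubeflow`) itself than in the URL.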
@schwesig: The label(s) kind/bug cannot be applied, because the repository doesn't have them.
In response to [this](https://github.com/os-climate/aicoe-osc-demo/issues/200#issuecomment-1228648231):
>/kind bug
Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
Happy to try again with some guidance.