Closed: Nishikoh closed this issue 1 month ago.
Can you elaborate? I understand you would like to test pipelines locally, but depending on the resources required to train the model, you may need a production-ready Kubernetes cluster. If you can provide code snippets, configuration files, or anything else, it will help us understand the problem.
The example above uses YOLOX for object detection. In production I will use Vertex AI Pipelines for training, but first I want to verify that the components work as intended in the local environment. Since the local execution is only a behaviour check, I run it with a small dataset and a small number of epochs; the full-size dataset and full epoch count will run on Vertex AI Pipelines in production.
The YOLOX case above is difficult to reduce to a code snippet, so here is a minimal example that requires CUDA at execution time. Running it requires the Docker runtime to be GPU-aware.
from kfp import dsl, local

local.init(runner=local.DockerRunner())

@dsl.container_component
def gpu_processing():
    return dsl.ContainerSpec(
        image="gcr.io/google_containers/cuda-vector-add:v0.1",
    )

task = gpu_processing()
When I run it, CUDA is not detected and the task fails with an error:
02:21:39.615 - INFO - Executing task 'gpu-processing'
02:21:39.615 - INFO - Streamed logs:
Pulling image 'gcr.io/google_containers/cuda-vector-add:v0.1'
Image pull complete
Failed to allocate device vector A (error code CUDA driver version is insufficient for CUDA runtime version)!
[Vector addition of 50000 elements]
I expect the following output:
10:43:49.816 - INFO - Executing task 'gpu-processing'
10:43:49.816 - INFO - Streamed logs:
Found image 'gcr.io/google_containers/cuda-vector-add:v0.1'
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
10:43:51.690 - INFO - Task 'gpu-processing' finished with status SUCCESS
10:43:51.691 - INFO - Task 'gpu-processing' has no outputs
If users could configure Docker runtime options, the task would produce the expected output.
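For reference, this is roughly what the requested behaviour corresponds to when the same image is launched with the plain Docker CLI: the GPU and shared-memory flags below are the runtime options the local runner would need to forward (this assumes the NVIDIA Container Toolkit is installed on the host):

```shell
# Run the same image with the runtime options this issue asks for.
# --gpus all requires the NVIDIA Container Toolkit on the host.
docker run --rm \
  --gpus all \
  --shm-size=2g \
  gcr.io/google_containers/cuda-vector-add:v0.1
```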
Are you following this method to execute the pipeline? https://www.kubeflow.org/docs/components/pipelines/v2/local-execution/
Feature Area
/area sdk
What feature would you like to see?
Set runtime options in kwargs when using Docker with kfp local.
What is the use case or pain point?
Currently, no runtime options can be set. For example, a machine learning training task may fail because the default shared memory (shm) size of the container is too small. To solve this, we would like to be able to set Docker runtime options when running locally.
Here is an example of an error log.
Is there a workaround currently?
I have no idea.
Love this idea? Give it a 👍.