kubeflow / examples

A repository to host extended examples and tutorials
Apache License 2.0

[help wanted] Use large files from notebook server in KubeFlow Pipeline #1034

Closed nutmilk10 closed 1 year ago

nutmilk10 commented 1 year ago

I have a Kubeflow notebook server that houses large files (100 GB+), and I was trying to read one of them into my pipeline, but I am getting a file-not-found error. I'm not sure what the best approach is, since most tutorials I see download their data inside the component, which isn't feasible in my case.

```python
import kfp
import kfp.dsl as dsl
import kfp.components as comp
from kubernetes import config, client
from kubernetes.client import CoreV1Api, V1PodList, V1Volume

def data_loader():
    import pandas as pd
    import numpy as np
    import sys
    data = pd.read_csv('/home/jovyan/workspace/sandbox/data.csv', sep=',', header=None)

data_loading_op = comp.func_to_container_op(data_loader, base_image='tensorflow/tensorflow:1.11.0-py3')

@dsl.pipeline(
    name='DataLoading Pipeline',
    description='Test.'
)
def phenology_pipeline():
    data_loading_task = data_loading_op()
```
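For context, pipeline steps run as separate pods, so paths under `/home/jovyan` on the notebook server are not visible to the component, which is why the read fails. One way to make the data reachable without downloading it is to mount the notebook's workspace PVC into the component with `add_pvolumes`. A minimal sketch using the KFP v1 SDK, assuming the workspace is backed by a PVC (the PVC name `workspace-sandbox` and mount path `/mnt/workspace` below are placeholders, not confirmed from this thread):

```python
import kfp.dsl as dsl
import kfp.components as comp

def data_loader():
    import pandas as pd
    # Read from the mount point inside the component's pod, not the notebook path.
    data = pd.read_csv('/mnt/workspace/sandbox/data.csv', sep=',', header=None)

data_loading_op = comp.func_to_container_op(
    data_loader, base_image='tensorflow/tensorflow:1.11.0-py3')

@dsl.pipeline(name='DataLoading Pipeline', description='Test.')
def phenology_pipeline():
    # 'workspace-sandbox' is a placeholder; use the actual PVC name backing the
    # notebook server's workspace volume (e.g. from `kubectl get pvc -n <namespace>`).
    vol = dsl.PipelineVolume(pvc='workspace-sandbox')
    data_loading_task = data_loading_op().add_pvolumes({'/mnt/workspace': vol})
```

Note that a PVC can only be mounted by the pipeline pod if its access mode and the notebook's usage allow it (e.g. ReadWriteMany, or the notebook is stopped for ReadWriteOnce volumes).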