substratusai / images

Official Substratus Container Images

add dataset-http-loader #8

Closed: samos123 closed this 1 year ago

samos123 commented 1 year ago

Fixes #5

samos123 commented 1 year ago

Edit: This has been resolved; it was a bug in the controller.

Weird, the dataset loader pod isn't getting any environment variables when params are set on the Dataset. Dataset spec:

apiVersion: substratus.ai/v1
kind: Dataset
metadata:
  name: k8s-instructions
spec:
  params:
    urls: https://huggingface.co/datasets/substratusai/k8s-instructions/raw/main/k8s-instructions.jsonl
  image:
    git:
      url: https://github.com/substratusai/images
      path: dataset-loader-http
      branch: dataset-http-loader

Error in the pod:

ValueError                                Traceback (most recent call last)
Cell In[2], line 3
      1 urls = os.environ.get("PARAM_URLS")
      2 if not urls:
----> 3     raise ValueError("Missing required environment variable PARAM_URLS. "
      4                      "For example, set `spec.params: {urls: http://s.com/dataset.jsonl}` "
      5                      "in the Dataset resource")
      7 urls = urls.strip().split(",")
      8 urls

ValueError: Missing required environment variable PARAM_URLS. For example, set `spec.params: {urls: http://s.com/dataset.jsonl}` in the Dataset resource
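
For reference, the check that raises above can be pulled into a small function so it can be exercised without a cluster. This is only a sketch based on the traceback, not the loader's actual code: the function name parse_param_urls and the env argument are hypothetical, and dropping empty entries is an added assumption.

```python
from typing import List, Mapping


def parse_param_urls(env: Mapping[str, str]) -> List[str]:
    """Return the URLs listed in PARAM_URLS, or raise if none are set.

    `env` is any mapping of environment variables (hypothetical helper;
    in the notebook this would be os.environ).
    """
    raw = env.get("PARAM_URLS", "")
    # Split on commas, trimming whitespace and dropping empty entries.
    urls = [part.strip() for part in raw.split(",") if part.strip()]
    if not urls:
        raise ValueError(
            "Missing required environment variable PARAM_URLS. "
            "For example, set `spec.params: {urls: http://s.com/dataset.jsonl}` "
            "in the Dataset resource"
        )
    return urls
```

In the notebook this would be called as parse_param_urls(os.environ); passing an empty dict reproduces the error seen in the pod.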

Looking at the pod spec there are no environment variables set:

  load:
    Container ID:   containerd://923dd119853ec66b31f2584829eba3df5b54f8415cd109b865ecaa519a03a807
    Image:          us-central1-docker.pkg.dev/sam-argolis/substratus/substratus-dataset-default-k8s-instructions
    Image ID:       us-central1-docker.pkg.dev/sam-argolis/substratus/substratus-dataset-default-k8s-instructions@sha256:cac918196e7e2bf37b2ba13ebb3e88e5416ad378c54076e0ff8eeddbf129da9a
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 21 Jul 2023 22:22:42 -0700
      Finished:     Fri, 21 Jul 2023 22:22:45 -0700
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:        2
      memory:     4Gi
    Environment:  <none>
    Mounts:
      /content/data from dataset (rw,path="d2ef7bd1e58854a4276474790f921613/data")
      /content/logs from dataset (rw,path="d2ef7bd1e58854a4276474790f921613/logs")
      /content/params.json from params (rw,path="params.json")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mrmhf (ro)
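
Note that the pod does mount /content/params.json even though no environment variables are set. While debugging, something like the following could be used to check what the controller wrote there. This is a hedged sketch: the helper name load_params is hypothetical, and it assumes the file is a flat JSON object mirroring spec.params (for example {"urls": "..."}), which is not confirmed anywhere in this thread.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_params(path: str = "/content/params.json") -> Dict[str, Any]:
    """Read the mounted params file; return {} if it does not exist.

    Assumption: the file holds a single JSON object keyed by param name.
    """
    params_file = Path(path)
    if not params_file.is_file():
        return {}
    with params_file.open() as f:
        data = json.load(f)
    if not isinstance(data, dict):
        raise ValueError(f"Expected a JSON object in {path}")
    return data
```

Printing load_params() inside the container would show whether the params reached the pod through the file even though the PARAM_* variables are missing.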