zenml-io / zenml

ZenML 🙏: The bridge between ML and Ops. https://zenml.io.
Apache License 2.0

[BUG] CSV Datasource error #13

Closed bonacciog closed 3 years ago

bonacciog commented 3 years ago

Describe the bug: I'm not able to get started with the quickstart example pipeline. The error occurs when running: ds = CSVDatasource(name='Pima Indians Diabetes Dataset', path='gs://zenml_quickstart/diabetes.csv')

To Reproduce: I followed the QuickStart steps:

  1. pip install zenml
  2. zenml init
  3. Run the QuickStart example

Screenshots

Screenshot 2021-01-15 at 14:56:56

Stack Trace

KeyError Traceback (most recent call last)

in
      1 # Add a datasource. This will automatically track and version it.
----> 2 ds = CSVDatasource(name='Pima Indians Diabetes Dataset', path='gs://zenml_quickstart/diabetes.csv')
      3 training_pipeline.add_datasource(ds)

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/zenml/core/datasources/csv_datasource.py in __init__(self, name, path, schema, **unused_kwargs)
     45 schema (str): optional schema for data to conform to.
     46 """
---> 47 super().__init__(name, schema, **unused_kwargs)
     48 self.path = path
     49

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/zenml/core/datasources/base_datasource.py in __init__(self, name, schema, _id, _source, *args, **kwargs)
     61 else:
     62 # If none, then this is assumed to be 'new'. Check dupes.
---> 63 all_names = Repository.get_instance().get_datasource_names()
     64 if any(d == name for d in all_names):
     65 raise Exception(

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/zenml/core/repo/repo.py in get_datasource_names(self)
    236 c = yaml_utils.read_yaml(file_path)
    237 n.append(c[keys.GlobalKeys.DATASOURCE][keys.DatasourceKeys.NAME])
--> 238 return list(set(n))
    239
    240 @track(event=GET_DATASOURCES)

KeyError: 'datasource'

Context (please complete the following information):
- OS: macOS Big Sur 11.1
- Python Version: 3.8.2
- ZenML Version: 0.1.3

hamzamaiot commented 3 years ago

Thanks for reporting, @bonacciog! Could you please verify that your pipelines_dir does not contain any YAML files from older runs of ZenML? This might be due to a non-backwards-compatible change to the YAML format between 0.1.2 and 0.1.3.

bonacciog commented 3 years ago

@hamzamaiot Thanks for your reply!

Screenshot 2021-01-15 at 15:08:40

There doesn't seem to be a YAML file. Is this screenshot enough?

hamzamaiot commented 3 years ago

Could you check in the pipelines directory?

ls pipelines/
bonacciog commented 3 years ago
Screenshot 2021-01-15 at 15:32:31
hamzamaiot commented 3 years ago

Ah yes, I see the bug. The Untitled.ipynb file should not exist in the pipelines directory. In fact, nothing should exist in the pipelines directory except the pipeline YAML configurations. That's a condition we should be catching in the code, so thanks for bringing it up! I'll keep this issue open and add a PR to it that catches these errors more gracefully.

As an immediate fix, move Untitled.ipynb out of the pipelines directory into the root zenml_practise dir and try again. That should hopefully work.

P.S. If you wanted a recommended directory structure, we have one in our docs for reference. Hope it helps!
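
To illustrate the condition described above (only pipeline YAML configurations belong in the pipelines directory), here is a minimal sketch of a pre-flight check. It is not the fix that was later merged into ZenML; the function name find_stray_entries and the assumption that configs end in .yaml or .yml are hypothetical, chosen only for illustration.

```python
from pathlib import Path

# Assumption: pipeline configs use one of these extensions.
YAML_SUFFIXES = {".yaml", ".yml"}


def find_stray_entries(pipelines_dir):
    """Return the sorted names of entries in pipelines_dir that are not YAML files.

    Subdirectories (for example .ipynb_checkpoints) are reported as stray too,
    since only YAML files are expected in this directory.
    """
    root = Path(pipelines_dir)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if not (entry.is_file() and entry.suffix.lower() in YAML_SUFFIXES)
    )
```

Calling find_stray_entries("pipelines") from the repository root before creating a datasource would list a file such as Untitled.ipynb, which can then be moved elsewhere.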

bonacciog commented 3 years ago

Thank you for your help @hamzamaiot!

I created that file myself, so it was my mistake. The dir is now organized like this:

Screenshot 2021-01-15 at 16:35:38

I created a "notebooks" dir and a QuickStart.ipynb file to run the QuickStart example.

Now I got another error:

Screenshot 2021-01-15 at 16:38:05
bonacciog commented 3 years ago

Maybe I have to work outside of the subdirectories (notebooks, pipelines, ...)? @hamzamaiot

htahir1 commented 3 years ago

Sorry, switching accounts. Now the error is clearer: the datasource already exists. You can fetch it in your script using repository.get_datasource_by_name(); the repository instance can be fetched using Repository.get_instance().

Alternatively, to start from scratch, just delete the pipelines directory and it will work 👍
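
The first option can be sketched as a small "get or create" helper. This is only an illustration under assumptions: get_datasource_names() and get_datasource_by_name() are the method names mentioned in this thread, but the single name argument, the helper get_or_create_datasource, and its create parameter are hypothetical.

```python
def get_or_create_datasource(repo, name, create):
    """Reuse the datasource registered under `name`, or build a new one.

    repo:   object exposing get_datasource_names() and get_datasource_by_name(name)
            (assumed to match the ZenML 0.1.3 Repository mentioned in this thread).
    create: zero-argument callable that constructs a new datasource.
    """
    if name in repo.get_datasource_names():
        # Already registered: fetch it instead of triggering the duplicate check.
        return repo.get_datasource_by_name(name)
    return create()


# Hypothetical usage with ZenML 0.1.3 (module paths taken from the stack trace):
#   from zenml.core.repo.repo import Repository
#   from zenml.core.datasources.csv_datasource import CSVDatasource
#   ds = get_or_create_datasource(
#       Repository.get_instance(),
#       'Pima Indians Diabetes Dataset',
#       lambda: CSVDatasource(name='Pima Indians Diabetes Dataset',
#                             path='gs://zenml_quickstart/diabetes.csv'),
#   )
```

Passing the constructor as a callable means CSVDatasource is only instantiated when no datasource with that name exists, so re-running a notebook cell does not hit the duplicate-name exception.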

bonacciog commented 3 years ago

Thank you, @htahir1! Now it is working fine. My mistake, sorry.

htahir1 commented 3 years ago

No problem! We also added a fix for the YAML pipelines dir problem you were facing in https://github.com/maiot-io/zenml/pull/14. Thanks for the heads-up!