holoviz-topics / EarthML

Tools for working with machine learning in earth science
https://earthml.holoviz.org
BSD 3-Clause "New" or "Revised" License

01_Data_Ingestion intake read error #98

Open thomasastanley opened 4 years ago

thomasastanley commented 4 years ago

Reading the training data with a glob worked fine:

training = intake.open_csv('../data/landsat*_training.csv')

but the pattern form

training = intake.open_csv('../data/landsat{version:d}_training.csv')
training_df = training.read()
training_df.head()

produced a ValueError:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-21-6ade6c87b33e> in <module>
      1 training = intake.open_csv('../data/landsat{version:d}_training.csv')
----> 2 training_df = training.read()
      3 training_df.head()

C:\ProgramData\Anaconda3\envs\earthml\lib\site-packages\intake\source\csv.py in read(self)
    140 
    141     def read(self):
--> 142         self._get_schema()
    143         return self._dataframe.compute()
    144 

C:\ProgramData\Anaconda3\envs\earthml\lib\site-packages\intake\source\csv.py in _get_schema(self)
    125 
    126         if self._dataframe is None:
--> 127             self._open_dataset(urlpath)
    128 
    129         dtypes = self._dataframe._meta.dtypes.to_dict()

C:\ProgramData\Anaconda3\envs\earthml\lib\site-packages\intake\source\csv.py in _open_dataset(self, urlpath)
    116 
    117         # add the new columns to the dataframe
--> 118         self._set_pattern_columns(path_column)
    119 
    120         if drop_path_column:

C:\ProgramData\Anaconda3\envs\earthml\lib\site-packages\intake\source\csv.py in _set_pattern_columns(self, path_column)
     73             col.cat.codes.map(dict(enumerate(values))).astype(
     74                 "category" if not _HAS_CDT else CategoricalDtype(set(values))
---> 75             ) for field, values in reverse_formats(self.pattern, paths).items()
     76         }
     77         self._dataframe = self._dataframe.assign(**column_by_field)

C:\ProgramData\Anaconda3\envs\earthml\lib\site-packages\intake\source\utils.py in reverse_formats(format_string, resolved_strings)
    126     args = {field_name: [] for field_name in field_names}
    127     for resolved_string in resolved_strings:
--> 128         for field, value in reverse_format(format_string, resolved_string).items():
    129             args[field].append(value)
    130 

C:\ProgramData\Anaconda3\envs\earthml\lib\site-packages\intake\source\utils.py in reverse_format(format_string, resolved_string)
    193 
    194     # get a list of the parts that matter
--> 195     bits = _get_parts_of_format_string(resolved_string, literal_texts, format_specs)
    196 
    197     for i, (field_name, format_spec) in enumerate(zip(field_names, format_specs)):

C:\ProgramData\Anaconda3\envs\earthml\lib\site-packages\intake\source\utils.py in _get_parts_of_format_string(resolved_string, literal_texts, format_specs)
     41             if literal_text not in _text:
     42                 raise ValueError(("Resolved string must match pattern. "
---> 43                                   "'{}' not found.".format(literal_text)))
     44             bit, _text = _text.split(literal_text, 1)
     45             if bit:

ValueError: Resolved string must match pattern. '../data/landsat' not found.
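
The last frame shows what the check does: each literal piece of the pattern (here '../data/landsat') has to appear verbatim in the resolved file path, and in this case it does not. One possible explanation, which is an assumption and not confirmed in this thread, is that the resolved paths come back in absolute form or with Windows separators. The sketch below only mimics that literal-text check with the standard library; `missing_literals` and the example paths are hypothetical and are not part of intake.

```python
from string import Formatter


def missing_literals(pattern, resolved):
    """Return the literal chunks of `pattern` not found, in order, in `resolved`.

    Hypothetical helper: a simplified stand-in for the literal-text check
    visible in the traceback, not intake's actual implementation.
    """
    missing = []
    rest = resolved
    for literal, _field, _spec, _conversion in Formatter().parse(pattern):
        if not literal:
            continue
        if literal not in rest:
            missing.append(literal)
        else:
            # Continue searching after the matched literal.
            rest = rest.split(literal, 1)[1]
    return missing


pattern = "../data/landsat{version:d}_training.csv"

# Path exactly as written in the notebook: every literal is present.
relative = "../data/landsat5_training.csv"

# Assumed (hypothetical) resolved form on Windows: the '../data/' prefix is gone.
absolute = r"C:\Users\me\EarthML\data\landsat5_training.csv"

print(missing_literals(pattern, relative))
print(missing_literals(pattern, absolute))
```

If the second call reports '../data/landsat' as missing, printing the paths that intake actually resolves would confirm or rule out this explanation.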

jlstevens commented 3 years ago

Thanks for reporting this! I'll try to have a look into what the data path should be shortly...
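
In the meantime, since the glob form reads the files, the version number could be recovered from each file name directly. This is a minimal sketch, assuming file names such as landsat5_training.csv; `version_from_path` is a hypothetical helper, not an intake function, and it accepts either path separator.

```python
import re

# Matches the trailing file name only, so '/' and '\\' separators both work.
VERSION_RE = re.compile(r"landsat(\d+)_training\.csv$")


def version_from_path(path):
    """Return the integer version embedded in a training file name, or None."""
    match = VERSION_RE.search(path)
    return int(match.group(1)) if match else None


# Hypothetical example paths.
print(version_from_path("../data/landsat5_training.csv"))
print(version_from_path(r"C:\Users\me\EarthML\data\landsat8_training.csv"))
```

The result could be assigned to a version column after reading each file, which is roughly what the pattern form is meant to do automatically.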