[X] I have searched the existing issues, and I could not find an existing issue for this bug
Current Behavior
When trying to load the yfcc-10M-filter-euclidean dataset, I get the error "FileNotFoundError: Dataset does not exist. Please check the path or dataset_id".
Expected Behavior
The dataset should load, since it is listed in the output of list_datasets().
Steps To Reproduce
from pinecone_datasets import list_datasets, load_dataset
datasets = list_datasets()
dataset_name = "yfcc-10M-filter-euclidean"
assert dataset_name in datasets, "Dataset does not exist!"
dataset = load_dataset(dataset_name)
Relevant log output
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
Cell In[6], line 1
----> 1 load_dataset('yfcc-10M-filter-euclidean')
File ~/vector_db_benchmark/venv/lib/python3.10/site-packages/pinecone_datasets/public.py:59, in load_dataset(dataset_id, **kwargs)
57 raise FileNotFoundError(f"Dataset {dataset_id} not found in catalog")
58 else:
---> 59 return Dataset.from_catalog(dataset_id, **kwargs)
File ~/vector_db_benchmark/venv/lib/python3.10/site-packages/pinecone_datasets/dataset.py:89, in Dataset.from_catalog(cls, dataset_id, catalog_base_path, **kwargs)
83 catalog_base_path = (
84 catalog_base_path
85 if catalog_base_path
86 else os.environ.get("DATASETS_CATALOG_BASEPATH", cfg.Storage.endpoint)
87 )
88 dataset_path = os.path.join(catalog_base_path, f"{dataset_id}")
---> 89 return cls(dataset_path=dataset_path, **kwargs)
File ~/vector_db_benchmark/venv/lib/python3.10/site-packages/pinecone_datasets/dataset.py:190, in Dataset.__init__(self, dataset_path, **kwargs)
188 self._dataset_path = dataset_path
189 if not self._fs.exists(self._dataset_path):
--> 190 raise FileNotFoundError(
191 "Dataset does not exist. Please check the path or dataset_id"
192 )
193 else:
194 self._fs = None
FileNotFoundError: Dataset does not exist. Please check the path or dataset_id
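Reading the traceback, the catalog lookup in load_dataset passes, and the failure comes from the existence check on the resolved path in Dataset.__init__. Below is a minimal sketch of that same resolution and check, in case it helps with triage. It is not the library's code: `resolve_dataset_path`, `find_missing`, and `DEFAULT_BASE_PATH` are hypothetical names, and the placeholder default stands in for `cfg.Storage.endpoint`, whose value does not appear in the traceback.

```python
import os
from typing import Callable, Iterable, List, Optional

# Hypothetical placeholder. The real default is cfg.Storage.endpoint,
# whose value is not shown in the traceback.
DEFAULT_BASE_PATH = "/tmp/example-catalog"


def resolve_dataset_path(dataset_id: str, catalog_base_path: Optional[str] = None) -> str:
    """Mirror the path resolution shown at dataset.py lines 83-88 of the traceback."""
    base = (
        catalog_base_path
        if catalog_base_path
        else os.environ.get("DATASETS_CATALOG_BASEPATH", DEFAULT_BASE_PATH)
    )
    return os.path.join(base, dataset_id)


def find_missing(
    dataset_ids: Iterable[str],
    exists: Callable[[str], bool] = os.path.exists,
    catalog_base_path: Optional[str] = None,
) -> List[str]:
    """Return the ids whose resolved path fails the existence check."""
    return [
        dataset_id
        for dataset_id in dataset_ids
        if not exists(resolve_dataset_path(dataset_id, catalog_base_path))
    ]
```

With the default `os.path.exists` this only works against a local directory. For the remote bucket, `exists` would need to be the equivalent method of whatever filesystem object the library uses (`self._fs.exists` in the traceback). Passing the names returned by list_datasets() would then show every dataset that is listed in the catalog but absent from storage.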
Environment
Additional Context
Looking at the metadata about the datasets
Results show that the data is not in the bucket: