polaris-hub / polaris

Foster the development of impactful AI models in drug discovery.
https://polaris-hub.github.io/polaris/
Apache License 2.0

Issue accessing datasets in "The Basics" tutorial #220

Open wvirany opened 1 week ago

wvirany commented 1 week ago

Polaris version

0.9.1

Python Version

3.12.7

Operating System

WSL

Installation

conda install polaris==0.9.1

Description

I am following "The Basics" tutorial. When I try to load the datasets, I get the following error:

{
    "name": "PolarisRetrieveArtifactError",
    "message": "The request to the Polaris Hub failed. The Hub responded with:
{
  \"message\": \"V2 dataset 'polaris/hello-world' not found\"
}
Note: If this artifact exists and you can confirm that you are authorized to retrieve it, please call 'polaris login --overwrite' and try again. If the issue persists, please reach out to the Polaris team for support.",
    "stack": "---------------------------------------------------------------------------
HTTPStatusError                           Traceback (most recent call last)
File ~/miniconda3/envs/polaris/lib/python3.12/site-packages/polaris/hub/client.py:176, in PolarisHubClient._base_request_to_hub(self, url, method, **kwargs)
    175 try:
--> 176     response.raise_for_status()
    178 except HTTPStatusError as error:

File ~/miniconda3/envs/polaris/lib/python3.12/site-packages/httpx/_models.py:763, in Response.raise_for_status(self)
    762 message = message.format(self, error_type=error_type)
--> 763 raise HTTPStatusError(message, request=request, response=self)

HTTPStatusError: Client error '404 Not Found' for url 'https://polarishub.io/api/v1/dataset/polaris/hello-world'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/404

The above exception was the direct cause of the following exception:

PolarisRetrieveArtifactError              Traceback (most recent call last)
File ~/miniconda3/envs/polaris/lib/python3.12/site-packages/polaris/hub/client.py:319, in PolarisHubClient.get_dataset(self, owner, name, verify_checksum)
    318 try:
--> 319     return self._get_v1_dataset(owner, name, ArtifactSubtype.STANDARD.value, verify_checksum)
    320 except PolarisRetrieveArtifactError:
    321     # If the v1 dataset is not found, try to load a v2 dataset

File ~/miniconda3/envs/polaris/lib/python3.12/site-packages/polaris/hub/client.py:347, in PolarisHubClient._get_v1_dataset(self, owner, name, artifact_type, verify_checksum)
    342 url = (
    343     f\"/v1/dataset/{owner}/{name}\"
    344     if artifact_type == ArtifactSubtype.STANDARD.value
    345     else f\"/v2/competition/dataset/{owner}/{name}\"
    346 )
--> 347 response = self._base_request_to_hub(url=url, method=\"GET\")
    349 # Disregard the Zarr root in the response. We'll get it from the storage token instead.

File ~/miniconda3/envs/polaris/lib/python3.12/site-packages/polaris/hub/client.py:207, in PolarisHubClient._base_request_to_hub(self, url, method, **kwargs)
    205 if response_status_code == 404:
    206     # This happens when an artifact doesn't exist _or_ when the user has no access to that artifact.
--> 207     raise PolarisRetrieveArtifactError(response=response) from error
    209 raise PolarisHubError(response=response) from error

PolarisRetrieveArtifactError: The request to the Polaris Hub failed. The Hub responded with:
{
  \"message\": \"Dataset 'polaris/hello-world' not found\"
}
Note: If this artifact exists and you can confirm that you are authorized to retrieve it, please call 'polaris login --overwrite' and try again. If the issue persists, please reach out to the Polaris team for support.

During handling of the above exception, another exception occurred:

HTTPStatusError                           Traceback (most recent call last)
File ~/miniconda3/envs/polaris/lib/python3.12/site-packages/polaris/hub/client.py:176, in PolarisHubClient._base_request_to_hub(self, url, method, **kwargs)
    175 try:
--> 176     response.raise_for_status()
    178 except HTTPStatusError as error:

File ~/miniconda3/envs/polaris/lib/python3.12/site-packages/httpx/_models.py:763, in Response.raise_for_status(self)
    762 message = message.format(self, error_type=error_type)
--> 763 raise HTTPStatusError(message, request=request, response=self)

HTTPStatusError: Client error '404 Not Found' for url 'https://polarishub.io/api/v2/dataset/polaris/hello-world'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/404

The above exception was the direct cause of the following exception:

PolarisRetrieveArtifactError              Traceback (most recent call last)
Cell In[6], line 1
----> 1 dataset = po.load_dataset(\"polaris/hello-world\")
      2 benchmark = po.load_benchmark(\"polaris/hello-world-benchmark\")

File ~/miniconda3/envs/polaris/lib/python3.12/site-packages/polaris/loader/load.py:39, in load_dataset(path, verify_checksum)
     36 if not is_file:
     37     # Load from the Hub
     38     client = PolarisHubClient()
---> 39     return client.get_dataset(*path.split(\"/\"), verify_checksum=verify_checksum)
     41 # Load from local file
     42 if extension == \"json\":

File ~/miniconda3/envs/polaris/lib/python3.12/site-packages/polaris/hub/client.py:322, in PolarisHubClient.get_dataset(self, owner, name, verify_checksum)
    319     return self._get_v1_dataset(owner, name, ArtifactSubtype.STANDARD.value, verify_checksum)
    320 except PolarisRetrieveArtifactError:
    321     # If the v1 dataset is not found, try to load a v2 dataset
--> 322     return self._get_v2_dataset(owner, name)

File ~/miniconda3/envs/polaris/lib/python3.12/site-packages/polaris/hub/client.py:379, in PolarisHubClient._get_v2_dataset(self, owner, name)
    377 \"\"\"\"\"\"
    378 url = f\"/v2/dataset/{owner}/{name}\"
--> 379 response = self._base_request_to_hub(url=url, method=\"GET\")
    381 # Disregard the Zarr root in the response. We'll get it from the storage token instead.
    382 response.pop(\"zarrRootPath\", None)

File ~/miniconda3/envs/polaris/lib/python3.12/site-packages/polaris/hub/client.py:207, in PolarisHubClient._base_request_to_hub(self, url, method, **kwargs)
    203         raise PolarisCreateArtifactError(response=response) from error
    205     if response_status_code == 404:
    206         # This happens when an artifact doesn't exist _or_ when the user has no access to that artifact.
--> 207         raise PolarisRetrieveArtifactError(response=response) from error
    209     raise PolarisHubError(response=response) from error
    210 # Convert the response to json format if the response contains a 'text' body

PolarisRetrieveArtifactError: The request to the Polaris Hub failed. The Hub responded with:
{
  \"message\": \"V2 dataset 'polaris/hello-world' not found\"
}
Note: If this artifact exists and you can confirm that you are authorized to retrieve it, please call 'polaris login --overwrite' and try again. If the issue persists, please reach out to the Polaris team for support."
}

I then tried to run polaris login --overwrite, but that didn't fix the problem.
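For readers puzzling over why the final message mentions a "V2 dataset": the traceback above shows the client requesting the v1 endpoint first and, when that returns 404, falling back to the v2 endpoint. The sketch below imitates only that control flow. It is a minimal, self-contained illustration, not the Polaris implementation: `NotFound`, `make_fake_hub`, and this `get_dataset` are hypothetical stand-ins, and the fake hub is an in-memory dict rather than a real HTTP call.

```python
class NotFound(Exception):
    """Hypothetical stand-in for PolarisRetrieveArtifactError."""


def make_fake_hub(visible):
    """Return a request function backed by a dict of URL -> payload.

    Assumption for illustration: a dataset the caller cannot see is simply
    absent from `visible`, since the client code in the traceback notes that
    a 404 covers both "does not exist" and "no access".
    """
    def request(url):
        if url not in visible:
            raise NotFound(f"404 Not Found for url '{url}'")
        return visible[url]

    return request


def get_dataset(request, owner, name):
    """Try the v1 endpoint, then fall back to v2, as in the traceback."""
    try:
        return request(f"/v1/dataset/{owner}/{name}")
    except NotFound:
        # Raising from inside this handler chains the v1 error as context,
        # which is why the log reads "During handling of the above
        # exception, another exception occurred".
        return request(f"/v2/dataset/{owner}/{name}")
```

With nothing visible, the error that surfaces is the v2 one and the v1 failure is only reachable through the exception's `__context__`. That matches the order of the log above.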

Steps to reproduce

To create the conda environment:

conda create -n polaris && conda activate polaris
conda install polaris==0.9.1

Then I run the following code in a Jupyter notebook:

import polaris as po
from polaris.hub.client import PolarisHubClient

client = PolarisHubClient()
client.login()

dataset = po.load_dataset("polaris/hello-world")
benchmark = po.load_benchmark("polaris/hello-world-benchmark")

Additional output

No response

wvirany commented 1 week ago

I also saw this issue, which seems to describe a similar problem on older versions of Polaris and Python. I tried running the same code with python==3.10.12 and polaris==0.8.5, as in that resolved issue, but I ran into the same problem.

Thanks for your help!

cwognum commented 1 week ago

Hi @wvirany, thanks for reporting.

It seems we accidentally made the hello-world dataset private. I just made it public again, so it should work now. Let me know if the issue persists. Here's the dataset!
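To confirm the fix from a notebook without wading through another long traceback, a small wrapper can report success or a one-line error summary. This is only a sketch: `try_load` is a hypothetical helper, not part of Polaris, and it accepts any loader callable so that it can be exercised without network access.

```python
def try_load(loader, artifact_id):
    """Call loader(artifact_id) and return (result, error_summary).

    Exactly one element of the pair is None. The broad `except` is
    deliberate: the goal is a short summary, not error handling.
    """
    try:
        return loader(artifact_id), None
    except Exception as exc:
        return None, f"{type(exc).__name__}: {exc}"
```

With Polaris installed and after logging in, `try_load(po.load_dataset, "polaris/hello-world")` and `try_load(po.load_benchmark, "polaris/hello-world-benchmark")` would exercise the two calls from the report above.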