v7labs / darwin-py

Library and commandline tool for managing datasets on darwin.v7labs.com
MIT License

Cannot get remote dataset with uppercase name #851

Closed by Krystex 4 months ago

Krystex commented 4 months ago

Hi all!

When I create a dataset whose name contains uppercase letters, I cannot retrieve it with the get_remote_dataset function.

To reproduce:

from darwin.client import Client
client = Client.local()
client.create_dataset("TEST_dataset")
client.get_remote_dataset("TEST_dataset")

The last line throws the following stacktrace:

    213     return RemoteDatasetV2(
    214         name=dataset["name"],
    215         slug=dataset["slug"],
   (...)
    220         client=self,
    221     )
    222 if not matching_datasets:
--> 223     raise NotFound(str(parsed_dataset_identifier))
    224 if parsed_dataset_identifier.version:
    225     matching_datasets[0].release = parsed_dataset_identifier.version

NotFound: Not found: 'companyname/TEST_dataset'

It works if I try to get the dataset with the lowercase version of the name (client.get_remote_dataset("test_dataset")).

linear[bot] commented 4 months ago

DAR-2304 Cannot get remote dataset with uppercase name

JBWilkie commented 4 months ago

Hello @Krystex, thanks for raising this! The reason you're running into this issue is that get_remote_dataset() expects a dataset_identifier argument of type DatasetIdentifier.

This is a string of the form <team-slug>/<dataset-slug>.

Therefore, dataset names passed to the function need to be slugified. In your case, that would be test_dataset.

There's an example of how to use get_remote_dataset() in this article.

Krystex commented 4 months ago

Thanks for the link! I tried it as described in the article, but it still doesn't work:

from darwin.client import Client
client = Client.local()
client.create_dataset("TEST_dataset")
client.get_remote_dataset(f"{client.default_team}/TEST_dataset")

The same error as before is thrown.

JBWilkie commented 4 months ago

Hi @Krystex! Apologies for not covering this explicitly earlier: slugifying a dataset name involves lowercasing it and replacing spaces with hyphens.


Therefore, instead of TEST_dataset, you'll need to pass test_dataset.
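For illustration, the two rules described in this thread (lowercase the name, replace spaces with hyphens) can be sketched as a small helper. This is only a sketch under those stated rules: slugify_dataset_name and build_dataset_identifier are hypothetical names, not part of darwin-py, and the slug actually generated for a dataset may treat other characters differently.

```python
import re


def slugify_dataset_name(name: str) -> str:
    """Hypothetical helper, not part of darwin-py.

    Applies only the two rules mentioned in this thread:
    lowercase the name and replace runs of whitespace with a hyphen.
    """
    return re.sub(r"\s+", "-", name.strip().lower())


def build_dataset_identifier(team_slug: str, dataset_name: str) -> str:
    """Hypothetical helper: build a '<team-slug>/<dataset-slug>' string."""
    return f"{team_slug}/{slugify_dataset_name(dataset_name)}"


# Usage with the calls shown earlier in this issue (needs a configured client):
#   client = Client.local()
#   identifier = build_dataset_identifier(client.default_team, "TEST_dataset")
#   client.get_remote_dataset(identifier)
```

With the name from this issue, slugify_dataset_name("TEST_dataset") gives test_dataset, since underscores are left untouched and only the case changes.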

Krystex commented 4 months ago

Ahh, I didn't know slugifying was a standard procedure. Thanks for linking the library!