wandb / wandb

The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.
https://wandb.ai
MIT License

[CLI] artifact dataset versioning breaks on windows #3133

Open fgraffitti-cyberhawk opened 2 years ago

fgraffitti-cyberhawk commented 2 years ago

Description

This issue is linked to https://github.com/wandb/client/issues/2859
Dataset versioning doesn't work when the versioning script is run on Windows.

Wandb features

To version a dataset:

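import wandb

# Start a run and create a dataset artifact
run = wandb.init(name='check_error', project='version_dataset')
artifact = wandb.Artifact('check_error', type='dataset')
# Track the S3 prefix by reference instead of uploading the files
artifact.add_reference('s3://ai-wandb-error/a/b')
run.log_artifact(artifact)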

To download the dataset:

import wandb
run = wandb.init(project='version_dataset')
artifact = run.use_artifact('check_error/version_dataset/check_error:v0', type='dataset')
artifact_dir = artifact.download()

How to reproduce

  1. Upload a file to a subfolder of an S3 bucket
  2. Run the first script above on Windows
  3. Run the second script (on either Windows or Linux): a boto3 error is raised, as described in the linked issue
    ClientError: An error occurred (404) when calling the HeadObject operation: Not Found

Running the first script on Linux works, and the dataset then downloads correctly. Judging from the wandb web interface, the issue seems to be linked to the last backslash in the URI. When versioning from Windows, this is what the URI looks like on wandb:

image

while when versioning from Linux, this is the output:

image

I believe this is a bug, but it would be great to have some insight into this (and also to know whether there is a workaround to rename the URIs in the wandb artifact so that they match the ones in S3).
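
The backslash seen in the Windows URI would be consistent with the object key being built with the operating system's path rules. This thread does not confirm that as the cause, so the following is only a sketch of the suspected mechanism. It uses Python's standard ntpath and posixpath modules, which apply Windows and POSIX rules on any OS; the file name data.csv is a hypothetical placeholder.

```python
import ntpath      # Windows path rules, importable on any OS
import posixpath   # POSIX path rules, importable on any OS

prefix = "a/b"          # subfolder from the reference URI in this issue
filename = "data.csv"   # hypothetical object name, for illustration only

# Joining with Windows rules inserts a backslash before the file name
windows_key = ntpath.join(prefix, filename)
# Joining with POSIX rules inserts a forward slash
linux_key = posixpath.join(prefix, filename)

print(repr(windows_key))
print(repr(linux_key))
```

S3 treats a backslash as an ordinary character in a key, so a key recorded with a backslash names a different object from the one actually stored, which would explain a 404 from HeadObject.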

Environment

ramit-wandb commented 2 years ago

Hi @fgraffitti-cyberhawk,

Thank you for pointing this out! This is a known issue, and it is planned to be fixed in an upcoming release.

I don't think there is a good way to edit the URI that the Artifact stores for the object in S3. The best method currently would be to use a library like boto3 to call the S3 API and retrieve the object directly.
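
A minimal sketch of that boto3 workaround follows. It assumes AWS credentials are already configured and that the only problem with the recorded URI is backslashes in place of forward slashes. The helper names split_s3_uri and download_s3_object are hypothetical and not part of wandb or boto3.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key), turning backslashes into '/'."""
    parsed = urlparse(uri.replace("\\", "/"))
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def download_s3_object(uri, local_path):
    """Download one object straight from S3, bypassing artifact.download()."""
    import boto3  # imported here so split_s3_uri works without boto3 installed

    bucket, key = split_s3_uri(uri)
    boto3.client("s3").download_file(bucket, key, local_path)
```

For example, download_s3_object('s3://ai-wandb-error/a/b\\data.csv', 'data.csv') would request the key a/b/data.csv from the bucket ai-wandb-error, where data.csv is again a placeholder file name.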

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 60 days with no activity.