v7labs / darwin-py

Library and commandline tool for managing datasets on darwin.v7labs.com
MIT License

[DAR-2991][External] Set CLI to pull with folders by default & display overwrite warning #887

Closed JBWilkie closed 1 month ago

JBWilkie commented 2 months ago

Problem

In DAR-2246, we made significant breaking changes to how darwin-py names files when downloading them. One of those changes switched the default behaviour of pull() so that it replicates the remote folder structure locally. However, CLI-initiated pulls have their own default behaviour, which was overlooked and left unchanged in that PR.

Combined with the naming changes themselves, this means that CLI-initiated dataset pulls overwrite local files whenever identically named files exist in different folders of the release.

Solution

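Set CLI-initiated pulls to use folders by default, and display a warning when a flat release containing identically named dataset items is pulled. The warning is in the following style: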

Warning: Identical filenames detected in your export release. 

You are pulling a flat release with identically named dataset items. The release will still be pulled, but to prevent overwriting your dataset files, please re-pull the release with the folder structure. This can be done as follows:
- CLI: darwin dataset pull team_slug/dataset_slug --folders
- SDK: dataset.pull(use_folders=True)

The following paths are duplicated:
- {path_1} is duplicated 2 times
- {path_2} is duplicated 2 times
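
For illustration only, the check behind a warning like this could be sketched as below. This is not the darwin-py implementation: `find_flat_collisions`, `format_warning`, and the sample paths are hypothetical names invented for the example, and it assumes a flat pull keeps only the final component of each remote path.

```python
from collections import Counter
from pathlib import PurePosixPath


def find_flat_collisions(remote_paths):
    """Return {filename: count} for filenames that appear more than once
    after the folder part of each remote path is dropped (a flat pull)."""
    names = Counter(PurePosixPath(p).name for p in remote_paths)
    return {name: n for name, n in names.items() if n > 1}


def format_warning(collisions):
    """Build a warning in the style shown above (wording abbreviated)."""
    lines = [
        "Warning: Identical filenames detected in your export release.",
        "",
        "The following paths are duplicated:",
    ]
    lines += [
        f"- {name} is duplicated {n} times"
        for name, n in sorted(collisions.items())
    ]
    return "\n".join(lines)


# Hypothetical release: two folders both contain cat.jpg
remote_paths = ["/train/cat.jpg", "/val/cat.jpg", "/val/dog.jpg"]
collisions = find_flat_collisions(remote_paths)
if collisions:
    print(format_warning(collisions))
```

With folders enabled, `/train/cat.jpg` and `/val/cat.jpg` land in separate local directories, so no collision occurs and no warning is needed.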

Changelog

linear[bot] commented 2 months ago

DAR-2991 darwin dataset pull overwrites images with same name, unexpected behavioral change between 0.8.x and 1.0.x

JBWilkie commented 1 month ago

nice, looks good! just checking, should there now be a ticket to eventually deprecate --folders since it doesn't actually do anything?

Yep, created DAR-3189 for it