Closed by filippocastelli 1 month ago.
Hi @filippocastelli, thanks for raising this with us!
When we released the major version 1.0.0, we made some breaking changes to how darwin-py names files when pulling and loading data. These changes were made to improve the coherence of the naming conventions between in-platform files and locally downloaded files. One of those changes was to name items after their in-platform name (item name) instead of after the name of the annotation file, as was previously the case.
Another of these changes was to make `RemoteDataset.pull()` pull with folders by default, instead of in a flat structure as was previously the case. Unfortunately, through an oversight we did not change this behaviour for CLI-initiated `pull` operations. This was a mistake, and we apologise for the issue. Combined, the naming change above and this oversight lead to overwriting: a flat CLI pull names each local file after its item name, so items with the same name in different folders are written to the same local path and overwrite each other.
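To make the collision concrete, here is a minimal sketch of how remote item paths map to local paths with and without folders. It is not darwin-py's code: `local_path` and `find_collisions` are hypothetical helpers, and it assumes remote items are identified by absolute POSIX-style paths such as `/site_a/img_001.jpg`.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def local_path(remote_path: str, use_folders: bool) -> PurePosixPath:
    """Hypothetical: map a remote item path like '/site_a/img_001.jpg' to a local relative path."""
    path = PurePosixPath(remote_path)
    if use_folders:
        # Keep the remote folder hierarchy, minus the leading '/'.
        return path.relative_to("/") if path.is_absolute() else path
    # Flat structure: only the item name survives, so folder information is lost.
    return PurePosixPath(path.name)


def find_collisions(remote_paths, use_folders):
    """Hypothetical: return {local path: [remote paths]} for every local path claimed more than once."""
    claims = defaultdict(list)
    for remote in remote_paths:
        claims[local_path(remote, use_folders)].append(remote)
    return {local: remotes for local, remotes in claims.items() if len(remotes) > 1}


items = ["/site_a/img_001.jpg", "/site_b/img_001.jpg", "/site_b/img_002.jpg"]
flat = find_collisions(items, use_folders=False)
nested = find_collisions(items, use_folders=True)
print(flat)    # {PurePosixPath('img_001.jpg'): ['/site_a/img_001.jpg', '/site_b/img_001.jpg']}
print(nested)  # {}
```

With folders, every item keeps a distinct local path; in a flat pull, the two `img_001.jpg` items claim the same file and one overwrites the other.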
To rectify it, work has been done on the DAR-2991 branch and a PR has been opened with the following changes:
Because the changes were made to bring greater coherence between the names of in-platform items and local items, unfortunately we won't be reverting to the behaviour that employs `_n` suffixes to ensure uniqueness. In advance of the release, emails containing information on the changes were sent to every Darwin team.
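The trade-off behind this decision can be sketched briefly. The following is a hypothetical illustration of `_n` suffix deduplication, not the code darwin-py used before 1.0.0: it guarantees unique local names, but the suffixed names no longer match any item name in the platform.

```python
from pathlib import PurePosixPath


def dedupe_with_suffixes(names):
    """Hypothetical: append '_n' to repeated file names so every local name is unique."""
    taken = set()
    unique = []
    for name in names:
        path = PurePosixPath(name)
        candidate, n = name, 0
        # Bump the counter until the name is free, e.g. img_001.jpg -> img_001_1.jpg.
        while candidate in taken:
            n += 1
            candidate = f"{path.stem}_{n}{path.suffix}"
        taken.add(candidate)
        unique.append(candidate)
    return unique


renamed = dedupe_with_suffixes(["img_001.jpg", "img_001.jpg", "img_002.jpg"])
print(renamed)  # ['img_001.jpg', 'img_001_1.jpg', 'img_002.jpg']
```

Nothing is overwritten, but `img_001_1.jpg` exists only locally, which is the kind of mismatch between in-platform and local names that the 1.0.0 changes set out to remove.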
You'll be updated as soon as these changes are available in a darwin-py release!
Thank you for the feedback; the reasons for this behavior change are justified.
Hi @filippocastelli, the above changes are now available in version 1.0.3, released today. `pull()` from the CLI now pulls with folders by default. You can still pull a flat structure with the new `--no-folders` flag, and a non-blocking warning will be displayed for every file that is going to be overwritten.
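A "non-blocking" warning here means the pull reports each overwrite and carries on rather than aborting. A minimal sketch of that pattern with Python's standard `warnings` module follows; `plan_flat_pull` is a hypothetical helper, and darwin-py's actual warning mechanism and message text may differ.

```python
import warnings
from pathlib import PurePosixPath


def plan_flat_pull(remote_paths):
    """Hypothetical: compute flat local names, warning (without stopping) on each overwrite."""
    occupant = {}  # local file name -> remote item currently written there
    for remote in remote_paths:
        name = PurePosixPath(remote).name
        if name in occupant:
            # Warn and continue: the later item replaces the earlier one.
            warnings.warn(f"{remote} will overwrite {occupant[name]} at {name}", stacklevel=2)
        occupant[name] = remote  # last write wins, as it would on disk
    return occupant


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    final = plan_flat_pull(["/site_a/img_001.jpg", "/site_b/img_001.jpg", "/site_b/img_002.jpg"])

print(len(caught))  # 1
print(final)  # {'img_001.jpg': '/site_b/img_001.jpg', 'img_002.jpg': '/site_b/img_002.jpg'}
```

Only the second `img_001.jpg` triggers a warning, and the planning loop still processes every item.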
Thank you!
Using `darwin dataset pull` without specifying `--folders` results in missing image files when multiple remote files share the same filename. Please note that this is the same issue as #603, which was solved somewhere around 0.8.44 and was most likely reintroduced by #872. This unexpected behavioural change to a core feature of the package, such as dataset pulling, is very disruptive for customer workflows that depend on `darwin-py`.
Below are the steps to reproduce: