v7labs / darwin-py

Library and commandline tool for managing datasets on darwin.v7labs.com
MIT License

[DAR-3041][External] Change the behaviour of the `force_slots` argument of `pull()` #915

Closed JBWilkie closed 3 weeks ago

JBWilkie commented 3 weeks ago

Problem

Currently when pulling a dataset release with darwin-py, if:

Then every item is named and structured into directories in the multi-slotted way. That is, the local paths include the slot name: {prefix}/{item_name}/{slot_name}/{file_name}

This happens because, if any item meets either condition 1 or 2 above, we set force_slots=True for all items.

If a user passes force_slots=False to pull() (which is the default behaviour), we should use force_slots=False for all single-slotted, single-source-file items and force_slots=True for all other items. It only makes sense for the local paths of multi-slotted or multi-source-file items to include the slot names.

Solution

Instead of passing the same value of force_slots for all items, determine and record the value of force_slots for every item to be downloaded, then propagate these per-item values into the download functions.
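A minimal sketch of the per-item approach, for illustration only. The `Item` and `Slot` classes and the functions `needs_slot_dirs`, `resolve_force_slots` and `local_paths` are hypothetical names, not darwin-py's actual API. The flat layout used for single-slotted items ({prefix}/{file_name}) is also an assumption, since only the slotted layout is described above.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Dict, List


@dataclass
class Slot:
    """Hypothetical stand-in for a slot and the source files it holds."""
    name: str
    source_files: List[str]


@dataclass
class Item:
    """Hypothetical stand-in for a dataset item in a release."""
    name: str
    slots: List[Slot] = field(default_factory=list)


def needs_slot_dirs(item: Item) -> bool:
    """True if the item is multi-slotted or any slot has several source files."""
    return len(item.slots) > 1 or any(len(s.source_files) > 1 for s in item.slots)


def resolve_force_slots(items: List[Item], user_force_slots: bool = False) -> Dict[str, bool]:
    """Record one force_slots value per item (assumes item names are unique).

    An explicit force_slots=True from the user still applies to every item.
    """
    return {item.name: user_force_slots or needs_slot_dirs(item) for item in items}


def local_paths(prefix: str, item: Item, force_slots: bool) -> List[PurePosixPath]:
    """Build the local paths for one item from its own force_slots value."""
    paths = []
    for slot in item.slots:
        for file_name in slot.source_files:
            if force_slots:
                # Slotted layout: {prefix}/{item_name}/{slot_name}/{file_name}
                paths.append(PurePosixPath(prefix, item.name, slot.name, file_name))
            else:
                # Assumed flat layout for single-slotted, single-file items
                paths.append(PurePosixPath(prefix, file_name))
    return paths


items = [
    Item("a", [Slot("0", ["a.jpg"])]),
    Item("b", [Slot("left", ["l.jpg"]), Slot("right", ["r.jpg"])]),
]
flags = resolve_force_slots(items)
for item in items:
    print(item.name, flags[item.name], local_paths("images", item, flags[item.name]))
```

With the default force_slots=False, only item "b" gets slot directories in this sketch, while item "a" keeps the flat layout.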

Changelog

When downloading dataset releases containing a mixture of multi-slotted & single-slotted items, only represent the slots of the multi-slotted items locally

linear[bot] commented 3 weeks ago

DAR-3041 Investigate and potentially change `force_slots` behaviour