daavoo opened this issue 2 years ago
The way I currently implemented this is by using `--pipe` and a Python "constant" as the source of truth.

Edit: I ended up handling this via an annotation field named `split`.
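The annotation-field approach might be sketched as follows. This is a minimal, hypothetical sketch: only the field name `split` comes from the comment above; the helper name, file layout, and everything else are assumptions for illustration.

```python
import json
from pathlib import Path


def assign_split(annotation_path: str, split_name: str) -> dict:
    """Record the split inside the annotation file itself.

    The annotation becomes the source of truth for the split, so it can
    later be selected with a query on the `split` field. (Hypothetical
    helper; not part of LDB.)
    """
    path = Path(annotation_path)
    annotation = json.loads(path.read_text())
    annotation["split"] = split_name  # field name taken from the comment above
    path.write_text(json.dumps(annotation, indent=2))
    return annotation
```

With the split stored in the annotation, selecting a subset reduces to a JSON-field query rather than maintaining a separate list of samples.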
The current split workflow involves the use of queries (to separate by JSON field), `--limit` (by count), or `--tag` (by tags). We might want to give explicit examples. Another idea is to address multi-stage splits with recipes.
In many practical machine learning workflows, splitting a dataset into subsets is a common operation. For example, in the Data-Centric AI competition, two different splits (train and validation) are expected to be submitted. Different strategies for generating those splits might be tried, and I would expect LDB to support these iterations.
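One generic way to support such iterations is to derive the split deterministically from a hash of each sample's identifier, so re-running with the same parameters reproduces the same subsets, while changing a seed yields a new candidate split. This is a sketch of the general technique, not anything LDB provides; all names here are hypothetical.

```python
import hashlib


def split_of(sample_id: str, val_fraction: float = 0.2, seed: str = "v1") -> str:
    """Deterministically assign a sample to 'train' or 'validation'.

    The assignment depends only on (seed, sample_id), so the split is
    stable across runs; changing `seed` generates a different split,
    which supports iterating over splitting strategies.
    """
    digest = hashlib.sha256(f"{seed}:{sample_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # map hash to [0, 1]
    return "validation" if bucket < val_fraction else "train"
```

The resulting label could then be written into each annotation's `split` field and selected via a query, tying this back to the workflow described above.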
I didn't find any guidance on how to perform these splitting iterations as part of the LDB workflow. Does the recommended workflow depend on https://github.com/iterative/ldb/issues/88?