Open jonburdo opened 2 years ago
We have an alternative in the form of Label Studio format.
I suggest closing this.
This suggestion is a little different. In Label Studio format, the annotation content is inside the file itself. This is a proposal to specify all files as paths, including annotations, so it could be used with any other indexing format.
In Label Studio format, you might have data.json with:
[
  {
    "annotations": [
      {
        "class": "cat"
      }
    ],
    "data": {
      "image": "https://remote.ldb.ai/data-lakes/dogs-and-cats/cat.1000.png"
    },
    "predictions": []
  },
  {
    "annotations": [
      {
        "class": "dog"
      }
    ],
    "data": {
      "image": "https://remote.ldb.ai/data-lakes/dogs-and-cats/dog.1000.png"
    },
    "predictions": []
  }
]
With this suggestion, you might have data.json:
[
  "https://remote.ldb.ai/data-lakes/dogs-and-cats/cat.1000.png",
  "https://remote.ldb.ai/data-lakes/dogs-and-cats/cat.1000.json",
  "https://remote.ldb.ai/data-lakes/dogs-and-cats/dog.1000.png",
  "https://remote.ldb.ai/data-lakes/dogs-and-cats/dog.1000.json"
]
and index with ldb index --format pairs --spec-file data.json.
The file here provides the file list that ldb otherwise produces by recursively traversing a directory. This is useful for a couple of scenarios:
Essentially, this would be a convenient way to avoid passing many individual filepaths as separate args to ldb index (similar to how we can use a requirements.txt file with pip).
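As a rough illustration of what consuming such a spec file could involve, here is a minimal Python sketch. It is not ldb's implementation: the function names load_spec and pair_paths are hypothetical, and matching a data file to its annotation by shared path stem (with the .json file treated as the annotation) is an assumption about how a pairs-style format would behave.

```python
import json
import posixpath


def load_spec(text):
    """Parse spec-file contents: a JSON list of urlpath strings (assumed layout)."""
    paths = json.loads(text)
    if not isinstance(paths, list) or not all(isinstance(p, str) for p in paths):
        raise ValueError("spec file must contain a JSON list of path strings")
    return paths


def pair_paths(paths):
    """Group a flat path list into (data, annotation) pairs by shared stem.

    Hypothetical rule: a path ending in .json is the annotation for the
    data file with the same stem. Unpaired paths are dropped.
    """
    grouped = {}
    for path in paths:
        # splitext only strips the final extension, so "cat.1000.png"
        # and "cat.1000.json" both map to the stem ".../cat.1000".
        stem, ext = posixpath.splitext(path)
        kind = "annotation" if ext.lower() == ".json" else "data"
        grouped.setdefault(stem, {})[kind] = path
    return [
        (entry["data"], entry["annotation"])
        for entry in grouped.values()
        if "data" in entry and "annotation" in entry
    ]


spec_text = """
[
  "https://remote.ldb.ai/data-lakes/dogs-and-cats/cat.1000.png",
  "https://remote.ldb.ai/data-lakes/dogs-and-cats/cat.1000.json",
  "https://remote.ldb.ai/data-lakes/dogs-and-cats/dog.1000.png",
  "https://remote.ldb.ai/data-lakes/dogs-and-cats/dog.1000.json"
]
"""

pairs = pair_paths(load_spec(spec_text))
for data_path, annotation_path in pairs:
    print(data_path, "<->", annotation_path)
```

Because the spec file only lists paths, the same list could in principle feed any other indexing format; only the pairing step above is specific to the pairs assumption.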
We should add an option (--spec-file or something) which takes a filepath (any local or remote urlpath) for a single file containing a list of urlpaths to index, and optionally some other information (in YAML or TOML maybe), such as:
This way a user could index a specific set of files instead of an entire directory, and this makes it easy to index files hosted at HTTP endpoints: