Open brimoor opened 4 years ago
Patch fix here: #185
@benjaminpkane am I correct in saying the only issue with Sample.to_dict()
is that the id is missing?
import fiftyone as fo
from pprint import pprint
import json
from bson import json_util
dataset = fo.Dataset("cifar10-test")
dataset.add_sample(
fo.Sample(
filepath='/Users/tylerganter/fiftyone/cifar10/test/data/00041.jpg',
tags=['test'],
ground_truth=fo.Classification(label="deer")
)
)
s = dataset.view().first()
pprint(s.to_dict())
pprint(json.loads(json_util.dumps(s.to_mongo())))
OUT:
{'filepath': '/Users/tylerganter/fiftyone/cifar10/test/data/00041.jpg',
'ground_truth': {'_cls': 'Classification', 'label': 'deer'},
'metadata': None,
'tags': ['test']}
{'_id': {'$oid': '5ee23bdaa2cfbd9cc9ae5d82'},
'filepath': '/Users/tylerganter/fiftyone/cifar10/test/data/00041.jpg',
'ground_truth': {'_cls': 'Classification', 'label': 'deer'},
'metadata': None,
'tags': ['test']}
Yes, that is the only thing I saw
Okay. This will be an easy fix. I do have to say though, that I'm not a big fan in general of the paradigm we have for needing to support both user functionality and server functionality. Its almost as if we need a special type of private method that supports server functionality.
I did not think about this hard and deep at all, but I'll just through it out there: What if we have a child class for the server version of everything:
class ServerSample(Sample):
def to_server_dict():
"""serializes for the server"""
...
Such that all of this functionality is hidden from the user.
Why do we need a class? I'm fine with a private method.
Up to you, but the reason I need id's right now is for cache busting images. I could probably change that. Are we hiding the entire notion of an id to the user?
Why do we need a class? I'm fine with a private method.
Up to you, but the reason I need id's right now is for cache busting images. I could probably change that. Are we hiding the entire notion of an id to the user?
The logic I went with, was that Sample.to_from/dict()
are the python dictionary equivalents of to/from_json()
. And:
fo.Sample.from_dict(sample_in_dataset.to_dict())
# new sample, NOT in dataset
fo.Sample.from_json(sample_in_dataset.from_json())
# new sample, NOT in dataset
If we started writing IDs to file we'd start getting a mess where we would have to check for an ID and ignore it if present which means you're not really loading this thing that you have serialized...
Thus I write support for Sample.to_mongo_dict
which does NOT have a corresponding Sample.from_mongo_dict
because the inverse operation would really be:
d = sample.to_mongo_dict()
sample2 = dataset[d["_id"]]
sample2 is sample
# True
While we have been using a mixture of private and public methods on the server, private methods shouldn't be used this way. There is no way for us to distinguish:
fo.Sample._method_that_should_only_be_called_internally_by_Sample()
fo.Sample._method_that_should_only_be_called_by_server()
kibitz: I agree that, in an ideal world, we would avoid _method_that_should_only_be_called_by_server()
. But, that pattern already exists a few dozen times in the codebase, so adding another here and moving onto more pressing things seems right
This PR https://github.com/voxel51/fiftyone/pull/169 updated the behavior of
fiftyone.core.sample.Sample.to_dict()
, which caused the server to have to serialize samples in a non-ideal bson -> str -> json way: https://github.com/voxel51/fiftyone/pull/185/commits/85399782a5aff92f4fafe1799642d9250008926e.The interface needs to be tweaked to ensure that all relevant use cases of sample serialization are supported cleanly.