[BUG] Fields in embedded docs with mixed types crash field resolution in fiftyone.core.odm.utils._merge_embedded_doc_fields

gabmis commented 3 months ago

Describe the problem

It seems the problem (TypeError) occurs when you have fields with mixed types in an embedded document. E.g. detections with a user defined field named "size" which has both ints and floats. I don't know the expected behavior here but the way the code fails seems unsatisfactory in any case, in informative error could be the solution. The code responsible is the function mentioned in the title and in particular this line. which sets the field to None in the fields_dict when seeing a type mismatch. Then, when the field comes up again (e.g. when parsing the subsequent detection) this line will try to perform __getitem__("size") on a NoneType.

Code to reproduce issue

I don't have the time right now but if you think it necessary I'll take the time later to provide a minimal example.

System information

OS Platform and Distribution (e.g., Linux Ubuntu 22.04): Ubuntu 22.04
Python version (python --version): 3.11.8
FiftyOne version (fiftyone --version): 0.23.5
FiftyOne installed from (pip or source): pip

Other info/logs

N/A

Willingness to contribute

The FiftyOne Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the FiftyOne codebase?

[ ] Yes. I can contribute a fix for this bug independently
[ ] Yes. I would be willing to contribute a fix for this bug with guidance from the FiftyOne community
[x] No. I cannot contribute a bug fix at this time

brimoor commented 3 months ago

@gabmis can you please provide the minimal example so we can see what you're trying to accomplish? Not yet sure how to proceed 😄

gabmis commented 2 months ago

@brimoor thanks for following up and sorry for the late response.

Unfortunately, I haven't found the time to create the example. I can describe it shortly though.

I had a field containing fo.Detections which themselves had a confidence field that took both int and float values across my dataset. This lead to an error when I was adding my samples to my dataset and the error was the one mentioned above.

Ideally, an error message telling me the specific field causing the issue and explaining the fact that the user is responsible for ensuring type consistency.

Something similar is discussed in the Dyanmic Attribute documentation which I'm guessing is related.

Sorry I can't do more to help, feel free to close if the current information I've provided is not actionable enough.

voxel51 / fiftyone