voxel51 / fiftyone

The open-source tool for building high-quality datasets and computer vision models
https://fiftyone.ai
Apache License 2.0

[FR] Support Azure Cosmos DB API for MongoDB #2178

Open janbernloehr opened 1 year ago

janbernloehr commented 1 year ago

Proposal Summary

Support Azure Cosmos DB API for MongoDB (feature level 4.2) to ease deployment of FiftyOne in Azure environments.

Details

Currently, when starting FiftyOne with the Cosmos DB backend, we get this error:

Error=2, Details='Response status code does not indicate success: BadRequest (400); Substatus: 0; ActivityId: eec9ddd8-e490-435f-9c8b-1a94b33b607e; Reason: (Response status code does not indicate success: BadRequest (400); Substatus: 0; ActivityId: eec9ddd8-e490-435f-9c8b-1a94b33b607e; Reason: (Response status code does not indicate success: BadRequest (400); Substatus: 0; ActivityId: eec9ddd8-e490-435f-9c8b-1a94b33b607e; Reason: (Message: {"Errors":["The index path corresponding to the specified order-by item is excluded."]}
ActivityId: eec9ddd8-e490-435f-9c8b-1a94b33b607e, Request URI: /apps/266260cc-aa5c-4d29-a19b-9549c6eb601b/services/3a28772e-5e64-4457-b594-b12a33e2d177/partitions/a2f83d84-0707-4205-95f6-4ff70271d3f7/replicas/133103169072839565s/, RequestStats: Microsoft.Azure.Cosmos.Tracing.TraceData.ClientSideRequestStatisticsTraceDatum, SDK: Windows/10.0.19041 cosmos-netstandard-sdk/3.18.0);););, full error: {'ok': 0.0, 'errmsg': 'Error=2, Details=\'Response status code does not indicate success: BadRequest (400); Substatus: 0; ActivityId: eec9ddd8-e490-435f-9c8b-1a94b33b607e; Reason: (Response status code does not indicate success: BadRequest (400); Substatus: 0; ActivityId: eec9ddd8-e490-435f-9c8b-1a94b33b607e; Reason: (Response status code does not indicate success: BadRequest (400); Substatus: 0; ActivityId: eec9ddd8-e490-435f-9c8b-1a94b33b607e; Reason: (Message: {"Errors":["The index path corresponding to the specified order-by item is excluded."]}\r\nActivityId: eec9ddd8-e490-435f-9c8b-1a94b33b607e, Request URI: /apps/266260cc-aa5c-4d29-a19b-9549c6eb601b/services/3a28772e-5e64-4457-b594-b12a33e2d177/partitions/a2f83d84-0707-4205-95f6-4ff70271d3f7/replicas/133103169072839565s/, RequestStats: Microsoft.Azure.Cosmos.Tracing.TraceData.ClientSideRequestStatisticsTraceDatum, SDK: Windows/10.0.19041 cosmos-netstandard-sdk/3.18.0);););', 'code': 2, 'codeName': 'BadValue'}
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/graphql/execution/execute.py", line 353, in await_result
    return await result # type: ignore
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/graphql/execution/execute.py", line 456, in get_results
    await gather(*(results[field] for field in awaitable_fields)),
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/graphql/execution/execute.py", line 632, in await_result
    self.handle_field_error(error, return_type)
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/graphql/execution/execute.py", line 666, in handle_field_error
    raise error
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/graphql/execution/execute.py", line 625, in await_result
    return_type, field_nodes, info, path, await result
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/strawberry/extensions/directives.py", line 19, in resolve
    result = await await_maybe(_next(root, info, *args, **kwargs))
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/strawberry/utils/await_maybe.py", line 12, in await_maybe
    return await value
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/strawberry/schema/schema_converter.py", line 396, in _async_resolver
    return await await_maybe(_get_result(_source, strawberry_info, **kwargs))
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/strawberry/utils/await_maybe.py", line 12, in await_maybe
    return await value
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/fiftyone/server/paginator.py", line 108, in paginate
    return await get_items(
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/fiftyone/server/paginator.py", line 70, in get_items
    data = await foo.aggregate(collection, pipelines)
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/fiftyone/core/odm/database.py", line 322, in _do_async_pooled_aggregate
    return await asyncio.gather(
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/fiftyone/core/odm/database.py", line 328, in _do_async_aggregate
    return [i async for i in collection.aggregate(pipeline, allowDiskUse=True)]
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/fiftyone/core/odm/database.py", line 328, in <listcomp>
    return [i async for i in collection.aggregate(pipeline, allowDiskUse=True)]
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/motor/core.py", line 1158, in next
    if self.alive and (self._buffer_size() or await self._get_more()):
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/motor/core.py", line 1626, in _on_started
    pymongo_cursor = future.result()
  File "/anaconda/envs/azureml_py38/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/pymongo/collection.py", line 2502, in aggregate
    return self._aggregate(_CollectionAggregationCommand,
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/pymongo/collection.py", line 2419, in _aggregate
    return self.__database.client._retryable_read(
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1525, in _retryable_read
    return func(session, server, sock_info, secondary_ok)
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/pymongo/aggregation.py", line 137, in get_cursor
    result = sock_info.command(
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/pymongo/pool.py", line 710, in command
    return command(self, dbname, spec, secondary_ok,
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/pymongo/network.py", line 161, in command
    helpers._check_command_response(
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/pymongo/helpers.py", line 167, in _check_command_response
    raise OperationFailure(errmsg, code, response, max_wire_version)

brimoor commented 1 year ago

Hi @janbernloehr 👋

The FiftyOne API makes use of quite a few MongoDB v4.4 features, and I believe Cosmos DB is currently only compatible up to v4.2, so unfortunately it's not possible to connect FiftyOne to Cosmos DB at the moment.

It looks like the error you're showing is coming from trying to launch the App, which would make sense because some of the aggregations required to load the App do use MongoDB v4.4 features.
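
For anyone wanting to check which side of that line a given deployment falls on, the comparison could be sketched roughly as below. This is only an illustration: the helper names and the `MIN_VERSION` constant are hypothetical and not part of FiftyOne, and the 4.4 threshold is taken from the comment above rather than from any official requirement list.

```python
import re

# Assumption: 4.4 is the minimum MongoDB version, as stated in the comment above.
MIN_VERSION = (4, 4)


def parse_version(version):
    """Return the leading (major, minor) pair of a version string such as '4.2.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version, minimum=MIN_VERSION):
    """Return True if the reported version is at least the assumed minimum."""
    return parse_version(version) >= minimum


if __name__ == "__main__":
    # Hypothetical version strings, only to show the comparison
    for reported in ("4.2.0", "4.4.0"):
        print(reported, meets_minimum(reported))
```

With pymongo, the version string could be read from `MongoClient(uri).server_info()["version"]`. A passing check only means the server reports that version; it does not guarantee that every aggregation feature the App relies on is actually implemented by the backend.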

pietergobin commented 2 months ago

👍 Hi! Just wondering: since Azure has upgraded its supported MongoDB versions, is it possible to point FiftyOne to Cosmos DB today? Or is another database (such as Postgres) an option?

Basically I want to keep things in Azure if it is at all possible.