marqo-ai / marqo

Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai
https://www.marqo.ai/
Apache License 2.0
4.66k stars 192 forks source link

Support OpenAPI spec generation [ENHANCEMENT] #778

Open pandu-k opened 9 months ago

pandu-k commented 9 months ago

Is your feature request related to a problem? Please describe. In Marqo 2.x there isn't a way to generate OpenAPI spec. This was possible from the /openapi.json path.

Describe the solution you'd like An endpoint, usable from a running Marqo container, from which to retrieve the openapi specs.

Additional context Relevant docs: https://fastapi.tiangolo.com/reference/openapi/docs/

gabauer commented 2 months ago

Hi @pandu-k,

What's the current status of this issue? It seems like the OpenAPI spec generation is still not functional, as I'm unable to retrieve it from the expected /openapi.json endpoint.

Additionally, the documentation in the README might need an update, as it implies that generating the OpenAPI spec is still supported, which doesn't seem to be the case.

Could you provide any updates or guidance on how to proceed? Thanks!

gabauer commented 2 months ago

I’ve found a solution that restores OpenAPI spec generation and resolves the issue with the /openapi.json and /docs paths. The root cause was in src/marqo/tensor_search/models/add_docs_objects.py, where the documents field included a union type. The issue arose because the second type of the union, np.ndarray, could not be serialized properly.

From my perspective, there are a few potential ways to address this:

  1. Implement a custom OpenAPI function for FastAPI to exclude ndarray fields from the schema.
  2. Create a custom JSON encoder for models containing ndarray.
  3. Replace np.ndarray with a standard Python list type.

I opted for the third approach since it was the simplest and most straightforward solution.

Here are the specific changes I made:

After applying these modifications, the OpenAPI spec at /openapi.json and Swagger UI at /docs are now functioning as expected.

I'm not entirely sure if this is the ideal long-term solution, but if it seems like a good fit, I’d be happy to submit a PR for it.

pandu-k commented 2 months ago

Thanks @gabauer - we'll look into this

papa99do commented 2 months ago

Hello @gabauer , thanks for looking into this and raising a PR. I checked the source code and noticed that the ndarray type here is not used in any code or tests. I think it would be safe to remove it. Actually, the only type accepted for docs is List[Dict[str, Any]]

gabauer commented 2 months ago

Hi @papa99do, thanks for looking into it, I will check this out, try to let the tests run and add a test with a simple "is there" check on the openapi URIs to prevent breaking this stuff in the future.

papa99do commented 3 weeks ago

Hello @gabauer , sorry for the long wait. I've merged in your change after addressing the build permission issue. I've tested that the openAPI endpoints can be accessed again. I'll follow up to add more documentation and examples in following PRs. Thanks again for your contribution!