jina-ai / jina

☁️ Build multimodal AI applications with cloud-native stack
https://docs.jina.ai
Apache License 2.0
20.56k stars 2.21k forks source link

fix: use body for streaming instead of params #6098

Closed NarekA closed 8 months ago

NarekA commented 8 months ago

Making a new PR that follows conventions to replace https://github.com/jina-ai/jina/pull/6091 and https://github.com/jina-ai/jina/pull/6093

Http streaming breaks when the input doc schema has fields which are not str, int, or float. This includes dicts, bools, and nested objects (See Issue 6090). In addition it caps the input size to 2000 characters (Much less when you factor in URL encoding)

This PR changes the get endpoints to have the data passed in the request body rather than the params.

Goals:

Also related: https://github.com/jina-ai/jina/pull/6097

codecov[bot] commented 8 months ago

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (f2085a9) 75.98% compared to head (bf3cbe2) 76.89%. Report is 3 commits behind head on master.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #6098 +/- ## ========================================== + Coverage 75.98% 76.89% +0.91% ========================================== Files 143 145 +2 Lines 13974 14014 +40 ========================================== + Hits 10618 10776 +158 + Misses 3356 3238 -118 ``` | [Flag](https://app.codecov.io/gh/jina-ai/jina/pull/6098/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jina-ai) | Coverage Δ | | |---|---|---| | [jina](https://app.codecov.io/gh/jina-ai/jina/pull/6098/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jina-ai) | `76.89% <76.00%> (+0.91%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jina-ai#carryforward-flags-in-the-pull-request-comment) to find out more. | [Files](https://app.codecov.io/gh/jina-ai/jina/pull/6098?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jina-ai) | Coverage Δ | | |---|---|---| | [jina/\_\_init\_\_.py](https://app.codecov.io/gh/jina-ai/jina/pull/6098?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jina-ai#diff-amluYS9fX2luaXRfXy5weQ==) | `56.00% <100.00%> (ø)` | | | [jina/clients/base/helper.py](https://app.codecov.io/gh/jina-ai/jina/pull/6098?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jina-ai#diff-amluYS9jbGllbnRzL2Jhc2UvaGVscGVyLnB5) | `85.38% <100.00%> (+2.43%)` | :arrow_up: | | [jina/serve/runtimes/gateway/request\_handling.py](https://app.codecov.io/gh/jina-ai/jina/pull/6098?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jina-ai#diff-amluYS9zZXJ2ZS9ydW50aW1lcy9nYXRld2F5L3JlcXVlc3RfaGFuZGxpbmcucHk=) | `86.75% <100.00%> (+0.64%)` | :arrow_up: | | [...ve/runtimes/gateway/http\_fastapi\_app\_docarrayv2.py](https://app.codecov.io/gh/jina-ai/jina/pull/6098?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jina-ai#diff-amluYS9zZXJ2ZS9ydW50aW1lcy9nYXRld2F5L2h0dHBfZmFzdGFwaV9hcHBfZG9jYXJyYXl2Mi5weQ==) | `0.00% <0.00%> (ø)` | | | [jina/serve/runtimes/worker/http\_fastapi\_app.py](https://app.codecov.io/gh/jina-ai/jina/pull/6098?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jina-ai#diff-amluYS9zZXJ2ZS9ydW50aW1lcy93b3JrZXIvaHR0cF9mYXN0YXBpX2FwcC5weQ==) | `80.86% <72.72%> (+11.44%)` | :arrow_up: | ... and [9 files with indirect coverage changes](https://app.codecov.io/gh/jina-ai/jina/pull/6098/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=jina-ai)

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

JoanFM commented 8 months ago

I am not sure if I would prefer to keep the 2 endpoints separate and fix the underlying issue. @alaeddine-13 is the integration with SSE as in documentation possible still?

NarekA commented 8 months ago

@JoanFM I'm just taking the existing endpoint and allowing it to accept either parameters or a request body. I'm happy to add a post endpoint as well, but I need your help with the gateway.

NarekA commented 8 months ago

Here is a branch on my fork that will reproduce the gateway error

alaeddine-13 commented 8 months ago

@NarekA why do you want to change the GET endpoint method ? I thought the agreement was that, if you have basic requirements and would like to use client libraries, you use GET endpoint and if you need customized schemas and long input, you use the POST method. What is wrong with this approach ?

alaeddine-13 commented 8 months ago

@NarekA why do you want to change the GET endpoint method ? I thought the agreement was that, if you have basic requirements and would like to use client libraries, you use GET endpoint and if you need customized schemas and long input, you use the POST method. What is wrong with this approach ?

Actually now I realize you want to allow passing either of body or query parameters. I'm not sure if that's a good practice when building HTTP services but I think it's a nice idea actually. Yes @JoanFM this should keep compatibility with the documentation section where we use a standard js client. A test would be great to have though or at least we need to try out the documentation on this PR before merging

NarekA commented 8 months ago

@alaeddine-13 The current implementation with the POST method is fine, but it seems to break when using a deployment with a gateway. The stream all comes out at once. I modified this test (and am changing it back) not realizing that I was fixing the very thing it was testing for. The parameterized test passes for all the scenarios except for http+gateway. Using get instead of post seems to fix the gateway and we have only one endpoint so it's less confusing.

NarekA commented 8 months ago

If we do allow post, I like the idea of just using the same route with 2 methods: methods=['GET', 'POST'],. That way we have consistency between the endpoints, and don't have to maintain 2 routes.

NarekA commented 8 months ago

Only one test job failing, and I can't find a clear stack trace in it so it might be flakiness.