meilisearch / integration-guides

Central reference for Meilisearch integrations.
https://meilisearch.com
MIT License
134 stars 15 forks source link

Changes regarding Meilisearch v1.2.0 #261

Closed brunoocasali closed 1 year ago

brunoocasali commented 1 year ago

This issue gathers the changes related to the v1.1.0 of Meilisearch that will impact the integrations team.

πŸ“… Release date: June 5th, 2023

The whole milestone of v1.2.0 is here!

⚠️ The SDKs, which should be done by the release date, are only tier #1. Some of the features described here for tier #2 will also be ready for the release day. Check the features below to understand which tier will receive which feature.

πŸ–– Click here to check the current tiers state of the integrations:
Understand everything about tiers here.

Integration | Tier | -------------|------| | Javascript | #1 | | PHP | #1 | | Instant Meilisearch | #1 | | Python | #1 | | Ruby | #1 | | Go | #1 | | Strapi | #2 | | .NET | #2 | | Rails | #2 | | Rust | #2 | | Symfony | #3 | | Java | #3 | | Firebase | #3 | | docs-searchbar.js | #3 | | Dart | #3 | | Swift | #3 | | Vuepress | #3 | | Gatsby | #3 |

⚠️ If no change in the public API is added during the bump pull requests, there is no need for a new release, according to the Integrations team's versioning policy guideline.


tiers - #1 Delete documents by filter

Related to:

It allows the user to use a filter expression to remove documents.

Since there is already a method called deleteDocuments(uids)/delete_documents(uids) in the SDKs, and this method is calling the route POST /indexes/:index_uid/documents/delete-batch, the requirements to implement the new way is to create a conditional internally to check the presence of the new parameter filters and then call the new method POST /indexes/:index_uid/documents/delete. When the developer uses the old ids argument, it should keep calling the old implementation.

This will avoid any silent failures the users may find.

For example, in the Ruby SDK:

# Public: Delete documents from an index
#
# documents_ids - An array with document ids (deprecated, optional)
# filter - A hash containing a filter that should match documents. Available ONLY with Meilisearch v1.2 and newer (optional)
# 
# Returns a Task object.
def delete_documents(documents_ids = nil, filter: nil)
  if documents_ids
    http_post "/indexes/#{@uid}/documents/delete-batch", documents_ids
  else
    http_post "/indexes/#{@uid}/documents/delete", { filter: filter }
  end
end

Extra: Add inline documentation for the method, explaining the availability of the filter syntax only for Meilisearch v1.2 and newer.

Extra: Mark the document_ids parameter as deprecated. eg. @Deprecated('migration') on Java/Dart, Obsolete on C#, etc. Add a library if necessary.

Extra: Add a try/catch to detect the error and give the user a hint about the possibility of a version mismatch between the SDK version and the instance version. Check this PR for an example in JavaScript: https://github.com/meilisearch/meilisearch-js/pull/1492

Extra: General recommendations:

🀩 Affected integrations: All the integrations from tier #1.


tiers - #1 Retrieve documents by filter

Related to:

Gives the user the possibility to use a filter expression to retrieve documents, like in search.

Following the same concept in the previous section, deleting by filters, this Meilisearch version introduces a new way of finding documents by introducing a new route and arguments.

Implement in the getDocuments/get_documents method an internal conditional allowing the user to query the documents with the same method but using a new filter method.

When the user calls the get documents method with a filter argument, request POST /indexes/{index_uid}/documents/fetch using a JSON body containing an int offset, int limit, String[] fields and now String filter or String[] filter. ⚠️ If the method invocation does not contain a filter it should still call the previous implementation.

🀩 Affected integrations: All the integrations from tier #1.


IS EMPTY filter operator

Related to:

⚠️ Dart is the only SDK we maintain which have a DSL when it comes to declaring the filters. So, no need to implement anything in any of the other SDKs.

Add the IS EMPTY new operator to the filterExpression DSL on the Dart SDK.

🀩 Affected integrations: Only meilisearch-dart.


IS NULL filter operator

Related to:

⚠️ Dart is the only SDK we maintain which have a DSL when it comes to declaring the filters. So, no need to implement anything in any of the other SDKs.

Add the IS NULL new operator to the filterExpression DSL on the Dart SDK.

🀩 Affected integrations: Only meilisearch-dart.

ahmednfwela commented 1 year ago

since fetchDocuments is POST, does it make sense to also name its body FetchDocumentsQuery ? why not FetchDocumentsBody ?

sanders41 commented 1 year ago

For delete by filter:

When the language support named parameters, move ids to a parameter and add filter alongside it.

This means a user could send both ids and filters in the same request. Based on the spec I think delete by filter has to be a new separate method.

* Introduce a new POST /indexes/:index_uid/documents/delete route that lets you delete documents by filter
* The payload must be an object only containing a filter field, i.e.: { "filter": "doggo = bernese" }
* A new error has been introduced in case the specified filter is wrong or empty: invalid_document_delete_filter

The current delete_document is a delete request with the UID in the URL. Delete by filter is a post route with only the filter in the post body. So I think mixing the two is going to cause issues. Does this make sense or am I missing something?

sanders41 commented 1 year ago

I have this ready here if anyone wants ideas for a starting point.

In working on it there is one thing I think is worth mentioning that may catch end users (because it got me for a minute :smiley:). If you forget to add the fields to the filterable attributes and use them for deletes/gets you don't get an error. It acts as everything worked as expected and gives successful responses, it just doesn't do anything.

Another question I have is why add the fetchDocuments method? getDocuments adds a filter parameter so the only difference between fetchDocuments and getDocuments is one uses the post route while the other uses the get route. But the user of the client never sees this so they would end up with two different methods that do the exact same thing.

bidoubiwa commented 1 year ago

What about using deleteDocuments with POST /documents/delete route and stop using POST /delete-batch? In the same vain, using getDocuments with the POST /documents/fetch route and stop using GET documents/ like @sanders41 suggests?

Both previous routes seem to be just a lesser performant version of the newly introduced routes.

sanders41 commented 1 year ago

This is doable without a breaking change since the end user doesn't see this part. Have you measured performance between the options to see the new ones are more performant? If not I may run some tests to see what the difference would be.

bidoubiwa commented 1 year ago

To answer @sanders41, @meilisearch/engine-team can you tell us if the POST /documents/fetch route more performant than GET /documents/?

This is doable without a breaking change since the end user doesn't see this part.

While we can avoid a breaking change on the getDocuments I don't think we can avoid one with deleteDocuments as most of the function parameters are deleteDocuments(ids: str[]). Like @brunoocasali said, overloads might avoid that in some languages but it might be worth making that breaking change now on the others (and maybe even on the one accepting overloads). I think /delete-batch might be deprecated in v2, can you confirm @gmourier ?

gmourier commented 1 year ago

I think /delete-batch might be deprecated in v2, can you confirm @gmourier ?

Yes, it will probably be marked as deprecated for v1.3 to be removed for v2.0. It could also be the case for the GET /indexes/:indexUid/documents since we are introducing POST /indexes/:indexUid/documents/fetch with v1.2

cc @macraig

irevoire commented 1 year ago

Hey @sanders41, it could be interesting to see the difference. tbh, I don't expect it to have any difference. Both go through the same pipeline; thus, the only difference is the query parameter parser vs the json parser, which should take a few Β΅s.

On the other side, though, an URL is limited to 2048 characters, which means a large filter won't fit in the GET route (and a few users already told us that they wanted this route to query thousands of IDs). Also, I don't know how this impacts you (=the integration), but the GET route only supports the string syntax of the filter, while the POST route support all the strange syntax we introduced (with the array of array of string thingy)

sanders41 commented 1 year ago

I setup a test and the get route is slightly more performant than the post route, but not by enough be a deciding factor if there are other benefits to the post route. I did a few different runs with 1000 runs of each version and they were always within a second of each other. I made a branch here if anyone wants to play with it. To run the test run python performance_test.py (after installing the dependencies)

brunoocasali commented 1 year ago

@ahmednfwela

since fetchDocuments is POST, does it make sense to also name its body FetchDocumentsQuery ? why not FetchDocumentsBody ?

It is to keep the pattern between all the SDKs :) And even query meaning query-string in the end, it is more straightforward for the user to have a type more aligned with the concept of being a query/param or something like this.

brunoocasali commented 1 year ago

What about using deleteDocuments with POST /documents/delete route and stop using POST /delete-batch? In the same vain, using getDocuments with the POST /documents/fetch route and stop using GET documents/ like

This is doable without a breaking change since the end user doesn't see this part.

This IS a breaking change from my POV, primarily because of the issue I've raised here https://github.com/meilisearch/integration-guides/issues/264.

Users will upgrade their SDKs because it is way easier to run an npm install than update the Meilisearch instance. And I expect more users complaining in our repositories that, miraculously, my SDK is not working after the last update.

Because the developers usually update their SDKs and don't do the same with Meilisearch, they intentionally use the new features now available through the new versions and only realize the mistake in production.

In this particular case, if we accept internally on deleteDocuments a new route: POST /documents/delete, the users who silently update their SDKs will start having exceptions without changing their code because they may not have the new Meilisearch runtime, that's not acceptable since we also have stable SDKs like PHP.

I also don't like the idea of having a deleteDocumentsByFilter, but by far is the safest option right now to avoid any issues, as I mentioned above, in the near future. The same will happen with the getDocuments.

⚠️ By the way, my description of this issue was not great, sorry community about that.

sanders41 commented 1 year ago

This IS a breaking change from my POV, primarily because of the issue I've raised here https://github.com/meilisearch/integration-guides/issues/264.

Fair enough, I see your point here. It's not breaking strictly from an API standpoint, but you are right it is breaking because of the interaction with Meilisearch.

brunoocasali commented 1 year ago

To all watching this thread, I've updated the "Delete documents by filter" section. We should keep maintaining the current deleteDocuments.

It will avoid a silent update + break for the current users. Also, I don't think the "Get documents by filter aka fetch" needs any change based on the current feedback.

bidoubiwa commented 1 year ago

About deleteDocuments, I understand your point of view. We did tell the user that the SDK's would be retro-compatible between versions, which would not be the case in SDK's without the overload possibility.

About fetchDocuments and getDocuments, following my suggestion on #264, we could catch the 404 of the new POST /documents/fetch and in which case use GET /documents.

fetchDocuments seems very counter intuitive in what it does. I'm not able to know by the naming what the difference is with getDocuments. If we do go for two different methods, maybe we should be consistent with the new delete route and name it getDocumentsByFilter.

Though maybe, like you said bruno, having the deprecation warning telling the user to update Meilisearch and use fetchDocuments might be a better solution. I'm not sure. Should we use the same deprecation warning on the delete route?

sanders41 commented 1 year ago

I agree with @bidoubiwa on the confusion withfetchDocuments because as far as the end user can see it should do the same thing as getDocuments. On my side I decided to only use getDocuments that uses the get route and added a note to the filter parameters that it is only available to Meilisearch 1.2.0. It's not a perfect solution, but I think there's less chance for questions going this way. I guess we'll find out if I'm wrong πŸ˜€

ahmednfwela commented 1 year ago

how about the sdk clients merge both methods into one,but select which api to call based on the passed parameter ?

for example if filter is passed ,call the POST method if not, use the GET method

bidoubiwa commented 1 year ago

@ahmednfwela, unfortunately this does not work on most typed language :( But for the other it could be a great option

sanders41 commented 1 year ago

@ahmednfwela I gave this some thought and because of @irevoire comment that post could be more flexible I'm going to do this.

@bidoubiwa why wouldn't it work with typed languages? For example in rust if filter is an Option<&str> then you can do one thing if it is Some and another if it is None.

bidoubiwa commented 1 year ago

I thought they were talking about the deleteByFilters route, my bad.

getDocuments accepts the same parameters as fetchDocuments (see spec). Thus, I don't think we're able to know which route to use based on the parameters.

EDIT: except for the filter field. Indeed we could call the GET route if filters is not in the parameters

sanders41 commented 1 year ago

I think the idea is if the filter parameter is not None the method would use the post route and if not use the get route. So with that the method can decide which route to use. The post route is going to fail for Meilisearch < 1.2.0, but the same is true for the get route if a filter is included.

bidoubiwa commented 1 year ago

You're right. I like the solution :)

sanders41 commented 1 year ago

I have my PR updated to implement this if anyone wants to see what it ends up looking like.

bidoubiwa commented 1 year ago

Hey @sanders41

I'll let @brunoocasali confirm what I said. We added some additional measure to protect the user in the getDocuments method, you can see an example here: https://github.com/meilisearch/meilisearch-js/pull/1493

It's basically what you did but we added a hint the error message in case of a failure in the HTTP request (example here)

MeiliSearchCommunicationError: Method Not Allowed Hint: It might not be working because you're not up to date with the Meilisearch version that getDocuments call requires.

The hint message may be improved of course.

We alse added in the comments section that would appear in the IntelliSense hover tool that the feature only works above v1.2 (see example here).

and for deleteDocuments, if possible, we would like to use that method and not have to create a new one. This is possible in some languages that are not typed, that are permissive (typescript), and that accept overloading, see example here.

What do you think?

sanders41 commented 1 year ago

This all sounds good to me, but I have one question. When I try sending a bad parameter I get a MeilisearchApiError, but you are catching a MeiliSearchCommunicationError. Do we need to catch both?

bidoubiwa commented 1 year ago

The bad parameter error should not provide the hint as it means that the route exists but that the user wrongly formatted their parameter.

In this case, the error returned by the MeilisearchAPIError should be enough for them to pinpoint the error.

wdyt?

sanders41 commented 1 year ago

It turns out I also get a MeilisearchApiError when the route doesn't exist. I'm about to push the latest updates to the Python delete and I can show you what I mean there.

bidoubiwa commented 1 year ago

Oh my bad, It may be a bug on python's SDK side.

bidoubiwa commented 1 year ago

For the languages that lists the error_codes. There are new ones

brunoocasali commented 1 year ago

This pre-release is finished. All the remaining work was moved to our own repositories, closing this!

Thanks team you're the best! πŸŽ‰ Thanks to our meilistars @sanders41 and @ahmednfwela for dealing with this pre-release in Python & Dart for us, it was incredible helpful, you folks rock! πŸ’ͺ πŸŽ‰