Future of wait_for_task

bidoubiwa commented 1 year ago

Currently in all of our sdk's we have a method named wait_for_task.

What does this method do

wait_for_task provides the possibility to block the current flow of your code until a task is either processed or failed. It has a default timeout time of 5 seconds in most SDK's. The value of the timeout can be changed in the function parameters (same as the interval time).

What is the issue

In a development environment where we mostly work on smaller datasets, tasks are fast, and this method seems to be just fine and do what we expect it to do. Nonetheless, in a production environment datasets are bigger, tasks are longer and the wait_for_task method could wait way longer for a task to be completed, longer than 5 seconds, throwing an timeout error.

This timeout behavior is not intuitive based on this method's name. Most users would expect the method to wait indefinitely for the task to be resolved. We decided to keep a timeout to protect the users from blocking their whole production flow of an infinite loop.

Nonetheless, it does not fix the issue that the user does not expect a timeout error to be thrown and that he might not realize it before he works in an environment with bigger tasks. Additionaly, the error thrown is TimeOutError or MeilisearchTimeOutError, which implies an issue on the Meilisearch server side and not on the tasks duration. This makes debugging way harder.

Solution

Again, to protect the user, we decided not to remove the default timeout. We also do not want to change the name to something more complex. What we want is to inform and warn the user as much as possible and make debugging and understanding the error easier.

Implementation of solution

Documentation

For a user to known this method exists, they need to find it somewhere. Wherever they find out about this method they should be met with a good explanation and a warning.

These are the places where these information should be added:

[ ] In the code, above the function name following the language documentation convention.
[ ] In the documentation (docs.meilisearch.com). @brunoocasali is talking about that with @dichotommy (https://github.com/meilisearch/documentation/issues/2171)
[ ] In the readme if the method is mentioned there (for example in the JS SDK)
[ ] In the tutorials that might use this method.
[ ] Ensure that this information appears in the intelliSense of the language when using the SDK. For the information, what we suggest is something along the line of:

`wait_for_task` blocks the thread in which it is used until the task is settled (that is, processed or failed). The method has a default timeout of 5 seconds which can be configured in the parameters.

=> code example.

⚠️ We do not suggest using it in production. Tasks might take longer to be settled, which would result in timeout errors or infinite loops deferring your code depending on its settlement.
What we suggest is that if you really need this method in prod, you should run it in its own thread that is not the main thread to avoid blocking it.

Error naming

The name of the error should also be updated to something more relevant: timeOutError or MeilisearchTimeOutError should be renamed. Something along the lines ofWaitForTaskTimeOut. if you have a better suggestion feel free to mention it.

Additional questions

technical terms Im not sure about the technical terms used to explain the blocking. @brunoocasali do you have any suggestion?

blocks the current flow of your code

pauses the execution of your code until the task is settled

blocks the thread in which it is used

defers execution of code that actually depends on the result

Additional information

Should we be way more clear on what a task is so that a user do not wonder why after adding documents, they are not immediately present in their index?
Should we also suggest using this method on other threads than the main one?

Competitors

In elasticsearch a wait_for parameter their refresh route that will wait for documents to be accessible in the search before responding. Based on this StackOverflow question they do not advise to use it in production. But I couldn't find this mention on elasticsearch documentation.

Conclusion

What do you think ?

alallema commented 1 year ago

Nonetheless, in a production environment datasets are bigger

I still think that we should strongly advise against this method in production.

that the user does not expect a timeout error to be thrown

I so agree with this.

In the readme if the method is mentioned there (for example in the JS SDK)

I am not sure that giving access to this function in the README so easily is a good idea. Again I'm all for easy access to Meilisearch and easy onboarding but I'm afraid that some users won't go further and won't take the time to understand the function of the asynchronous API before using it in production and will get stuck. I think we should be careful on this point.

bidoubiwa commented 1 year ago

I am not sure that giving access to this function in the README so easily is a good idea.

I'm not sure I like the idea that the API references in the readme are not exhaustive. We should own to our decision of keeping the method and following the moto: Docs should always be complete, but never finished.

Again I'm all for easy access to Meilisearch and easy onboarding but I'm afraid that some users won't go further and won't take the time to understand the function of the asynchronous

Should we add in the warning that tasks are asynchronous and that they are not instantly settled?

brunoocasali commented 1 year ago

For me @bidoubiwa it could be any of these:

pauses the execution of your code until the task is settled or blocks the thread in which it is used.

Also, WaitForTaskTimeOut is a good name for me.

Should we be way more clear on what a task is so that a user do not wonder why after adding documents, they are not immediately present in their index?

I don't think it's necessary because the docs already explain that in the docs.meilisearch.com, so I tend to think the users will not need it twice + more text to maintain.

I've discussed it with @dichotommy and I'll create an issue about the subject on their side. So we will have better documentation of this method directly on the docs website. I've also added a new step in the issue @bidoubiwa which it will require the review of @maryamsulemani97 in the first PR of the SDKs, (then we can replicate in the others).

sanders41 commented 1 year ago

pauses the execution of your code until the task is settled or blocks the thread in which it is used.

Just a note that this will be SDK specific. The Rust SDK for example was setup specifically to not block the thread and does an async sleep.

alallema commented 1 year ago

Should we add in the warning that tasks are asynchronous and that they are not instantly settled?

For me, we should not give access to this method in the README however it will be nice to add a sentence after the getting started to underline the fact that the documents are not added instantaneously but that the API is asynchronous

Docs should always be complete, but never finished.

❤️ ❤️ ❤️

curquiza commented 1 year ago

For me, we should not give access to this method in the README however it will be nice to add a sentence after the getting started to underline the fact that the documents are not added instantaneously but that the API is asynchronous

I just pop to give you some legacy context. The sentence already exists but does not mention the asynchronous word.

We discussed this in the past: originally the "asynchronous" word was present (and the API link as well), and then we decided not to mention "asynchronous" (like Algolia, who always avoid using it). We indeed found out some users were "afraid" of the asynchronous word (bringing complexity for them, so some friction when starting with meilisearch). Also, asynchronous does not have the exact same meaning with some languages like JS

However, this is something you can reconsider (the sentence can be improved without the async word for instance). I just bring more context to avoid you keeping stuck in an infinite loop of debate (especially for @bidoubiwa who is here since the beginning of the debate 😂)

gmourier commented 1 year ago

Thank you @bidoubiwa. Sharing a discussion where a user asks for a way to have a synchronous call.

Also, adding that Algolia has dedicated doc pages for the subject and explains when it could be useful or not given the user's need, if it helps.

Listing crazy ideas you may want to consider, if people struggle to use waitForTask after some iterations and you decide to keep it:

Introduce a client instantiation option (asynchronous / synchronous) mode. So every write operations are bound to a default without the need to call waitForTask
Having sync methods for every current async variant. e.g.add_documents and add_documents_sync

bidoubiwa commented 1 year ago

Thanks a lot for your input @curquiza and @gmourier. It is very valuable.

@curquiza Avoiding using the word asynchronous is indeed a good idea. Or if we do, bring enough context for the user to understand it.

Most document and index-related actions return tasks. These tasks are not processed immediately. They are added to a queue of tasks and will, in turn, be completed.

@gmourier Currently, we are opting for better documentation on the method as a quick "fix". Nonetheless, considering other possibilities for a later future is always very interesting.

gmourier commented 1 year ago

I think this is a great idea to opt for better documentation since:

Are there problems? 1.1 If so, what are they? (Is it important to have this possibility to make synchronous calls? Do users need it? Do they don't understand how to use it?) 1.2 Is this a priority?

Knowing that, a simple iteration that better documents the usage of waitForTask allow us to avoid making drastic changes without more knowledge and permits us to iterate safely without over-committing to a "costly" solution.

Do we have a way to track feedback on this? I can create a tracking table, so we can feed it over time and see how it goes.

alallema commented 1 year ago

I allow myself to add this reflection from @sanders41 here. I know this is not exactly the topic of this conversation, but in my opinion, it somewhat joins this issue but I could open another topic of discussion elsewhere if that's easier. For information, a user would like the possibility to raise an exception for a failed task to reduce the amount of boilerplate code needed. The following solution has therefore been proposed, namely to add raise_for_status as a parameter to the method like:

def wait_for_task(
        self,
        uid: int,
        timeout_in_ms: int = 5000,
        interval_in_ms: int = 50,
        raise_for_status: bool = False,
    )

Point to note:

Yes, it's not complicated for the user to raise his own exception but this can indeed result in a lot of code to write or copy right away specifically on strong type language.
This could reduce some points of friction, indeed the question that a failed task does not return an error.
There is of course the risk that this continues to complicate the use of this already complex method but I think this point deserves to be discussed.

ahmednfwela commented 1 year ago

The main reason we want wait for task, is because we need to make sure the API consumer receives the latest valid data. The user can't send a request to delete an item and then still see it after refreshing, it breaks user experience, and sometimes leads to the user doing multiple deletes/updates/insertions thinking that they are not going through.

I suggest the following:

All write operations that produce tasks should get an extra boolean query parameter (waitForTask=false by default), which if true will do 2 things:

give higher priority to that task, making sure it completes as fast as possible.
won't complete the request until the task is finished.

brunoocasali commented 1 year ago

I've pushed your idea to the product side so this way we can track it and do something about it in the future. If you think this will be important for the meilisearch-flutter @ahmednfwela let's make the waitForTask a public method :)

brunoocasali commented 1 year ago

First, thanks to everyone for your input here. I'm cleaning the repo and will give the final input on this topic to close this issue.

The Dart SDK is the only SDK that does not expose a version of waitFor in it, so it is up to the current maintainer @ahmednfwela to decide if it is worth exposing it. I'll not be against it anymore.

All the remaining SDKs will stay as they are, and I'm going to submit this discussion as part of the @meilisearch/product-team prioritization. If this is a real user need, we should find a proper solution to help them. If the solution is to keep/improve these methods (waitForTask) then we can invest time in doing some improvements in the SDKs under the Meilisearch release cycle umbrella.

Thanks, everyone!

meilisearch / integration-guides