meilisearch / meilisearch-rails

Meilisearch integration for Ruby on Rails
https://www.meilisearch.com
MIT License

Error: MeiliSearch::TimeoutError: The request was not processed in the expected time. #355

Open drale2k opened 4 months ago

drale2k commented 4 months ago

Description: Meilisearch Cloud is throwing random timeout errors, causing my records not to be indexed and the failing jobs to be discarded.

Error: MeiliSearch::TimeoutError: The request was not processed in the expected time. Net::ReadTimeout with #<TCPSocket:(closed)>

How is this case supposed to be handled with a Meilisearch Cloud installation? Adding a retry to the job would definitely help:

retry_on MeiliSearch::TimeoutError, wait: 30.seconds, attempts: :unlimited

Ideally, wait time and attempts would be configurable in an initializer. Is there a way to do this without changing the gem?

Expected behavior: Timed-out jobs should be automatically retried. Or is there any other way to handle this?

Current behavior: Timeout errors cause the indexing jobs to fail and be discarded.

Stacktrace

"/Users/drale2k/.rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/meilisearch-0.26.0/lib/meilisearch/http_request.rb:113:in `rescue in send_request'",
"/Users/drale2k/.rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/meilisearch-0.26.0/lib/meilisearch/http_request.rb:108:in `send_request'",
"/Users/drale2k/.rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/meilisearch-0.26.0/lib/meilisearch/http_request.rb:26:in `http_get'",
"/Users/drale2k/.rbenv/versions/3.3.0/lib/ruby/gems/3.3.0/gems/meilisearch-0.26.0/lib/meilisearch/index.rb:286:in `settings'",


ellnix commented 4 months ago

This should not be difficult to implement, but I can predict a data race occurring: say I have a document "constitution" with a field "first_line", and an update to it fails and is retried; if a newer update lands in the meantime, the retried job could overwrite it with stale data.

Since retry_on seems to be a class method on ActiveJob::Base, you should be able to configure it yourself:

# config/initializers/meilisearch.rb
MeiliSearch::Rails::MSJob.retry_on MeiliSearch::TimeoutError, wait: 30.seconds, attempts: :unlimited
drale2k commented 4 months ago

That's a very valid issue. Is there any guidance on what to do if Meilisearch Cloud is unavailable or returning timeouts? It would be really helpful to have a built-in retry mechanism that discards updates that are out of date (as in your example). Otherwise it's basically retry and pray.
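
A built-in mechanism like that would need a way to tell that a queued update is out of date. As a purely hypothetical sketch (none of this is gem API; the method name and timestamps are stand-ins), a retried job could compare the record's updated_at against the time the job was enqueued and skip the write when the record has changed since:

```ruby
require 'time'

# Hypothetical sketch, not meilisearch-rails API: decide whether a queued
# index update is stale, i.e. the record was modified again after the job
# was enqueued, in which case a retry should discard the payload rather
# than overwrite fresher data.
def stale_update?(record_updated_at, enqueued_at)
  record_updated_at > enqueued_at
end

enqueued_at = Time.parse('2024-05-01 12:00:00 UTC')

# Record untouched since enqueue: safe to retry the indexing job.
puts stale_update?(Time.parse('2024-05-01 11:59:00 UTC'), enqueued_at) # => false

# Record modified after enqueue: the retried payload is out of date.
puts stale_update?(Time.parse('2024-05-01 12:05:00 UTC'), enqueued_at) # => true
```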

ellnix commented 4 months ago

I did discuss it with Bruno in the past and he agreed that it would be a good feature to be able to opt into: https://github.com/meilisearch/meilisearch-rails/issues/187#issuecomment-1753138744

However I prioritized refactoring the codebase before adding new features and unfortunately haven't had time to finish it.

drale2k commented 4 months ago

I keep running into timeout issues constantly, this time when trying to reindex a model with 7 records.

Is there anything I can do about this? This is really breaking for me:


Episode.reindex!
(irb):1:in `<main>': The request was not processed in the expected time. Failed to open TCP connection to ms-d***.fra.meilisearch.io:443 (execution expired) (MeiliSearch::TimeoutError)
/Users/drale2k/.rbenv/versions/3.3.1/lib/ruby/3.3.0/net/http.rb:1603:in `initialize': Failed to open TCP connection to ms-d***.fra.meilisearch.io:443 (execution expired) (Net::OpenTimeout)
    from /Users/drale2k/.rbenv/versions/3.3.1/lib/ruby/3.3.0/net/http.rb:1603:in `open'
    from /Users/drale2k/.rbenv/versions/3.3.1/lib/ruby/3.3.0/net/http.rb:1603:in `block in connect'
    from /Users/drale2k/.rbenv/versions/3.3.1/lib/ruby/3.3.0/timeout.rb:186:in `block in timeout'
    from /Users/drale2k/.rbenv/versions/3.3.1/lib/ruby/3.3.0/timeout.rb:193:in `timeout'
    from /Users/drale2k/.rbenv/versions/3.3.1/lib/ruby/3.3.0/net/http.rb:1601:in `connect'
    from /Users/drale2k/.rbenv/versions/3.3.1/lib/ruby/3.3.0/net/http.rb:1580:in `do_start'
    from /Users/drale2k/.rbenv/versions/3.3.1/lib/ruby/3.3.0/net/http.rb:1569:in `start'
    from /Users/drale2k/.rbenv/versions/3.3.1/lib/ruby/3.3.0/net/http.rb:2297:in `request'
    from /Users/drale2k/.rbenv/versions/3.3.1/lib/ruby/gems/3.3.0/gems/httparty-0.21.0/lib/httparty/request.rb:156:in `perform'
    from /Users/drale2k/.rbenv/versions/3.3.1/lib/ruby/gems/3.3.0/gems/httparty-0.21.0/lib/httparty.rb:612:in `perform_request'
    from /Users/drale2k/.rbenv/versions/3.3.1/lib/ruby/gems/3.3.0/gems/httparty-0.21.0/lib/httparty.rb:526:in `get'
    from /Users/drale2k/.rbenv/versions/3.3.1/lib/ruby/gems/3.3.0/gems/meilisearch-0.26.0/lib/meilisearch/http_request.rb:27:in `block in http_get'
    from /Users/drale2k/.rbenv/versions/3.3.1/lib/ruby/gems/3.3.0/gems/meilisearch-0.26.0/lib/meilisearch/http_request.rb:109:in `send_request'
    from /Users/drale2k/.rbenv/versions/3.3.1/lib/ruby/gems/3.3.0/gems/meilisearch-0.26.0/lib/meilisearch/http_request.rb:26:in `http_get'
    from /Users/drale2k/.rbenv/versions/3.3.1/lib/ruby/gems/3.3.0/gems/meilisearch-0.26.0/lib/meilisearch/index.rb:286:in `settings'
    from /Users/drale2k/.rbenv/versions/3.3.1/lib/ruby/gems/3.3.0/bundler/gems/meilisearch-rails-52526a735c73/lib/meilisearch-rails.rb:308:in `block in settings'

It worked when I ran it a second time.

brunoocasali commented 4 months ago

Hey @drale2k, can you run this command ulimit -a and send me the results?

brunoocasali commented 4 months ago

Also, can you try setting a different value on the timeout:

MeiliSearch::Rails.configuration = {
  meilisearch_url: '...',
  meilisearch_api_key: '...',
  timeout: 5, # default is: 1
  max_retries: 3 # default is: 0
}
drale2k commented 4 months ago

I am on a Macbook Pro M1

ulimit -a
-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          unlimited
-s: stack size (kbytes)             8176
-c: core file size (blocks)         0
-v: address space (kbytes)          unlimited
-l: locked-in-memory size (kbytes)  unlimited
-u: processes                       10666
-n: file descriptors                256

I used the recommended settings you posted above for 2 days now, and just now, as I was writing this comment, I executed a couple of reindexes on my model, which has about 7 records.

The first 4 ran fine; the 5th timed out again:

 Episode.reindex!
(irb):10:in `<main>': The request was not processed in the expected time. Net::ReadTimeout with #<TCPSocket:(closed)> (MeiliSearch::TimeoutError)
/Users/drale2k/.rbenv/versions/3.3.1/lib/ruby/3.3.0/net/protocol.rb:229:in `rbuf_fill': Net::ReadTimeout with #<TCPSocket:(closed)> (Net::ReadTimeout)

Trying to replicate it, I ran 20 reindexes twice. The last request (the 40th) timed out again:

irb(main):062* 20.times do
irb(main):063*   Episode.reindex!
irb(main):064> end
brunoocasali commented 4 months ago

Thanks for the ulimit info, and I think I have an idea of why I can't reproduce your case:

# this is my Macbook Pro M1:
ulimit -a
-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          unlimited
-s: stack size (kbytes)             8176
-c: core file size (blocks)         0
-v: address space (kbytes)          unlimited
-l: locked-in-memory size (kbytes)  unlimited
-u: processes                       5333
-n: file descriptors                2560

Each network connection uses a file descriptor (I don't know if that's specific to Ruby's Net library). When you make multiple requests simultaneously or in quick succession (as in a multi-threaded scenario), you may exhaust the file descriptors available to your process.

Also, I can see your process limit is double mine, which could mean other processes are consuming your file descriptor limit.

So, can you try updating this limit?

ulimit -n 2048 # Sets the limit to 2048 file descriptors
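
If file descriptor exhaustion is the suspect, it can also help to see how many descriptors the process actually has open, not just the limit. A quick check (Linux; on macOS, lsof -p <pid> gives a similar count):

```shell
# Count the file descriptors currently open in this shell process (Linux).
ls /proc/$$/fd | wc -l

# Show the current soft limit for comparison.
ulimit -n
```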

brunoocasali commented 4 months ago

Hi @drale2k did you have the time to do those changes?

drale2k commented 4 months ago

Hi, yes, I changed it since then, but had to move on to other features and did not test at high enough volume. I am about to deploy to production next week and will report how it goes there. Sorry for the late reply.

brunoocasali commented 4 months ago

No problem, @drale2k. Let me know when you have any news! Good luck! 🙏

drale2k commented 4 months ago

I have been running into the issue again today, in production, on a Debian server (Frankfurt, 2 cores, 4GB RAM) running my Rails app. I used the settings above:

timeout: 5, # default is: 1
max_retries: 3 # default is: 0

I am convinced this is an issue with the meilisearch ruby library. I noticed that timeout issues have been opened by people before, like https://github.com/meilisearch/meilisearch-rails/issues/120 and https://github.com/meilisearch/meilisearch-rails/issues/260

I don't know if they are related to my issue but especially #120 seems to be basically the same.

I will try the on-prem installation and have it be in the same region as my server to decrease latency. I hope that solves it.

doutatsu commented 3 months ago

I've been having this issue for a while myself, just as @drale2k described.


Even with increased timeout and retries:

MeiliSearch::Rails.configuration = {
  meilisearch_url: ENV.fetch('MEILISEARCH_URL', nil),
  meilisearch_api_key: ENV.fetch('MEILISEARCH_API_KEY', nil),
  timeout: 5,
  max_retries: 2,
  active: !Rails.env.test?
}
drale2k commented 3 months ago

@doutatsu Are you on Cloud or self-hosted? Have you found a way to deal with it? I am struggling with this, as I would really like to stay with Meilisearch, but this is becoming a dealbreaker.

I don't think there is a solution right now, and it seems to be an issue only with the Ruby gem, as I cannot see e.g. Laravel people complaining :/

ellnix commented 3 months ago

@brunoocasali Do you have any ideas regarding this or should we work toward using a different request backend?

I could try porting the ms-ruby gem to http.rb (which doesn't use Net::HTTP) if you give me the go-ahead.

brunoocasali commented 3 months ago

So, I was talking to @ellnix in private. We will go for the change and at least try something new. Just don't expect any new releases for now, but it would be amazing if you both could try the beta versions.

We will come back with news soon!

go for it, @ellnix. I'll be ready to review it! 🙏

drale2k commented 3 months ago

Appreciate the effort, guys, I am happy to test once you have a PoC. Not that I have a better idea, but is there a reason to believe it is due to Net::HTTP? Being the de facto standard library for Ruby, wouldn't this cause issues all over Ruby / Rails projects?

ellnix commented 3 months ago

Apologies for the delay on this. I made a branch with some dirty edits that use httprb instead of httparty. You should be able to use it by specifying this in your Gemfile:

gem 'meilisearch-rails', git: 'https://github.com/ellnix/meilisearch-rails.git', branch: 'use-httprb-ms-ruby'

Of course, all of the changes are on GitHub for you to review. As far as I know there are no regressions in this new httprb version; all tests pass.

P. S.: If this fixes the error, I'll refactor the branch and PR it to both repos as soon as possible.

P. P. S.: httprb currently does not support retries, so the max_retries option is not used. I'll implement a custom retry in a bit.

doutatsu commented 3 months ago

@ellnix Out of curiosity, is there any reason to use httprb directly instead of something like Faraday? As it provides adapters for different middleware, you could test both httprb and other alternatives like httpx or Typhoeus, plus other goodies like retry logic, so you can make the connection to Meilisearch more resilient.

ellnix commented 3 months ago

Appreciate the effort, guys, I am happy to test once you have a PoC. Not that I have a better idea, but is there a reason to believe it is due to Net::HTTP? Being the de facto standard library for Ruby, wouldn't this cause issues all over Ruby / Rails projects?

I also have no idea why this is happening to be honest. Our code making use of Net::HTTP is very basic, I can't see any errors anywhere in it. Since this is a networking problem it's very hard to know exactly what is failing and why. My idea was that by switching gems to httprb there might be large enough differences that coincidentally solve this issue.

@ellnix Out of curiosity, is there any reason to use httprb directly instead of something like Faraday? As it provides adapters for different middleware, you could test both httprb and other alternatives like httpx or Typhoeus, plus other goodies like retry logic, so you can make the connection to Meilisearch more resilient.

That might be the best thing to do going forward, I was just not aware of Faraday. I'll reach for it if this switch works and I get to refactoring, or if I need to switch to another backend. For now I think using httprb directly is good to minimize the number of variables.

doutatsu commented 3 months ago

Yup - Faraday is extensible, you should have everything you need, from retries with backoff to serialisation goodies. Leaving this here, so you can take a look when you get around to refactoring - https://github.com/lostisland/awesome-faraday

ellnix commented 3 months ago

Implemented a (bad) solution for retries, live on that ms-ruby branch. It shouldn't cause any issues except maybe performance ones because I use sleep, but it should not run often (or at all).

drale2k commented 3 months ago

Since this is a networking problem it's very hard to know exactly what is failing and why. My idea was that by switching gems to httprb there might be large enough differences that coincidentally solve this issue.

Trying something is already a good step. Let's see if something else breaks which might help point us towards the actual issue, if not Net::HTTP.

If it does not already, it would be good to log when it has to retry, so we have some idea whether retries are still happening and can try to hunt them down.

ellnix commented 3 months ago

If it does not already, it would be good to log when it has to retry, so we have some idea whether retries are still happening and can try to hunt them down.

It didn't before, but it does now (make sure to remove the lock file and bundle again). You should be able to filter your logs with:

The log level is warn. It also logs the output of the error's #inspect.
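
For reference, a retry-with-sleep wrapper along those lines might look like the following. This is a hedged sketch, not the actual branch code; the method name and logger setup are stand-ins:

```ruby
require 'logger'

# Sketch (not the actual branch code) of a naive retry-with-sleep wrapper:
# sleep between attempts, and log each retry at warn level including the
# error's #inspect, so retries are easy to grep for in the logs.
def with_retries(max_retries:, wait: 1, logger: Logger.new($stdout))
  attempts = 0
  begin
    yield
  rescue StandardError => e
    attempts += 1
    raise if attempts > max_retries
    logger.warn("Meilisearch retry #{attempts}/#{max_retries}: #{e.inspect}")
    sleep wait
    retry
  end
end

# Usage: the block fails twice, then succeeds on the third attempt.
calls = 0
result = with_retries(max_retries: 3, wait: 0) do
  calls += 1
  raise 'timeout' if calls < 3
  :ok
end
puts result # => ok
```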

drale2k commented 3 months ago

I just ran a full reindex of my model 50 times; it has approx. 130 db entries, so about 6500 reindexes in total with the new branch, and I have had no issues so far.

However, I ran the same with the release branch again and also did not run into any timeout issues yet. 😐 I will keep trying and report any news.

drale2k commented 3 months ago

Actually, I think I am not testing the new branch but the current release one. I updated my Gemfile:

gem 'meilisearch-rails', git: 'https://github.com/ellnix/meilisearch-rails.git', branch: 'use-httprb-ms-ruby'

but I still get the httparty error in my console, because the current release version of the meilisearch gem locks httparty at 0.21.0:

/Users/drale2k/.rbenv/versions/3.3.2/lib/ruby/3.3.0/csv.rb was loaded from the standard library, but will no longer be part of the default gems since Ruby 3.4.0. Add csv to your Gemfile or gemspec. Also contact author of httparty-0.21.0 to add csv into its gemspec.

Looking at the new branch of the meilisearch gem, httparty is no longer a dependency, and thus this error should not be thrown. When I bundle open meilisearch-rails, I correctly see the new branch required in the Gemfile:

gem 'meilisearch', git: 'https://github.com/ellnix/meilisearch-ruby.git', branch: 'switch-to-http-rb'

However, if I bundle open meilisearch, I see that I still have the release version running, which uses httparty. I tried clearing my Gemfile.lock and vendor/cache, and also gem uninstall meilisearch and bundle install again, but got the same result.

Any ideas how to force it to use the new branch from the meilisearch gem?

ellnix commented 3 months ago

Any ideas how to force it to use the new branch from the meilisearch gem?

It might be an issue of the gemspec dependency taking precedence over the Gemfile. Looking around, it does not look like I can specify a git repository in the gemspec.

Maybe you can specify meilisearch alongside meilisearch-rails and bundler will pick up the correct versions:

gem 'meilisearch', git: 'https://github.com/ellnix/meilisearch-ruby.git', branch: 'switch-to-http-rb'
gem 'meilisearch-rails', git: 'https://github.com/ellnix/meilisearch-rails.git', branch: 'use-httprb-ms-ruby'
drale2k commented 3 months ago

That worked fine, thanks. I re-ran my test, re-indexing a model with 1213 records 50 times (about 60k re-indexes), and benchmarked the duration for both the release version and the test branch.

Not a single timeout issue in either test.

release version (using httparty): Duration 113.610023 seconds
test branch (using http gem): Duration 137.300693 seconds

I ran this twice, and the duration was approximately the same both times. It seems the http gem is about 18% slower than Net::HTTP for my test.
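
For anyone wanting to reproduce that kind of comparison, Ruby's standard Benchmark library handles the timing part. A sketch, with a dummy workload standing in for the real Episode.reindex! call:

```ruby
require 'benchmark'

# Sketch of the timing harness: run a block of work N times and report
# the total wall-clock duration, as in the httparty vs http comparison.
def time_runs(label, runs)
  duration = Benchmark.realtime do
    runs.times { yield }
  end
  puts format('%s: Duration %.6f seconds', label, duration)
  duration
end

# Stand-in workload; in the real test this was Episode.reindex!.
d = time_runs('dummy workload', 50) { (1..1000).sum }
puts d >= 0 # => true
```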

ellnix commented 3 months ago

Not a single timeout issue in either test.

I remain extremely confused :laughing:

I ran this twice, and the duration was approximately the same both times. It seems the http gem is about 18% slower than Net::HTTP for my test.

Can't say for sure but it could easily be my implementation.

brunoocasali commented 3 months ago

Thanks a lot for providing the PoCs @ellnix you rock! And thanks, @drale2k, for helping us hunt this issue down!

But I don't know. Weirdly, only a few people have complained about this, and it is not happening to everybody, so I keep thinking it could be something on the host 🤔

@doutatsu let us know if you have time to try it!

brunoocasali commented 3 months ago

@drale2k, how big are your documents? And the issue is happening mostly in the indexing and not in the search time right?

brunoocasali commented 3 months ago

@ellnix I thought we could try something like this: https://github.com/meilisearch/meilisearch-rails/issues/120#issuecomment-1068423334. If @drale2k confirms the documents are "too big", we could think about reducing the average amount of data the Ruby process has to handle 🤔

drale2k commented 3 months ago

@drale2k, how big are your documents? And the issue is happening mostly in the indexing and not in the search time right?

Seems size-independent, as it happens on a model which has very little data per record, just metadata basically. On top of that, I run into the same issue when trying to delete records from the index by running:

Segment.all.each {|r| r.remove_from_index!}

It would run for about 2 minutes, deleting about 2000 (out of 10k) records, before I get the timeout error. Then I have to run it again, and keep doing that multiple times, before all records are deleted from the index.

Sometimes deleting records from the index like that is very slow, not deleting anything for minutes before continuing to delete just a couple of records. I monitor that by refreshing the Meilisearch dashboard. Not sure if that issue is related.
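
One thing that might reduce the pressure in a loop like that is batching the deletions instead of issuing one HTTP call per record. A hedged sketch (the index handle and batch size are assumptions; meilisearch-ruby exposes a batch delete_documents call on the index, and the real network call is commented out here):

```ruby
# Sketch: delete documents in batches rather than one request per record,
# so 10k records become a handful of requests instead of 10k.
ids = (1..10_000).to_a # e.g. Segment.pluck(:id) in the real app

batches = 0
ids.each_slice(1_000) do |batch|
  # index.delete_documents(batch) # real Meilisearch call, elided in this sketch
  batches += 1
end
puts batches # => 10
```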

drale2k commented 3 months ago

I just ran the remove_from_index loop with the new branch, using http instead of httparty, and I ran into the same issue.

The command ran for about 10 minutes, deleted only about 10 records in that time, and then threw a TimeoutError:

irb(main):001> Segment.all.each {|r| r.remove_from_index!}

  Segment Load (29.2ms)  SELECT "segments".* FROM "segments"
(irb):1:in `block in <main>': Timed out after using the allocated 1 seconds (HTTP::TimeoutError)
    from (irb):1:in `<main>'
/Users/drale2k/.rbenv/versions/3.3.2/lib/ruby/gems/3.3.0/gems/http-5.2.0/lib/http/timeout/global.rb:36:in `connect_nonblock': read would block (OpenSSL::SSL::SSLErrorWaitReadable)
    from /Users/drale2k/.rbenv/versions/3.3.2/lib/ruby/gems/3.3.0/gems/http-5.2.0/lib/http/timeout/global.rb:36:in `connect_ssl'
    from /Users/drale2k/.rbenv/versions/3.3.2/lib/ruby/gems/3.3.0/gems/http-5.2.0/lib/http/timeout/null.rb:39:in `start_tls'
    from /Users/drale2k/.rbenv/versions/3.3.2/lib/ruby/gems/3.3.0/gems/http-5.2.0/lib/http/connection.rb:169:in `start_tls'
    from /Users/drale2k/.rbenv/versions/3.3.2/lib/ruby/gems/3.3.0/gems/http-5.2.0/lib/http/connection.rb:45:in `initialize'
    from /Users/drale2k/.rbenv/versions/3.3.2/lib/ruby/gems/3.3.0/gems/http-5.2.0/lib/http/client.rb:70:in `new'
    from /Users/drale2k/.rbenv/versions/3.3.2/lib/ruby/gems/3.3.0/gems/http-5.2.0/lib/http/client.rb:70:in `perform'
    from /Users/drale2k/.rbenv/versions/3.3.2/lib/ruby/gems/3.3.0/gems/http-5.2.0/lib/http/client.rb:31:in `request'
    from /Users/drale2k/.rbenv/versions/3.3.2/lib/ruby/gems/3.3.0/gems/http-5.2.0/lib/http/chainable.rb:20:in `get'
    from /Users/drale2k/.rbenv/versions/3.3.2/lib/ruby/gems/3.3.0/bundler/gems/meilisearch-ruby-8142e09dfd8e/lib/meilisearch/http_request.rb:13:in `public_send'

My file descriptor limit is at 2560 currently

doutatsu commented 2 months ago

Wonder if this was partly to blame: https://github.com/meilisearch/meilisearch/issues/4654 - will upgrade and see if it reduces the number of timeouts or other non-200 status codes.

doutatsu commented 2 months ago

The latest update didn't fix it; I am actually having more timeouts nowadays... it's quite frustrating, to be honest.

curquiza commented 2 months ago

Hello @doutatsu, which version of Meilisearch do you use? Are you on the Cloud, or self-hosted? If self-hosted, which provider, and which machine resources?

Is there anyone else having the timeout error, and being on the Cloud offer?

doutatsu commented 2 months ago

@curquiza

I am self-hosted, deployed to Render with 2GB RAM. It's not a server issue: I am using a fraction of the CPU, and RAM is stable.


I imagine it's simply an issue of Meilisearch itself not being able to handle a lot of requests, or of this gem not being able to properly queue up requests to be executed. In my case, it's something around 0.38 rps, which doesn't seem that high, and I would imagine Meilisearch should be able to handle such a load.


I posted before, but here's my configuration again:

MeiliSearch::Rails.configuration = {
  meilisearch_url: ENV.fetch('MEILISEARCH_URL', nil),
  meilisearch_api_key: ENV.fetch('MEILISEARCH_API_KEY', nil),
  timeout: 5,
  max_retries: 2,
  active: !Rails.env.test?
}
curquiza commented 2 months ago

Did you have the issue before v1.8.0? Also, did you try using --experimental-search-queue-size? https://github.com/orgs/meilisearch/discussions/729

doutatsu commented 2 months ago

@curquiza Yup, I have had this issue for a looong time, although I don't remember it being as frequent. But that might also be due to my app becoming more popular and me using Meilisearch more heavily; hard to say.


I haven't tried that CLI flag - I'll try it out.

doutatsu commented 2 months ago

Can confirm, @curquiza: the flag did nothing.