googleapis / nodejs-firestore

Node.js client for Google Cloud Firestore: a NoSQL document database built for automatic scaling, high performance, and ease of application development.
https://cloud.google.com/firestore/
Apache License 2.0
640 stars 149 forks source link

[Firestore] Throwing 14 UNAVAILABLE: Query timed out on "simple" query #2055

Open abarone-btf opened 3 months ago

abarone-btf commented 3 months ago

Thanks for stopping by to let us know something could be better!

PLEASE READ: If you have a support contract with Google, please create an issue in the support console instead of filing on GitHub. This will ensure a timely response.

1) Is this a client library issue or a product issue? Yes its the client library

2) Did someone already solve this?

3) Do you have a support contract? No

Environment details

Steps to reproduce

  1. Am just making a "simple" query that uses the OR operator and the new inequality operator on multiple fields, the query is searching over an Id that its repeated on my database with aprox 27k items:

This is an example of the query:

            .where('myId', '==', 123)
            .where('createdAt', '<=', new Date())
            .where(
                Filter.or(
                    Filter.where('deletedAt', '>=', new Date()),
                    Filter.where('deletedAt', '==', null)
                  )
            )
            .orderBy('createdAt', 'desc')
            .limit(500)
  1. The query just hangs out and never finished, throwing the error 14 UNAVAILABLE: Query timed out. Please try either limiting the entities scanned, or run with an updated index configuration..

What may be happening here?, maybe am missing some weird index?, I tried to run the explain function but with the analyze flag the result is the same, without it, the plan summary its this one:

 PlanSummary {
  indexesUsed: [
    {
      properties: '(myId ASC, createdAt DESC, deletedAt DESC, __name__ DESC)',
      query_scope: 'Collection'
    },
    {
      properties: '(myId ASC, deletedAt ASC, createdAt DESC, __name__ DESC)',
      query_scope: 'Collection'
    }
  ]
}

Thanks in advance!

dconeybe commented 3 months ago

Hi @abarone-btf. Thank you for your question. While I look into this, could you try different permutations of your query to see which clause/clauses seem to be the culprit?

Here are some permutations to try, and report back which ones result in the UNAVAILABLE error and which ones succeed:

  1. Remove the limit
  2. Remove the orderBy
  3. Remove the "or" clause
  4. Remove the "createdAt" inequality

Thank you.

abarone-btf commented 3 months ago

Hi @dconeybe !, of course, this are the results for each of the steps (I am applying this individually, starting each remove from the base query described above, not a permutation of removes):

One thing I tried that "works" is applying an order by deletedAt first (asc), then by createdAt, the query runs smoothly without problems (maybe because the data has many null values in that field, in contrast createdAt has a lot more cardinality), also in that example the plan summary uses a single index, in contrast with the base query that uses 2 "almost identical" indexes.

In my case that re orderding does not work because I might miss some entries in the pagination flow, but its worth mention it.

Thanks again for your help.

dconeybe commented 3 months ago

Would you be willing to provide your Firebase project number so we can check the backend logs?

You can find your project number in the Firebase Console (screenshot)

abarone-btf commented 3 months ago

Of course, this is our project number 783579267042

dconeybe commented 3 months ago

Thank you @abarone-btf. The backend engineers have investigated and found a potential query planner bug in the backend (Googlers see b/346321260). I'll keep you posted on developments as investigation continues.

One question though... you mentioned in the OP that

the query is searching over an Id that its repeated on my database with aprox 27k items

When you say "aprox 27k items" is "27k" referring to the number of documents that match the query, or the total number of documents in the collection? If the former, can you give an estimate on the total number of documents in the collection?

abarone-btf commented 3 months ago

Awesome @dconeybe!!, yes the 27k elements means that there are aprox 27k documents with myId = 123 (for the record, when this number is "low" the query ends without problem)

dconeybe commented 3 months ago

Do you know, roughly, how many documents are in the entire collection (i.e. with any value fo myId)?

abarone-btf commented 3 months ago

Yes!, in the collection there are aprox 200k elements

abarone-btf commented 3 months ago

One question, what does it mean "Googlers see b/346321260"?

dconeybe commented 3 months ago

@abarone-btf Could you please open a support ticket at https://firebase.google.com/support/troubleshooter/contact for this issue? We need an official support ticket to provide justification to look at your project in more detail. Be sure to include the URL of this github issue in your support ticket and also mention that b/346321260 as the internal ticket that has been logged for this issue.

One more request: Could you try removing the Filter.where('deletedAt', '==', null) clause and see if the query runs to completion?

"Googlers see b/346321260" is a note for Google engineers, mentioning the internal ticket that is tracking this issue, b/346321260.

abarone-btf commented 3 months ago

Awesome @dconeybe !, I created the ticket (screenshot below).

image

In relation to the test, without Filter.where('deletedAt', '==', null) the query returns sucessfully and without Filter.where('deletedAt', '>=', new Date()), (leaving the == null) it also returns sucessfully.

Thanks again!

xwkuang5 commented 3 months ago

@abarone-btf could you let us know the case number (if any)? Also, if you are comfortable, it would help if you can provide the un-redacted query in the support case.

xwkuang5 commented 3 months ago

While we wait for the support ticket to be created and surfaced, can you also help us populate the table below by testing on your end (also correct me if I record the behavior incorrectly)?

query behavior
where('myId', '==', 123).where('createdAt', '<=', new Date()).where(or(where('deletedAt', '>=', new Date()), where('deletedAt', '==', null))).orderBy('createdAt', DESC).limit(500) timeout
where('myId', '==', 123).where(or(where('deletedAt', '>=', new Date()), where('deletedAt', '==', null))).orderBy('createdAt', DESC).limit(500) timeout
where('myId', '==', 123).where('createdAt', '<=', new Date()).where('deletedAt', '>=', new Date()).orderBy('createdAt', DESC).limit(500) succeed
where('myId', '==', 123).where('createdAt', '<=', new Date()).where('deletedAt', '==', NULL).orderBy('createdAt', DESC).limit(500) succeed
where('myId', '==', 123).where('createdAt', '<=', new Date()).where('deletedAt', '==', NULL).orderBy('createdAt', DESC).orderBy('deletedAt', DESC).limit(500) ?
where('myId', '==', 123).where(or(where('deletedAt', '>=', new Date()), where('deletedAt', '==', null))).orderBy('deletedAt', DESC).limit(500) ?
abarone-btf commented 3 months ago

Hi @xwkuang5 !, of course the ticket numer is 10287347, about the un-redacted query, how do I send that?, as a reply on the mail?.

Related to the table, both at the end works good. Also there is one missing on the base query: Without orderBy: Needed a new index, after creation the result is UNAVAILABLE error ❌.

clairekeer1997 commented 3 months ago
Hi @abarone-btf , could you please help us confirm that by running the following queries, if the number of results returned on your end matches below? These are the numbers we observed from our logs. Query Num of results
where('myId', '==', 123).where('createdAt', '<=', new Date()).where('deletedAt', '>=', new Date()).orderBy('createdAt', DESC).limit(500) 0
where('myId', '==', 123).where('createdAt', '<=', new Date()).where('deletedAt', '==', NULL).orderBy('createdAt', DESC).limit(500) 500
abarone-btf commented 3 months ago

Hi!, @clairekeer1997 . Yes, thats exactly the result!

dconeybe commented 3 months ago

@abarone-btf, could you enable debug logging and re-run the problematic query from your opening comment? The debug logs will help the investigation. If you don't feel comfortable sharing the logs publicly here in the github issue, feel free to add them to the support ticket, where only Google will have visibility. If you do feel comfortable sharing them publicly, please create a gist at https://gist.github.com/ and share the link to the gist here.

To enable debug logging, call

setLogFunction(console.log);

https://googleapis.dev/nodejs/firestore/latest/global.html#setLogFunction

abarone-btf commented 3 months ago

Hi @dconeybe of course!, but how do I add them to the support ticket?

abarone-btf commented 3 months ago

@dconeybe I read that replying to the mail is the way to go!. I reply to it with the gist url 👌 .

If you can please confirm that you receive it that would be awesome.

Thanks!

dconeybe commented 3 months ago

We have received the debug logs and have been able to reproduce the issue. This is great news because it means we can investigate and test out fixes. We will update on the progress by commenting on this GitHub issue.

abarone-btf commented 3 months ago

Awesome @dconeybe !!. Is there some light at the end of the tunnel? 👀 jajaja.

Anything I can help you with, please tell me.

Thanks!

dconeybe commented 3 months ago

Yes, a fix has been submitted and we need to roll it out to the backend servers. The ETA for the fix to actually be fully rolled out is 2-4 weeks. Thanks again for all your help diagnosing this!

dconeybe commented 2 months ago

Update: Unfortunately, rollouts have not gone as smoothly as hoped. It will likely be another 2-4 weeks (some time in August 2024) when the fix will finally roll out.

brent-ridian commented 2 months ago

dconeybe: I appreciate your support for this issue and consistently following up.

Is the issue in this thread caused by the same bug that causes the issue mentioned here: https://stackoverflow.com/questions/78605344/how-to-fix-grpc-channel-multithreadedrendezvous-fails-statuscode-unavailable ?

dconeybe commented 2 months ago

Is the issue in this thread caused by the same bug that causes the issue mentioned here: https://stackoverflow.com/questions/78605344/how-to-fix-grpc-channel-multithreadedrendezvous-fails-statuscode-unavailable ?

I don't think that stackoverflow post is the same issue. The fix for this issue is related to how "or" queries are planned and executed, but that stackoverflow post doesn't seem to be performing an "or" query. I'd suggest logging an issue against https://github.com/googleapis/python-firestore/ since that looks like a python-specific bug (the team that maintains nodejs-firestore is different than the team that maintains python-firestore). It would be interesting, though, if using the CollectionReference.stream() method of nodejs-firestore works in this specific case that is problematic with python. If you can reproduce the same issue in node then feel free to open a new issue against this github repository for it.

xwkuang5 commented 1 month ago

@abarone-btf the fix has been rolled out for some time now, can you confirm if the issue has been fixed for you? Thank you!