spring-projects / spring-data-couchbase

Provides support to increase developer productivity in Java when using Couchbase. Uses familiar Spring concepts such as template classes for core API usage and lightweight repository-style data access.
https://spring.io/projects/spring-data-couchbase
Apache License 2.0
277 stars 191 forks

Await for FTS index to update #1419

Open JesusTheHun opened 2 years ago

JesusTheHun commented 2 years ago

When writing tests that involve the FTS engine, I'm stuck with the stone-age technique of a sleep-and-retry loop to wait for the FTS engine to index my documents.

I have tried the following :

int retry = 0;

while (cluster.searchQuery("myIndex", SearchQuery.docId(myDocument.getId())).rows().isEmpty()) {
    if (retry > 60) fail("Document indexation in FTS timeout");
    Thread.sleep(1000);
    retry++;
}

But it gave me inconsistent results. I guess the loop exits once the index has acknowledged the document, while the document is not yet ready to be searched. Sometimes, on a small machine overwhelmed by test suites, a dozen seconds can separate acknowledgement from readiness.

Is there a common way to deal with this?

Edit : in my tests I'm using FTS inside a N1QL query, I don't know if that matters.

mikereiche commented 2 years ago

We do the same thing. See runWithRetry()

https://github.com/couchbase/couchbase-jvm-clients/blob/master/java-client/src/integrationTest/java/com/couchbase/client/java/SearchIntegrationTest.java
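
For readers who want something a little tidier than an inline loop, here is a minimal sketch of a deadline-based polling helper. It is not the `runWithRetry()` from the linked test class; the class name `Await` and its signature are made up for illustration. It only changes how the waiting is expressed, not what the FTS engine guarantees.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical test helper: polls a condition until it holds or a deadline passes. */
final class Await {
    private Await() {}

    /**
     * Returns true as soon as the condition holds, false if the timeout elapses
     * first or the waiting thread is interrupted.
     */
    static boolean until(BooleanSupplier condition, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            try {
                // Sleep for the interval, but never (much) past the deadline.
                Thread.sleep(Math.max(1, Math.min(interval.toMillis(), remainingNanos / 1_000_000)));
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

In a test, the condition would be the search itself, for example `Await.until(() -> !cluster.searchQuery("myIndex", SearchQuery.docId(id)).rows().isEmpty(), Duration.ofSeconds(60), Duration.ofMillis(250))`, followed by `fail(...)` when it returns false.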

JesusTheHun commented 2 years ago

It is super weird not to have a way to make sure things are available and ready to be queried. What is even weirder is that even with a mutation token, you cannot reach that level of trust in the availability of the document.

So if you batch import documents on behalf of your customer, you message your customer "ok we are ready, you can use the app", and then he logs in and search results are inconsistent. This is super f'ed up. Is there any plan to address that ?

mikereiche commented 2 years ago

I took a look and found this. https://issues.couchbase.com/browse/MB-50101. It will be in the 7.1.0 server release.

I'm not sure if you can access the issue, but it exposes an "FTS endpoint for knowing the index creation status"

They have exposed this endpoint -

curl -u Administrator:password http://10.144.220.101:8094/api/index/myTestIndex
{"status":"ok","indexDef":{"type":"fulltext-index","name":"myTestIndex","uuid":"7809ff9161466cef","sourceType":"gocbcore","sourceName":"my_bucket","sourceUUID":"6f1c001821506954398a1fab0684a40f","planParams":{"maxPartitionsPerPIndex":1024,"indexPartitions":1},"params":{"doc_config":{"docid_prefix_delim":"","docid_regexp":"","mode":"type_field","type_field":"type"},"mapping":{"analysis":{},"default_analyzer":"standard","default_datetime_parser":"dateTimeOptional","default_field":"_all","default_mapping":{"dynamic":true,"enabled":true},"default_type":"_default","docvalues_dynamic":false,"index_dynamic":true,"store_dynamic":false,"type_field":"_type"},"store":{"indexType":"scorch","segmentVersion":15}},"sourceParams":{}},"planPIndexes":[{"name":"myTestIndex_7809ff9161466cef_4c1c5584","uuid":"188b5a0f27926e97","indexType":"fulltext-index","indexName":"myTestIndex","indexUUID":"7809ff9161466cef","sourceType":"gocbcore","sourceName":"my_bucket","sourceUUID":"6f1c001821506954398a1fab0684a40f","sourcePartitions":"0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255,256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303,304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351,352,353,354,355,356,357,358,359,360,361,362,363,364,365,366,367,368,369,370,371,372,373,374,375,376,377,378,379,380,381,382,383,384,385,386,387,388,389,390,391,392,393,394,395,396,397,398,399,400,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,416,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,469,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486,487,488,489,490,491,492,493,494,495,496,497,498,499,500,501,502,503,504,505,506,507,508,509,510,511,512,513,514,515,516,517,518,519,520,521,522,523,524,525,526,527,528,529,530,531,532,533,534,535,536,537,538,539,540,541,542,543,544,545,546,547,548,549,550,551,552,553,554,555,556,557,558,559,560,561,562,563,564,565,566,567,568,569,570,571,572,573,574,575,576,577,578,579,580,581,582,583,584,585,586,587,588,589,590,591,592,593,594,595,596,597,598,599,600,601,602,603,604,605,606,607,608,609,610,611,612,613,614,615,616,617,618,619,620,621,622,623,624,625,626,627,628,629,630,631,632,633,634,635,636,637,638,639,640,641,642,643,644,645,646,647,648,649,650,651,652,653,654,655,656,657,658,659,660,661,662,663,664,665,666,667,668,669,670,671,672,673,674,675,676,677,678,679,680,681,682,683,684,685,686,687,688,689,690,691,692,693,694,695,696,697,698,699,700,701,702,703,704,705,706,707,708,709,710,711,712,713,714,715,716,717,718,719,720,721,722,723,724,725,726,727,728,729,730,731,732,733,734,735,736,737,738,739,740,741,742,743,744,745,746,747,748,749,750,751,752,753,754,755,756,757,758,759,760,761,762,763,764,765,766,767,768,769,770,771,772,773,774,775,776,777,778,779,780,781,782,783,784,785,786,787,788,789,790,791,792,793,794,795,796,797,798,799,800,801,802,803,804,805,806,807,808,809,810,811,812,813,814,815,816,817,818,819,820,821,822,823,824,825,826,827,828,829,830,831,832,833,834,835,836,837,838,839,840,841,842,843,844,845,846,847,848,849,850,851,852,853,854,855,856,857,858,859,860,861,862,863,864,865,866,867,868,869,870,871,872,873,874,875,876,877,878,879,880,881,882,883,884,885,886,887,888,889,890,891,892,893,894,895,896,897,898,899,900,901,902,903,904,905,906,907,908,909,910,911,912,913,914,915,916,917,918,919,920,921,922,923,924,925,926,927,928,929,930,931,932,933,934,935,936,937,938,939,940,941,942,943,944,945,946,947,948,949,950,951,952,953,954,955,956,957,958,959,960,961,962,963,964,965,966,967,968,969,970,971,972,973,974,975,976,977,978,979,980,981,982,983,984,985,986,987,988,989,990,991,992,993,994,995,996,997,998,999,1000,1001,1002,1003,1004,1005,1006,1007,1008,1009,1010,1011,1012,1013,1014,1015,1016,1017,1018,1019,1020,1021,1022,1023","nodes":{"3fa3ec262f02a80b59ce58e77e781689":{"canRead":true,"canWrite":true,"priority":0}},"indexParams":{"doc_config":{"docid_prefix_delim":"","docid_regexp":"","mode":"type_field","type_field":"type"},"mapping":{"analysis":{},"default_analyzer":"standard","default_datetime_parser":"dateTimeOptional","default_field":"_all","default_mapping":{"dynamic":true,"enabled":true},"default_type":"_default","docvalues_dynamic":false,"index_dynamic":true,"store_dynamic":false,"type_field":"_type"},"store":{"indexType":"scorch","segmentVersion":15}}}],"warnings":[]}
JesusTheHun commented 2 years ago

This is related to index creation. My index is created and ready before my tests start. My problem is about a document being ready to be queried.

mikereiche commented 2 years ago

Use this URL: curl -u <username>:<password> http://<ip>:8094/api/nsstats

Check for bucket_name:index_name:num_mutations_to_index = 0. In the unit tests, we use the okhttp client for things like this.

{
  "batch_bytes_added": 13198,
  "batch_bytes_removed": 13198,
  "curr_batches_blocked_by_herder": 0,
  "my_bucket:myTestIndex:avg_grpc_internal_queries_latency": 0,
.
.
.
  "my_bucket:myTestIndex:num_files_on_disk": 4,
  "my_bucket:myTestIndex:num_mutations_to_index": 0,
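
The check described above could be sketched roughly as follows. This is not the code from the SDK's unit tests: it uses the JDK's built-in `java.net.http.HttpClient` (Java 11+) instead of okhttp, and a regular expression instead of a JSON parser, to stay dependency-free. The class name `FtsStats` and the `baseUrl` parameter (something like `http://<ip>:8094`) are assumptions for illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper around the /api/nsstats endpoint shown above. */
final class FtsStats {
    private FtsStats() {}

    /** Extracts "bucket:index:num_mutations_to_index" from an nsstats body, if present. */
    static OptionalLong mutationsToIndex(String json, String bucket, String index) {
        String key = bucket + ":" + index + ":num_mutations_to_index";
        // Shortcut: match the flat "key": number pair instead of parsing the whole document.
        Matcher m = Pattern.compile("\"" + Pattern.quote(key) + "\"\\s*:\\s*(\\d+)").matcher(json);
        return m.find() ? OptionalLong.of(Long.parseLong(m.group(1))) : OptionalLong.empty();
    }

    /** Fetches baseUrl + "/api/nsstats" with HTTP basic auth and returns the raw body. */
    static String fetchNsStats(String baseUrl, String user, String password) throws Exception {
        String credentials = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/api/nsstats"))
                .header("Authorization", "Basic " + credentials)
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("nsstats returned HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

A test would then poll until `mutationsToIndex(fetchNsStats(...), "my_bucket", "myTestIndex")` is present and equal to 0, with a timeout so a stuck index fails the test instead of hanging it.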
mikereiche commented 2 years ago

The FTS folks pointed me to https://docs.couchbase.com/server/current/fts/fts-search-response.html#at_plus It's not clear to me if the scan_vector is required or not.

daschl commented 2 years ago

@mikereiche yeah the scan_vector is required with at_plus - that's the MutationToken you pass in from the SDK.

mikereiche commented 2 years ago

So at_plus with the scan_vector would only work for documents that the client itself had inserted. It could work for the unit test case, but would not work for something like a daily bulk load of documents.
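
To make the at_plus discussion concrete, here is a rough sketch of what a search request body with a consistency vector looks like at the REST level, as far as I understand the linked documentation: the vector sits under `ctl.consistency.vectors`, keyed by index name, and maps `"<vbucket id>/<vbucket uuid>"` to a sequence number. The class `AtPlusBody` is hypothetical, and the builder assumes the index name and keys need no JSON escaping. With the Java SDK you would normally not build this by hand; the SDK derives it from the mutation token you pass in.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

/** Hypothetical builder for an FTS request body that asks for at_plus consistency. */
final class AtPlusBody {
    private AtPlusBody() {}

    /**
     * @param indexName name of the search index (assumed to contain no quotes)
     * @param vector    "vbucketId/vbucketUuid" -> sequence number, taken from mutation tokens
     * @param queryJson the search query itself, already serialized as JSON
     */
    static String build(String indexName, Map<String, Long> vector, String queryJson) {
        StringJoiner entries = new StringJoiner(",", "{", "}");
        // TreeMap gives a stable key order so the output is reproducible.
        for (Map.Entry<String, Long> e : new TreeMap<>(vector).entrySet()) {
            entries.add("\"" + e.getKey() + "\":" + e.getValue());
        }
        return "{\"ctl\":{\"consistency\":{\"level\":\"at_plus\",\"vectors\":{\""
                + indexName + "\":" + entries + "}}},\"query\":" + queryJson + "}";
    }
}
```

The vector can only contain what the caller knows, which is why this approach covers read-your-own-writes and not documents written by some other process.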

daschl commented 2 years ago

@mikereiche there is a way to retrieve it for documents, but right now that is not exposed to the user since we only wanted to cover the RYOW use case.

JesusTheHun commented 2 years ago

We are going a bit beyond the initial scope of this issue, but let's dive into it.

@daschl I have an import service that inserts all docs with data from a third party. I would like it to be able to post a message to the broker saying "hey, the import is ready, tell the customer". A typical batch is a few thousand documents. I could make a dummy request with at_plus consistency and post my message once it returns. Does this require the latest token from the last insert?

@mikereiche how do I get the mutation token of an insert from a Spring repository? We talked about it a few months ago; I don't know if you moved forward with that. You talked about a @MutationToken annotation, like the @Version one.

mikereiche commented 2 years ago

fts folks also suggested /api/stats/index//progress

mikereiche commented 2 years ago

"This requires the latest token from the last insert ?" I don't believe the order in which documents are indexed can be inferred from the order in which they were inserted. And even if it could, if the inserts are done with the reactive API, there's no way to know the order in which the inserts actually occurred. One could look at the CAS, but then again, that's another assumption.
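
One way around the ordering question, sketched under assumptions: instead of trying to pick "the last" token, keep every token and retain the highest sequence number per vbucket. Sequence numbers only grow within a vbucket, so the merged result does not depend on the order in which inserts completed. The `Token` class below is a hypothetical stand-in for the fields an SDK mutation token carries (vbucket id, vbucket UUID, sequence number); I believe the SDK's `MutationState` can aggregate several tokens in a similar spirit, but treat that as an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical order-independent merge of mutation tokens into a consistency vector. */
final class TokenMerge {
    private TokenMerge() {}

    /** Stand-in for the fields of an SDK mutation token. */
    static final class Token {
        final int vbucketId;
        final long vbucketUuid;
        final long seqno;

        Token(int vbucketId, long vbucketUuid, long seqno) {
            this.vbucketId = vbucketId;
            this.vbucketUuid = vbucketUuid;
            this.seqno = seqno;
        }
    }

    /** Keeps the highest seqno per "vbucketId/vbucketUuid", whatever the arrival order. */
    static Map<String, Long> merge(Iterable<Token> tokens) {
        Map<String, Long> vector = new HashMap<>();
        for (Token t : tokens) {
            vector.merge(t.vbucketId + "/" + t.vbucketUuid, t.seqno, Long::max);
        }
        return vector;
    }
}
```

For a batch of a few thousand documents the merged vector has at most one entry per vbucket, so it stays small no matter how large the import is.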

As I mentioned earlier, there is the "num_mutations_to_index". That will be zero when all mutations have been indexed. Is that not sufficient?

mikereiche commented 7 months ago

for Query :

curl -u Administrator:password http://localhost:9102/api/stats/my_bucket/adv_name | jq .         
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   565  100   565    0     0  39724      0 --:--:-- --:--:-- --:--:-- 40357
{
  "my_bucket:adv_name": {
    "avg_drain_rate": 0,
    "avg_item_size": 0,
    "avg_scan_latency": 0,
    "cache_hit_percent": 0,
    "cache_hits": 0,
    "cache_misses": 0,
    "data_size": 7296,
    "disk_size": 49152,
    "frag_percent": 33,
    "initial_build_progress": 100,
    "items_count": 0,
    "last_known_scan_time": 0,
    "memory_used": 208,
    "num_docs_indexed": 0,
    "num_docs_pending": 0,
    "num_docs_queued": 0,
    "num_items_flushed": 0,
    "num_pending_requests": 0,
    "num_requests": 0,
    "num_rows_returned": 0,
    "num_scan_errors": 0,
    "num_scan_timeouts": 0,
    "recs_in_mem": 0,
    "recs_on_disk": 0,
    "resident_percent": 0,
    "scan_bytes_read": 0,
    "total_scan_duration": 0
  }
}
mikereiche commented 7 months ago

While Couchbase needs the MutationState/Token returned from a mutation as an input for queries/searches to ensure that indexing is complete, there is no Spring Data API that returns the MutationState/Token, as Spring Data APIs are implementation-independent.