rnewson / couchdb-lucene

Enables full-text searching of CouchDB documents using Lucene
Apache License 2.0

Search time out after sync local database (PouchDB) with the server #264

Open juliobetta opened 6 years ago

juliobetta commented 6 years ago

I'm using CouchDB 2.0 and calling the search endpoint directly through the couchdb-lucene URL.

curl http://couchdb_lucene:5985/local/personalservice%24test/_design/search/by_patient?q=name:bruna
=> { code: 500 }

The timeout happens after I synchronize a local database (PouchDB) with the server.

The info URL, however, still works:

curl http://couchdb_lucene:5985/local/personalservice%24test/_design/search/by_patient

{
  "current":true,
  "doc_count":7582,
  "digest":"dxedi0wt8831hv78xfneis12q",
  "update_seq":"7591-g1AAAAI7eJzLYWBg4MhgTmHgzcvPy09JdcjLz8gvLskBCjMlMiTJ____PyuDOYmBgXl5LlCM3SzF3Mgs0RRdPQ4TkhSAZJI9wpAzYENMk01NjSyNiTXEAWRIPMKQGWBDUo0MzC0NE4k1JAFkSD26IZbG5uZJZkZEGpLHAiQZGoAU0Jz5UINWgw1KMjC0MDclNlwgBi2AGLQfYhALK8RbKeamJokpJBl0AGLQfaiLtkMCOSUVGECkee0BxCBYGM0FG5Roam5mium1LAAd8KtG",
  "disk_size":706686,
  "doc_del_count":0,
  "fields":[
    "name"
  ],
  "ref_count":2,
  "uuid":"2a750a50-eed3-4d2e-a6d3-fc0c533ee335",
  "version":5
}

I ran an experiment: I replicated the database on the server, and the issue went away. I'm trying to understand why the timeout happens only after a synchronization... 🤔

    {
        "q":"name:bruna",
        "fetch_duration":10,
        "total_rows":2,
        "limit":25,
        "search_duration":23,
        "etag":"24bed27d9f56",
        "skip":0,
        "rows":[
            {
            "score":4.144786834716797,
            "id":"patients_01C83AAFYM87ZRV42NYXXGWAXK",
            "fields":{
                "name":"Bruna Barros"
            }
            },
            {
            "score":4.144786834716797,
            "id":"patients_01C83AAEZ5YMSG8CKB84MM6K99",
            "fields":{
                "name":"Bruna Moreira"
            }
            }
        ]
    }

juliobetta commented 6 years ago

I've solved this issue by removing ?limit=0&descending=true from the line below.

https://github.com/rnewson/couchdb-lucene/blob/5106b033ebfb248b9274eca1f4176cdb2e3d312b/src/main/java/com/github/rnewson/couchdb/lucene/couchdb/Database.java#L102

The timeout was happening because the last_seq returned from this method was always different from the update_seq returned by <couchdb_url>/<database_name>. I don't know the side effects of this change, so I didn't send a PR. Maybe you have a better solution for this...

juliobetta commented 6 years ago

Now I know the side effects... It degrades performance on large databases because it needs to load the entire _changes feed to get last_seq... 😩
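
To make the trade-off concrete, here is a minimal Python sketch of the two request shapes. The host, the database name, and the helper `changes_url` are placeholders for illustration and are not part of couchdb-lucene. With `limit=0&descending=true` the response stays small while still carrying `last_seq`; without those parameters CouchDB sends the whole feed before `last_seq` appears.

```python
from urllib.parse import urlencode


def changes_url(base_url, db_name, cheap=True):
    """Build a _changes URL (hypothetical helper).

    db_name is assumed to be URL-encoded already, e.g. 'personalservice%24test'.
    cheap=True adds the parameters that were removed in the workaround.
    """
    url = f"{base_url}/{db_name}/_changes"
    if cheap:
        url += "?" + urlencode({"limit": 0, "descending": "true"})
    return url


# Placeholder host and database name, for illustration only.
print(changes_url("http://localhost:5984", "mydb"))
print(changes_url("http://localhost:5984", "mydb", cheap=False))
```

The second form is the one that becomes slow as the database grows, since the full list of changes has to be transferred just to read one field at the end.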

rnewson commented 6 years ago

Sorry for not noticing this earlier. The update_seq strings aren't easily compared, but couchdb-lucene should be building incrementally; it doesn't start from 0 each time. The initial search index build can take time, though.
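
As a rough illustration of why the strings are hard to compare: in CouchDB 2.x a sequence value has the form `<number>-<opaque string>`, and the opaque part may differ between two values that describe the same point in the database. The Python sketch below compares only the numeric prefix. It is a heuristic under that assumption, not what couchdb-lucene does; `seq_number` is a hypothetical helper and the sample values are made up.

```python
def seq_number(seq):
    """Return the integer prefix of a sequence value.

    Accepts CouchDB 2.x style strings ('7591-abc...') as well as
    plain integers as used by CouchDB 1.x.
    """
    prefix, _, _ = str(seq).partition("-")
    return int(prefix)


# Made-up values: same numeric prefix, different opaque suffix.
index_seq = "7591-aaaa"
db_seq = "7591-bbbb"

print(index_seq == db_seq)                          # plain string comparison
print(seq_number(index_seq) == seq_number(db_seq))  # prefix comparison
```

Plain string equality reports a mismatch here even though both prefixes are equal, which is the kind of false "out of date" signal described earlier in the thread.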