ghost opened this issue 3 years ago
How do you run the indexer? Do you have projects enabled?
Both the web UI and the API requests went to the same OpenGrok instance, using the same account. All projects were included in the search, so this issue should have nothing to do with the indexer.
There is #3170; that's why I am asking about projects and the indexer.
2020-09-25 08:38:15.698+0000 INFO t1 Indexer.parseOptions: Indexer options: [
-v, --displayRepositories, off, --optimize, on, -r, uionly, -H, -S, --depth, 99, --progress, -c, /usr/bin/ctags, -o, /var/opengrok/conf/ctags/config, -m, 256, --leadingWildCards, on, -R, configuration.ro.xml, -W, configuration.xml, -P, -U, http://localhost:9080/vanilla_android, -s, /var/opengrok/stage1/src, -d, /var/opengrok/stage1/data
]
I got more results from the API than from the web UI.
I tried to replicate this with 1.12.28 using the AOSP source code and a full-text search for 'google' (http://localhost:8080/source/api/v1/search?projects=AOSP&full=google&maxresults=200000). Using the API I got "resultCount":41556, while the web UI reported far fewer: several thousand results. Interestingly, when I refreshed the first result page, the reported result count was almost always different; it seemed to cycle through a small set of numbers. Even more surprising was what happened when clicking through the result pages. Progressing through pages 1, 2, 3 and so on, the total number of results reported on each page grew with the page number. The last results page, page 3810, reported 95241 total results, and on that page the total did not change when the page was refreshed. Based on this, I repeated the API call several times to see whether its count would change, but it remained the same.
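To check whether the API count stays stable across repeated calls, a small Java 11+ program like the one below can issue the same request several times. It is a sketch under some assumptions:

- The class name `ApiCountCheck` is made up.
- The search API on this instance requires no authentication.
- `resultCount` is pulled out with a regular expression rather than a JSON library, to keep the example dependency-free.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ApiCountCheck {
    // Matches e.g. "resultCount":41556 in the response body (crude, but stdlib-only).
    private static final Pattern RESULT_COUNT =
            Pattern.compile("\"resultCount\"\\s*:\\s*(\\d+)");

    /** Returns the resultCount value from a search API response body, or -1 if it is absent. */
    static long extractResultCount(String json) {
        Matcher m = RESULT_COUNT.matcher(json);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Defaults to the URL from the reproduction; pass another URL as the first argument.
        String url = args.length > 0 ? args[0]
                : "http://localhost:8080/source/api/v1/search?projects=AOSP&full=google&maxresults=200000";
        int attempts = args.length > 1 ? Integer.parseInt(args[1]) : 5;

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        for (int i = 1; i <= attempts; i++) {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            // A stable API count prints the same resultCount on every attempt.
            System.out.printf("attempt %d: HTTP %d, resultCount=%d%n",
                    i, response.statusCode(), extractResultCount(response.body()));
        }
    }
}
```

It can be launched directly from source with `java ApiCountCheck.java [url] [attempts]`.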
There is quite a difference between how the search is done in the web UI and in the API. In the API, the SearchController in the end uses the SearchEngine class (via the SearchEngineWrapper subclass of the SearchController class). This class grabs the IndexSearcher (Lucene) in one of two ways:

- For project-less searches, it uses https://github.com/oracle/opengrok/blob/b4a9940090f2c2cb8e8db97f6e7ca901455c6ed1/opengrok-indexer/src/main/java/org/opengrok/indexer/search/SearchEngine.java#L181. Here SuperIndexSearcher is a subclass of IndexSearcher whose purpose is to "bump" the related IndexReader after a reindex, so that newly indexed data can be displayed in search results.
- For project searches, it uses https://github.com/oracle/opengrok/blob/b4a9940090f2c2cb8e8db97f6e7ca901455c6ed1/opengrok-indexer/src/main/java/org/opengrok/indexer/search/SearchEngine.java#L202-L203.

The difference is that in project-less mode the IndexSearcher is reused, whereas with projects it is created from scratch. The query is created from the API arguments using https://github.com/oracle/opengrok/blob/b4a9940090f2c2cb8e8db97f6e7ca901455c6ed1/opengrok-indexer/src/main/java/org/opengrok/indexer/search/SearchEngine.java#L154-L160. The search results are collected using TopScoreDocCollector (Lucene). The results are then processed by SearchEngine#results(), which can actually perform a re-query, i.e. run the search once again. This is also where any context is fetched from the index and the source and added to the Hit objects, which are then returned in a list. The search count comes from the length of hits. The hits object is acquired here: https://github.com/oracle/opengrok/blob/b4a9940090f2c2cb8e8db97f6e7ca901455c6ed1/opengrok-indexer/src/main/java/org/opengrok/indexer/search/SearchEngine.java#L219
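The paragraph above notes that SearchEngine builds its query from the API arguments. The hypothetical helper below renders field/value pairs in the required-clause form that the web UI echoes back in the bug report ("Searched +full:google +refs:google"). It is not OpenGrok's code, which builds a Lucene Query object rather than a string. It assumes single-term values and does no escaping of query-parser special characters. The point is that both interfaces should be given the same set of fields before their counts are compared.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class QueryStringSketch {
    /**
     * Hypothetical helper: renders field/value pairs as a Lucene query-parser string in which
     * every clause is required, e.g. {full=google, refs=google} -> "+full:google +refs:google".
     * Blank values (unused search fields) are skipped. Assumes single-term values; no escaping.
     */
    static String toQueryString(Map<String, String> fields) {
        StringJoiner clauses = new StringJoiner(" ");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            String value = e.getValue() == null ? "" : e.getValue().trim();
            if (value.isEmpty()) {
                continue; // field left empty in the form / not passed to the API
            }
            clauses.add("+" + e.getKey() + ":" + value);
        }
        return clauses.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps insertion order
        fields.put("full", "google");
        fields.put("refs", "google");
        fields.put("path", "");
        System.out.println(toQueryString(fields)); // prints +full:google +refs:google
    }
}
```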
The web UI uses the SearchHelper class like so: https://github.com/oracle/opengrok/blob/b4a9940090f2c2cb8e8db97f6e7ca901455c6ed1/opengrok-web/src/main/webapp/search.jsp#L86. The IndexSearcher is acquired in SearchHelper#prepareExec() (https://github.com/oracle/opengrok/blob/b4a9940090f2c2cb8e8db97f6e7ca901455c6ed1/opengrok-indexer/src/main/java/org/opengrok/indexer/web/SearchHelper.java#L400-L402) and then used in executeQuery() (https://github.com/oracle/opengrok/blob/b4a9940090f2c2cb8e8db97f6e7ca901455c6ed1/opengrok-indexer/src/main/java/org/opengrok/indexer/web/SearchHelper.java#L478-L479). The collected and summarized results are then embedded in the page (https://github.com/oracle/opengrok/blob/b4a9940090f2c2cb8e8db97f6e7ca901455c6ed1/opengrok-web/src/main/webapp/search.jsp#L227-L228), aggregated by directory (https://github.com/oracle/opengrok/blob/b4a9940090f2c2cb8e8db97f6e7ca901455c6ed1/opengrok-indexer/src/main/java/org/opengrok/indexer/search/Results.java#L109-L110). The number of results reported near the top of the page comes from the totalHits field, as visible in the code linked above. Unlike the way hits are extracted for the API in SearchEngine, no collector is involved.
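The directory aggregation step can be pictured with the hypothetical sketch below. It groups hit paths by parent directory and keeps directories in the order of their first hit. It is not the code in Results.java (the class and method names are made up). It only illustrates that grouping changes how hits are presented, not how many there are.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupByDirectory {
    /**
     * Groups file paths by parent directory. Directories keep the order in which their
     * first hit appears in the input (e.g. relevance order); file names keep input order.
     */
    static Map<String, List<String>> groupByDirectory(List<String> paths) {
        Map<String, List<String>> byDir = new LinkedHashMap<>();
        for (String path : paths) {
            int slash = path.lastIndexOf('/');
            String dir = slash > 0 ? path.substring(0, slash) : "/";
            String file = path.substring(slash + 1);
            byDir.computeIfAbsent(dir, d -> new ArrayList<>()).add(file);
        }
        return byDir;
    }

    public static void main(String[] args) {
        List<String> hits = List.of("/AOSP/a/X.java", "/AOSP/b/Y.java", "/AOSP/a/Z.java");
        // prints {/AOSP/a=[X.java, Z.java], /AOSP/b=[Y.java]}
        System.out.println(groupByDirectory(hits));
    }
}
```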
The API uses Lucene's public void search(Query query, Collector results), while the web UI uses public TopFieldDocs search(Query query, int n, Sort sort).
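To make the difference between these two calls concrete, here is a toy model in plain Java (no Lucene dependency; all names are hypothetical). It rests on assumptions we have not verified against the OpenGrok build in question:

- As in Lucene 8 and later, search(Query, int, Sort) only guarantees an exact total up to roughly max(n, 1000) hits. Beyond that it reports a lower bound, flagged by TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO.
- The web UI asks for enough top hits to fill the requested page, at 25 results per page.
- As a simplification, the API's count reflects every matching document.

```java
public class HitCountModel {
    /** Whether a reported count is exact or only a lower bound (modelled on Lucene's TotalHits.Relation). */
    enum Relation { EQUAL_TO, GREATER_THAN_OR_EQUAL_TO }

    /** A reported hit count together with its relation to the true count. */
    static final class Count {
        final long value;
        final Relation relation;

        Count(long value, Relation relation) {
            this.value = value;
            this.relation = relation;
        }

        @Override
        public String toString() {
            return (relation == Relation.EQUAL_TO ? "" : ">= ") + value;
        }
    }

    /** Collector path (simplified): every matching document is visited, so the count is exact. */
    static Count countAll(long matchingDocs) {
        return new Count(matchingDocs, Relation.EQUAL_TO);
    }

    /**
     * Top-n path (simplified): counting is exact only up to max(n, threshold). Past that bound
     * non-competitive documents may be skipped, so the result is a lower bound. This toy model
     * stops at the bound itself; a real searcher may count somewhat further before skipping.
     */
    static Count countTopN(long matchingDocs, int n, int threshold) {
        long bound = Math.max(n, threshold);
        return matchingDocs <= bound
                ? new Count(matchingDocs, Relation.EQUAL_TO)
                : new Count(bound, Relation.GREATER_THAN_OR_EQUAL_TO);
    }

    public static void main(String[] args) {
        long matches = 50_000;  // illustrative only, not a number from this issue
        int threshold = 1000;   // assumed Lucene default for search(Query, int, Sort)
        int pageSize = 25;      // assumed web UI page size
        for (int page : new int[] {1, 400, 1000, 2000}) {
            // Assumption: the web UI requests page * pageSize top hits for a given page.
            System.out.println("page " + page + ": " + countTopN(matches, page * pageSize, threshold));
        }
        System.out.println("collector: " + countAll(matches));
    }
}
```

The model matches two observations: the web UI total grows on later pages, and it stops changing once n covers all matches. It does not explain the refresh-to-refresh variation on the first page. It also does not explain why the final web UI total (95241) exceeds the API's resultCount (41556).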
Describe the bug: The REST API (api/v1/search) returns different results from the web UI for the same query condition.
Environments:
To Reproduce: Searching from the GUI gives "Searched +full:google +refs:google (Results 25801 – 25802 of 25802) sorted by relevance", but searching from the REST API gets
Expected behavior: The web UI and the API should return the same results for the same search condition.