oracle / opengrok

OpenGrok is a fast and usable source code search and cross reference engine, written in Java
http://oracle.github.io/opengrok/

Opengrok web application isn't accessible during reindex #4527

Open MasterAwesome opened 5 months ago

MasterAwesome commented 5 months ago

Hello,

I'm running an OpenGrok server that runs opengrok-indexer periodically from cron. While the indexer is running and its output shows "Sending configuration to: http://localhost:8080/source", the web application is unresponsive until this step is done. This takes about 5 minutes because my codebase is big. Is there another way to reindex without degrading the webapp?

vladak commented 5 months ago

Sounds like a bug. There is probably something in the setConfiguration() code path that acquires a lock used by the request processing.

One way to avoid this is to use the per project workflow which avoids uploading the configuration to the webapp.

MasterAwesome commented 4 months ago

@vladak looks like this behavior is a little different with 1.13.2. The webapp allows the source to be browsed, but search, history and any other functionality that requires some sort of querying don't respond until indexing is done.

Is it expected for "Sending configuration to: http://localhost:8080/source" to take so long (~5 minutes for a ~3.5 GB project, even when there are no source changes)? It's also confusing because the configuration at /opengrok/etc/configuration.xml is only 8 KB.

My indexing command is: opengrok-indexer -a /opengrok/dist/lib/opengrok.jar -- -s /opengrok/src -d /opengrok/data -H -P -S -G -W /opengrok/etc/configuration.xml -U "http://localhost:8080/source"

I couldn't get the per project setup to work without hitting the same issue either. Could you elaborate on how that setup would look?

vladak commented 4 months ago

The configuration processing in the webapp might be held up by invalidating repositories. Could you take a look at the webapp logs while the configuration upload is in progress to see if that is the case? Ideally, also capture the thread stacks at that point using jstack.

The point of the per project workflow is that a per project reindex does not send the complete configuration at the end of the reindex: https://github.com/oracle/opengrok/blob/500e28cafa709cf32662d77de21cdc77287291b4/opengrok-indexer/src/main/java/org/opengrok/indexer/index/Indexer.java#L421-L424

It merely pokes the webapp to bump the index reader for the given project.

MasterAwesome commented 4 months ago

I continuously get the same log message across multiple threads, almost exactly 1 second apart:
FINEST [http-nio-8080-exec-<tid>] org.opengrok.web.api.v1.filter.IncomingFilter.filter allow request to status/<same-uuid-with-every-request> based on localhost IP address

When I run a search, I see:
FINEST [http-nio-8080-exec-15] org.opengrok.web.api.v1.suggester.parser.SuggesterQueryDataParser.processQuery Processing suggester query: hello at 5
jstack suggests this is because Suggester.rebuild() holds a lock during indexing, while the search path goes through SuggesterServiceImpl.getSuggestions() and SuggesterServiceImpl.onSearch(), which wait for that lock.

EDIT: Disabling the suggester works fine; sending the configuration then finishes in under a second. It looks like the suggester should be temporarily turned off while it is rebuilding, to avoid this denial of service.

vladak commented 4 months ago

It may be, and this is pure speculation at this point, that when the configuration is sent to the webapp there is already a sequence of suggester rebuild jobs waiting in the queue, and this somehow blocks the request. The rebuild() method holds a R/W lock for writing, so this would block any suggester read requests. The just released 1.13.3 improves suggester rebuild request handling, so it might provide some relief; however, it does not solve the problem.
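
To illustrate the blocking described above, here is a minimal sketch, assuming a java.util.concurrent ReentrantReadWriteLock. It is not OpenGrok's actual code: the class RebuildLockDemo and its methods are hypothetical stand-ins for a long rebuild (writer) and a suggester request (reader).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; not part of OpenGrok.
class RebuildLockDemo {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Stand-in for a long rebuild: holds the write lock until 'finish' is released.
    Thread startRebuild(CountDownLatch started, CountDownLatch finish) {
        Thread t = new Thread(() -> {
            lock.writeLock().lock();
            try {
                started.countDown();
                finish.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.writeLock().unlock();
            }
        });
        t.start();
        return t;
    }

    // Stand-in for a suggester read request: waits at most timeoutMillis for the read lock.
    boolean tryRead(long timeoutMillis) {
        try {
            if (lock.readLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                lock.readLock().unlock();
                return true;
            }
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Returns {read succeeded during rebuild, read succeeded after rebuild}.
    static boolean[] run() {
        RebuildLockDemo demo = new RebuildLockDemo();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finish = new CountDownLatch(1);
        Thread rebuild = demo.startRebuild(started, finish);
        try {
            started.await();                    // the writer now holds the lock
            boolean during = demo.tryRead(100); // times out: the writer blocks readers
            finish.countDown();                 // let the "rebuild" complete
            rebuild.join();
            boolean after = demo.tryRead(100);  // succeeds once the writer is gone
            return new boolean[] {during, after};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("read during rebuild: " + r[0] + ", read after rebuild: " + r[1]);
    }
}
```

With a plain lock() instead of the timed tryLock(), the reader would wait for the whole duration of the rebuild, which matches a request that appears stuck for minutes.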

MasterAwesome commented 4 months ago

I see. I can reproduce it 100% of the time, so either this wait for the rebuild thread is triggered every time, or indexing somehow spawns the rebuild and the rebuild takes a very long time.

It would probably be interesting to look into how suggester rebuilds can be optimized, especially since we have history information and could rebuild granularly. Or maybe the feature could be disabled entirely while a rebuild is in progress.
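
One way the "disable while rebuilding" idea could look is sketched below, assuming a ReentrantReadWriteLock guards the suggester data. This is an illustration only, not OpenGrok's implementation: the class NonBlockingSuggestions, its methods, and the sample terms are all hypothetical. The reader uses a non-blocking tryLock() and returns no suggestions when a rebuild holds the write lock, instead of waiting for it.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.stream.Collectors;

// Hypothetical sketch; not part of OpenGrok.
class NonBlockingSuggestions {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> terms = List.of("hello", "help", "world");

    // Returns matching terms, or an empty list if a rebuild currently holds the write lock.
    List<String> getSuggestions(String prefix) {
        if (!lock.readLock().tryLock()) {
            return Collections.emptyList(); // degrade gracefully instead of blocking
        }
        try {
            return terms.stream().filter(t -> t.startsWith(prefix)).collect(Collectors.toList());
        } finally {
            lock.readLock().unlock();
        }
    }

    // Stand-in for a rebuild: runs 'work' while holding the write lock.
    void rebuild(Runnable work) {
        lock.writeLock().lock();
        try {
            work.run();
        } finally {
            lock.writeLock().unlock();
        }
    }

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Returns [suggestions during rebuild, suggestions after rebuild].
    static List<List<String>> demo() {
        NonBlockingSuggestions s = new NonBlockingSuggestions();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finish = new CountDownLatch(1);
        Thread rebuild = new Thread(() -> s.rebuild(() -> {
            started.countDown();
            await(finish);
        }));
        rebuild.start();
        await(started);                               // rebuild holds the write lock
        List<String> during = s.getSuggestions("hel"); // returns immediately, empty
        finish.countDown();
        try {
            rebuild.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        List<String> after = s.getSuggestions("hel");  // lock is free again
        return List.of(during, after);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The trade-off in this sketch is that suggestions are silently unavailable during a rebuild, while search itself is never held up by the lock.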

EDIT: 1.13.3 has the same issue and it reproduces the same way. The rebuilds seem to take a similar amount of wall-clock time.

vladak commented 4 months ago

Could you take a stack trace snapshot using jstack on the Tomcat process when this issue occurs and there is a pending request to the webapp that is stuck? There should be a better way to fix this than disabling the suggester.

vladak commented 3 months ago

There are similar issues - #3516 and #3468