oracle / opengrok

OpenGrok is a fast and usable source code search and cross reference engine, written in Java
http://oracle.github.io/opengrok/

parallelize repository incoming check and sync #4103

Open: vladak opened this issue 1 year ago

vladak commented 1 year ago

While observing an opengrok-mirror run for a bunch of non-local Mercurial repositories with the -I option, I noticed that each repository takes a couple of seconds to check (via repo.incoming()). Since the list of repositories is known beforehand in utils/mirror.py#process_changes(), this piece of code could be parallelized: https://github.com/oracle/opengrok/blob/c10182859ee0b2d541135b28e90df09aca1a13d7/tools/src/main/python/opengrok_tools/utils/mirror.py#L324-L330

The top-level repo check needs some thought, though. Error reporting will also need care: no exception is (re)thrown, but we will need to make sure that errors are properly reported as FAILURE_EXITVAL.

Lastly, the parallelism level will need to be determined.
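A minimal sketch of the parallel incoming check, assuming each repository object exposes an `incoming()` method that returns whether there are incoming changes. `FakeRepo`, `check_incoming_parallel` and the `max_workers` knob are hypothetical stand-ins, and the exit values are defined locally rather than imported from opengrok_tools. The point it illustrates is that worker failures are collected in the main thread and folded into a single FAILURE_EXITVAL instead of being lost inside the pool.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Assumed exit values for this sketch; opengrok_tools defines its own.
SUCCESS_EXITVAL = 0
FAILURE_EXITVAL = 1


class FakeRepo:
    """Hypothetical stand-in for a repository object exposing incoming()."""

    def __init__(self, name, has_changes, fail=False):
        self.name = name
        self._has_changes = has_changes
        self._fail = fail

    def incoming(self):
        # A real repository would run e.g. 'hg incoming' against the remote here.
        if self._fail:
            raise RuntimeError(f"cannot reach remote for {self.name}")
        return self._has_changes


def check_incoming_parallel(repos, max_workers=4):
    """Run repo.incoming() for all repos concurrently (hypothetical helper).

    Returns (exit value, sorted names of repositories with incoming changes).
    Any exception raised in a worker is caught here and mapped to
    FAILURE_EXITVAL, so one failing repository does not hide the others.
    """
    changed = []
    ret = SUCCESS_EXITVAL
    # Threads suffice: the time is spent waiting on external commands/network.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(repo.incoming): repo for repo in repos}
        for future in as_completed(futures):
            repo = futures[future]
            try:
                if future.result():
                    changed.append(repo.name)
            except Exception as exc:
                print(f"incoming check failed for {repo.name}: {exc}")
                ret = FAILURE_EXITVAL
    return ret, sorted(changed)


repos = [FakeRepo("a", True), FakeRepo("b", False), FakeRepo("c", True, fail=True)]
ret, changed = check_incoming_parallel(repos, max_workers=2)
# ret == FAILURE_EXITVAL, changed == ['a']
```

Because `max_workers` is a plain parameter, the parallelism level could be exposed as a command-line option or derived from the number of repositories; which default makes sense is still the open question raised above.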

The same should be done for the repository synchronization part.

vladak commented 1 year ago

Of course, the main trouble is figuring out what to do w.r.t. logging.
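One possible approach to the logging problem (not what opengrok-mirror currently does) is to give each worker a private logger that writes into an in-memory buffer, then have the main thread emit each buffer in one piece, so lines from different repositories do not interleave. The `mirror.<name>` logger naming and the `run_with_buffered_log` helper are hypothetical; only standard `logging` and `concurrent.futures` calls are used.

```python
import io
import logging
from concurrent.futures import ThreadPoolExecutor


def run_with_buffered_log(name, func):
    """Run func(logger) with a logger whose output goes to a private buffer.

    Returns the buffered text so the caller can emit it as one block.
    """
    buf = io.StringIO()
    handler = logging.StreamHandler(buf)
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s: %(message)s"))
    logger = logging.getLogger(f"mirror.{name}")  # hypothetical naming scheme
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep records away from shared root handlers
    logger.addHandler(handler)
    try:
        func(logger)
    finally:
        logger.removeHandler(handler)
    return buf.getvalue()


def check(logger):
    # Placeholder for the per-repository work (incoming check or sync).
    logger.info("checking for incoming changes")
    logger.debug("done")


names = ["repo1", "repo2", "repo3"]
with ThreadPoolExecutor(max_workers=3) as executor:
    # executor.map() yields results in submission order.
    outputs = list(executor.map(lambda n: run_with_buffered_log(n, check), names))

for text in outputs:
    # Each repository's log lines appear together.
    print(text, end="")
```

The trade-off is that log output is delayed until a repository finishes, which may be undesirable for long-running syncs; a `logging.handlers.QueueHandler` with a `QueueListener` would keep output live but would not prevent interleaving.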

vladak commented 1 year ago

opengrok-mirror is already parallelized, but at the project level. It might make sense to move this parallelism to the repository level, i.e. collect the repositories of all projects and then submit them to the workers.
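A minimal sketch of repository-level parallelism, assuming the configuration can be reduced to a mapping from project name to repository paths. `projects`, `sync_repo` and `run_repo_level` are hypothetical; the worker only returns a string where a real one would call the repository's sync (or incoming) method. All repositories of all projects go into one pool, and the results are regrouped by project afterwards so any remaining per-project handling can still happen once per project.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical input: project name -> list of repository paths.
projects = {
    "proj-a": ["/src/proj-a", "/src/proj-a/sub"],
    "proj-b": ["/src/proj-b"],
}


def sync_repo(project, path):
    """Hypothetical per-repository worker; a real one would sync the repo."""
    return f"{project}:{path}"


def flatten(projects):
    """Turn the per-project mapping into a flat list of (project, path) tasks."""
    return [(project, path)
            for project, paths in sorted(projects.items())
            for path in paths]


def run_repo_level(projects, worker, max_workers=4):
    tasks = flatten(projects)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(worker, project, path) for project, path in tasks]
        # f.result() re-raises a worker's exception; real code would catch it
        # per task and map it to FAILURE_EXITVAL instead.
        results = [f.result() for f in futures]
    # Regroup results per project, preserving repository order.
    per_project = {}
    for (project, _), result in zip(tasks, results):
        per_project.setdefault(project, []).append(result)
    return per_project


print(run_repo_level(projects, sync_repo))
```

With this layout a project with many repositories no longer occupies a single worker for the whole duration, and the same `run_repo_level` call could drive either the incoming check or the synchronization step by passing a different worker function.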