This PR addresses performance issues from #287 and #285
Identified problems
vocabulary validation - was executed on a request from the frontend. Results were cached, but the whole cache was evicted on any modification of any vocabulary. Vocabulary validation also requires synchronization because the validator is not thread-safe, so each thread handling a validation request was blocked until the previous validation finished (this was probably also what kept the app from running out of memory when multiple validations were processed concurrently). Validation is a time- and resource-consuming task.
asynchronous term occurrence saving - many individual tasks might be scheduled during occurrence saving, and there is no guarantee when they will be saved; it could get to the point where the app was doing nothing but catching up on occurrence saving.
text analysis - a similar problem to validation: it is executed on term modification (or by a user on a file, for example), it is time- and resource-consuming, and it results in asynchronous term occurrence saving. Text analysis of a single term was also executed before text analysis of all terms in the vocabulary, resulting in duplicated processing.
single-core deployment - TermIt is deployed in a single-core environment, which eliminates the benefits of asynchronous processing.
New features
Throttle & Debounce
This PR introduces an option to throttle and debounce method calls.
Note: please read about return type support and throttled futures below before reviewing the linked test cases.
The goal of method throttling & debouncing is to execute tasks asynchronously on a fixed thread pool, merging frequent method calls into a single task execution with the newest data from the most recent call [test case]. Throttling ensures that if the method's task has not been executed in the last X seconds, it is scheduled for immediate execution [test case]. Otherwise, its execution is delayed (debounced) so that it can be merged with potential future calls; when no further call comes, the task is guaranteed to execute with the data from the last call. Task execution also ensures that when a task is time-consuming and takes longer than the throttle interval, a new call to the throttled method does not result in concurrent execution of the same task [test case].
When a thread is already executing a throttled task, any further throttling is ignored and all throttled methods are executed synchronously [test case].
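The merge behaviour described above can be sketched as follows. This is a simplified, self-contained illustration of the concept only, not the PR's actual implementation; the class and method names are made up:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Conceptual sketch: repeated calls under the same key replace the pending
// task, so when the scheduler fires after the throttle interval, only the
// newest data is executed.
class ThrottleSketch {
    private final Map<String, Runnable> pending = new ConcurrentHashMap<>();

    // Called on each method invocation: a newer call replaces the older task.
    void submit(String key, Runnable task) {
        pending.put(key, task);
    }

    // Called by the scheduler once the throttle interval elapses.
    void fire(String key) {
        Runnable task = pending.remove(key);
        if (task != null) {
            task.run();
        }
    }
}
```

Two quick calls thus collapse into a single execution carrying the data of the second call.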
Throttling & debouncing is realized with the Throttle annotation, which is handled by the Throttle aspect.
The aspect is configured using Spring AOP XML syntax in order not to utilize AspectJ. Once AspectJ is removed from the dependencies, it should be possible to replace the XML configuration with annotations. The aspect is disabled for the test profile.
The Throttle annotation supports methods with a void return type out of the box. In that case, the whole method is considered the task to be throttled, and the method itself is executed asynchronously.
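For illustration, a usage sketch of the annotation on a void method. The annotation and its parameters are declared locally here as stand-ins so the snippet is self-contained; the real annotation in the PR may have a different parameter set:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Stand-in for the PR's Throttle annotation; the parameters are assumptions.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Throttle {
    String value() default ""; // throttle group
    String name() default "";  // constant name reported to clients
}

class VocabularyServiceSketch {
    // Void return type: the whole method body becomes the throttled task
    // and is executed asynchronously by the aspect.
    @Throttle(value = "vocabularyValidation", name = "Vocabulary validation")
    public void validateVocabulary(String vocabularyIri) {
        // time-consuming validation would run here
    }
}
```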
There is also support for methods returning a Future; however, the concrete returned object MUST be a ThrottledFuture, otherwise the aspect throws an appropriate exception at the method call (there is no way to safely check this on application start).
When a method returns a future (ThrottledFuture), the method itself is executed synchronously, allowing it to prepare the task that should be throttled and to provide a cached result that a caller may acquire through the CacheableFuture interface before the actual future resolves. The actual task and the cached result are then provided through the returned ThrottledFuture object.
An example can be seen in the updated result caching validator: the validate method is executed synchronously by the caller thread, checking the cache state and returning an already resolved future when the cache is not dirty, or otherwise returning a future with the time-consuming runValidation task while also providing the cached result. The runValidation method is executed asynchronously.
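The cached-result idea can be sketched like this. This is a simplified stand-in for the ThrottledFuture/CacheableFuture pair, not the actual classes from the PR:

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Sketch: carries the task to run later plus an optional cached result
// that is available to callers immediately, before the task resolves.
class ThrottledFutureSketch<T> {
    private final Supplier<T> task;
    private final CompletableFuture<T> future = new CompletableFuture<>();
    private volatile T cached;

    ThrottledFutureSketch(Supplier<T> task) {
        this.task = task;
    }

    ThrottledFutureSketch<T> withCachedResult(T cachedResult) {
        this.cached = cachedResult;
        return this;
    }

    // CacheableFuture-style access: callers may use the possibly stale
    // value without waiting for the real task.
    Optional<T> getCachedResult() {
        return Optional.ofNullable(cached);
    }

    // Invoked later by the scheduler (here synchronously, for illustration).
    void run() {
        future.complete(task.get());
    }

    T get() throws Exception {
        return future.get();
    }
}
```

A caller can thus read the stale cached value right away and still obtain the fresh result once the throttled task has run.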
ThrottledFuture also implements a chainable future interface, which allows chaining a task that will be executed once the future is resolved.
This allows, for example, the WebSocket controller to respond with the cached result and register a task that will send the new result to the client once fresh data are available. This prevents the thread from being blocked while awaiting future resolution.
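The non-blocking pattern can be illustrated with a plain CompletableFuture, whose chaining is analogous to the PR's chainable interface. The WebSocket send is simulated here by appending to a list; the class and method names are made up:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

// Sketch: respond with the cached value immediately, then push the fresh
// value once the background task resolves; the handling thread never blocks.
class WebSocketPushSketch {
    final List<String> sentMessages = new CopyOnWriteArrayList<>();

    void handleRequest(String cachedResult, CompletableFuture<String> freshResult) {
        sentMessages.add(cachedResult);            // respond immediately
        freshResult.thenAccept(sentMessages::add); // push when resolved
    }
}
```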
Scheduling of throttled futures also supports their cancellation based on their group.
This allows, for example, cancelling a scheduled task that analyzes the definition of a single term when an analysis of all terms from the vocabulary is scheduled.
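Group cancellation might be sketched as follows, assuming hierarchical group names such as vocabulary/V/term/T; the actual group format in the PR may differ:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: scheduling a broader group cancels any narrower pending task it
// subsumes, e.g. a whole-vocabulary analysis cancels a single-term analysis.
class GroupCancelSketch {
    private final Map<String, Runnable> pending = new ConcurrentHashMap<>();

    void schedule(String group, Runnable task) {
        // drop pending tasks whose group is nested under the new one
        pending.keySet().removeIf(g -> g.startsWith(group + "/"));
        pending.put(group, task);
    }

    boolean isPending(String group) {
        return pending.containsKey(group);
    }
}
```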
Disadvantages
a throttled method can't use the caller's transaction - when a throttled method is called during a transaction, the throttled task is executed asynchronously without access to the original transaction. However, when the Transactional annotation is present on the same method as the Throttle annotation, the task is executed in a transactional context.
Unfortunately, I was not able to make detection of an active transaction context work. It might be a feature missing in JOPA (and TransactionSynchronizationManager), or I might just be missing something; either way, the explicit Transactional annotation is required for the transaction to work.
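A usage sketch of combining the two annotations as described. Both annotations are declared locally here as stand-ins for Spring's Transactional and the PR's Throttle, so the snippet is self-contained; the real types differ:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Stand-ins for the real annotations, declared locally for illustration.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface TransactionalStandIn {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface ThrottleStandIn {
    String value() default "";
}

class ThrottledValidationService {
    // The explicit transactional annotation on the throttled method makes
    // the async task open its own transaction; it cannot join the caller's.
    @TransactionalStandIn
    @ThrottleStandIn("vocabularyValidation")
    public void runValidation(String vocabularyIri) {
        // repository writes here run in the task's own transaction
    }
}
```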
Long-running tasks
As the application now runs some time-consuming tasks in the background, it pushes the status of such tasks to clients via WebSocket, allowing information about the activity to be displayed to the user.
Currently, it is only possible to name a throttled method by a constant, so the user will know that a validation is in progress but won't know which vocabulary is being validated. This might be changed by adding a new parameter for additional information (in addition to the name parameter).
Changes
Removed the periodic task for clearing contexts scheduled for removal (for term occurrences in definitions, removal is fast and is performed synchronously; for files, see the next point).
Concurrent saving and resolving of term occurrences for file analysis: a second thread is started, so one thread starts resolving occurrences while the other removes all current ones; after successful removal, it starts saving the occurrences resolved by the first thread.
Vocabulary validation will be performed asynchronously using throttling.
New vocabulary validation results will be pushed to clients via WebSocket.
Text analysis will be performed asynchronously using throttling.
Clients will be notified about the end of text analysis.
The frontend is no longer in control of when validation and text analysis are performed (unless triggered explicitly by the user via a button). The backend automatically executes these tasks when appropriate (e.g., when a vocabulary is modified).
The result caching validator was rewritten to provide dirty cached results through throttling and to only mark the cache as dirty instead of deleting it. It also marks only the related cache entries as dirty and does not touch unrelated entries.
Removed the option to disable vocabulary analysis, as discussed in a meeting.
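The two-thread occurrence processing described in the changes above can be sketched with a plain executor. The resolve/remove operations are passed in as callbacks here; the names are illustrative, not the PR's actual API:

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: one thread resolves new occurrences while a second removes the
// current ones; saving begins only after the removal succeeds.
class OccurrencePipelineSketch {
    static List<String> process(Callable<List<String>> resolveNew, Runnable removeCurrent)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<List<String>> resolved = pool.submit(resolveNew); // thread 1
            Future<?> removed = pool.submit(removeCurrent);          // thread 2
            removed.get();         // wait for successful removal first
            return resolved.get(); // then the resolved occurrences are saved
        } finally {
            pool.shutdown();
        }
    }
}
```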
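The "mark dirty instead of evict" behaviour of the rewritten validator cache can be sketched as follows; this is a simplified stand-in, not the actual validator:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: a modification marks only the related entry dirty; the stale value
// stays available for serving while revalidation runs, and other entries
// are untouched.
class DirtyCacheSketch<K, V> {
    private static final class Entry<V> {
        volatile V value;
        volatile boolean dirty;
        Entry(V value) { this.value = value; }
    }

    private final Map<K, Entry<V>> cache = new ConcurrentHashMap<>();

    void put(K key, V value) {
        cache.put(key, new Entry<>(value));
    }

    // on modification: mark only the related entry, keep the rest intact
    void markDirty(K key) {
        Entry<V> e = cache.get(key);
        if (e != null) e.dirty = true;
    }

    boolean isDirty(K key) {
        Entry<V> e = cache.get(key);
        return e == null || e.dirty;
    }

    V get(K key) {
        Entry<V> e = cache.get(key);
        return e == null ? null : e.value;
    }
}
```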
Requirements and notes
It is expected that TermIt will be deployed in an environment with at least two cores available in order to benefit from asynchronous processing (more cores would, of course, be beneficial, as we need to handle HTTP, WebSocket, database, and background tasks).
The annotation is not prepared for AOT; if AOT support is added, a reflection processor registering runtime hints will probably need to be created.