matomo-org / matomo-java-tracker

Official Java implementation of the Matomo Tracking HTTP API.
https://matomo-org.github.io/matomo-java-tracker/
BSD 3-Clause "New" or "Revised" License

Bulk Collection and Sending of Tracking Requests #168

Open dheid opened 11 months ago

dheid commented 11 months ago

Feature Description

Currently, the Matomo Java Tracker sends each tracking request to the Matomo server as soon as it is created. This can lead to a large number of individual requests being sent to the server, especially in high-traffic applications.

I propose adding a feature that allows the tracker to collect multiple tracking requests over a certain delay period and then send them all at once in a bulk request. This could potentially reduce the load on the Matomo server and improve the performance of the tracker.

Proposed Implementation

The tracker could have a configurable delay period (for example, 5 seconds) during which it collects all created tracking requests. At the end of this delay period, it sends all collected requests to the Matomo server in a single bulk request.

This feature could be optional and controlled by a new configuration property (for example, matomo.tracker.bulk-collection-delay). If this property is not set or set to 0, the tracker operates as it currently does, sending each request immediately.
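
To make the proposal concrete, here is a minimal sketch of such a collector. It is not part of matomo-java-tracker: `BulkCollector`, its `delayMillis` parameter (standing in for the proposed `matomo.tracker.bulk-collection-delay` property) and the `bulkSender` callback (standing in for whatever call actually performs the bulk request) are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper, not part of matomo-java-tracker.
final class BulkCollector<T> implements AutoCloseable {

    private final BlockingQueue<T> pending = new LinkedBlockingQueue<>();
    // Assumption: stands in for the real call that sends one bulk request.
    private final Consumer<List<T>> bulkSender;
    // Null when the delay is 0, meaning "send every request immediately".
    private final ScheduledExecutorService scheduler;

    BulkCollector(long delayMillis, Consumer<List<T>> bulkSender) {
        this.bulkSender = bulkSender;
        if (delayMillis > 0) {
            scheduler = Executors.newSingleThreadScheduledExecutor();
            scheduler.scheduleWithFixedDelay(
                this::flushQuietly, delayMillis, delayMillis, TimeUnit.MILLISECONDS);
        } else {
            scheduler = null;
        }
    }

    void add(T request) {
        if (scheduler == null) {
            // Delay not set or 0: behave as today, one request per call.
            bulkSender.accept(Collections.singletonList(request));
        } else {
            pending.add(request);
        }
    }

    // Sends everything collected so far as a single batch.
    void flush() {
        List<T> batch = new ArrayList<>();
        pending.drainTo(batch);
        if (!batch.isEmpty()) {
            bulkSender.accept(batch);
        }
    }

    // An exception escaping a scheduled task would cancel all later runs,
    // so the periodic flush swallows it. Real code would log it.
    private void flushQuietly() {
        try {
            flush();
        } catch (RuntimeException e) {
            // Intentionally ignored in this sketch.
        }
    }

    @Override
    public void close() {
        if (scheduler != null) {
            scheduler.shutdown();
            flush(); // send whatever is still waiting
        }
    }
}
```

With a delay of 0 every `add` results in one send, which matches the current behavior. With a positive delay, requests accumulate until the next scheduled flush or until `close()` is called.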

Potential Challenges

One challenge could be ensuring that the tracker correctly handles the case where a new tracking request is created while it is in the middle of sending a bulk request. We would need to make sure that this new request is either included in the current bulk request (if possible) or held for the next bulk request.
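
One possible way to get the "held for the next bulk request" behavior is to swap the buffer under a lock and do the actual sending outside of it. The sketch below illustrates only that idea. `SwappingBuffer` and the `bulkSender` callback are hypothetical and do not exist in the library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical buffer, not part of matomo-java-tracker.
final class SwappingBuffer<T> {

    private final Object lock = new Object();
    private List<T> current = new ArrayList<>();

    // Only holds the lock long enough to append, so callers never wait on I/O.
    void add(T request) {
        synchronized (lock) {
            current.add(request);
        }
    }

    // Takes ownership of everything collected so far, then sends it
    // without holding the lock.
    void flush(Consumer<List<T>> bulkSender) {
        List<T> batch;
        synchronized (lock) {
            if (current.isEmpty()) {
                return;
            }
            batch = current;
            current = new ArrayList<>();
        }
        // Requests added from here on land in the new list and therefore
        // go out with the next bulk request.
        bulkSender.accept(batch);
    }
}
```

Because the slow part runs outside the lock, a request created in the middle of a send is neither lost nor blocked. It simply waits for the following flush.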

Another challenge could be error handling for the bulk request. If the Matomo server returns an error for the bulk request, we would need a way to determine which individual request(s) caused the error.
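
If the server response does not identify the offending requests, one fallback would be to split a failed batch in half and retry each half until the failing requests are isolated. The sketch below assumes that a failed bulk request tracked nothing (otherwise retrying would create duplicates) and that a failure can be attributed to individual requests. `BulkFailureIsolator` and the `trySend` predicate, which returns true on success, are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper, not part of matomo-java-tracker.
final class BulkFailureIsolator {

    private BulkFailureIsolator() {
    }

    // Returns the requests that still fail when sent on their own.
    // Assumption: trySend returns true if the whole batch was accepted
    // and false if it was rejected without tracking anything.
    static <T> List<T> findFailing(List<T> batch, Predicate<List<T>> trySend) {
        List<T> failing = new ArrayList<>();
        isolate(batch, trySend, failing);
        return failing;
    }

    private static <T> void isolate(List<T> batch, Predicate<List<T>> trySend, List<T> failing) {
        if (batch.isEmpty() || trySend.test(batch)) {
            return; // nothing to send, or this part went through
        }
        if (batch.size() == 1) {
            failing.add(batch.get(0)); // a single request failed on its own
            return;
        }
        int mid = batch.size() / 2;
        isolate(batch.subList(0, mid), trySend, failing);
        isolate(batch.subList(mid, batch.size()), trySend, failing);
    }
}
```

This costs extra round trips only when a batch fails, so the common case remains a single bulk request.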

Impact

This feature could significantly reduce the number of requests that the tracker sends to the Matomo server, potentially improving performance for both the tracker and the server. It could be particularly beneficial for high-traffic applications that generate a large number of tracking requests.

renatocjn commented 7 months ago

Hi, I have been having some issues related to this. I'm guessing that implementing the periodic bulk tracking that you suggest here would solve it.

My app currently calls the bulk submission method MatomoTracker::sendBulkRequestAsync to send a set of actions to the server at the same time. The problem I'm having is that when I make these calls in parallel, they block as if the request were being transmitted synchronously.

After debugging a bit, I think the issue is that the Java8Sender used under the hood has synchronized blocks on the same variable in both the function that queues the requests and the function that transmits them (see L332-L341 and L351-L357). I'm using version 3.2.0.

To solve my issue, I implemented the async myself with my own Executor and supplyAsync calls to the sendBulk function. Perhaps a better solution would be to do the sendBulk call from outside the synchronized block on the sender code or use the periodic transmission that you suggest.
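
For anyone hitting the same problem, the workaround looks roughly like the sketch below. It is a simplified illustration, not my exact code: `AsyncBulkSubmitter` is a hypothetical wrapper, and `blockingBulkSend` stands in for the tracker's synchronous bulk send call.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Function;

// Hypothetical wrapper, not part of matomo-java-tracker.
final class AsyncBulkSubmitter<T, R> {

    private final Executor executor;
    // Assumption: stands in for the tracker's blocking bulk send call.
    private final Function<List<T>, R> blockingBulkSend;

    AsyncBulkSubmitter(Executor executor, Function<List<T>, R> blockingBulkSend) {
        this.executor = executor;
        this.blockingBulkSend = blockingBulkSend;
    }

    // Runs the blocking send on the caller-supplied executor, so parallel
    // submissions are limited only by that executor's threads.
    CompletableFuture<R> submit(List<T> batch) {
        return CompletableFuture.supplyAsync(() -> blockingBulkSend.apply(batch), executor);
    }
}
```

The caller owns the executor, so it also decides how many sends may run in parallel and when to shut the threads down.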

dheid commented 7 months ago

Oh, thanks so much! That sounds awesome! I will consider that.

dheid commented 7 months ago

@renatocjn Thanks for your analysis. The synchronization on queries came from the Matomo tracker I once created that was able to collect multiple send executions and create a bulk from them within a configurable delay. I removed that functionality due to the scope of the integration. However, I forgot to remove the synchronization on the queries.

I removed it until the feature is complete. You'll find a version 3.4.0 that contains a fix for that. No synchronization needed any longer. The bulk collection is not yet implemented in version 3.4.0.