Continuous results uploading don't record the total of the performance test carried out

ludeknovy / jtl-reporter

JtlReporter is an online application that allows users to generate beautiful, customizable and easy to understand performance reports from JMeter(Taurus), Locust, and other tools.

https://jtlreporter.site

MIT License


Continuous results uploading don't record the total of the performance test carried out #281

Open jorgetamayo21 opened 1 month ago

jorgetamayo21 commented 1 month ago

Describe the bug

I did two distributed performance tests with Locust.io.

I used 20 virtual machines, each with 16 virtual CPUs (8 cores) and 65 GB of RAM, running 8 Locust workers per virtual machine, reaching 20 thousand threads and approximately 7300 req/s.

I used one virtual machine with 8 virtual CPUs (4 cores) and 32 GB of RAM for JTL-Reporter.

Test results are sent in real time to both Grafana (from locust-plugins) and JTL-Reporter using the "Continuous results uploading" method.

The first test was saved fine and the start and end of the test match what was recorded by Grafana.

JTL-Reporter: (image) Grafana: (image)

However, in the second test JTL-Reporter tells me that it finished at 08:03, while in Grafana I have results until 08:14.

JTL-Reporter: (image) Grafana: (image)

Expected behavior

JTL-Reporter should record the whole performance test, from start to finish.

Thank you very much in advance, beautiful tool, we are very happy using it.

ludeknovy commented 1 month ago

Hi @jorgetamayo21, please check this method: https://github.com/ludeknovy/jtl-reporter/blob/921f314b5038a107971596d6a7fd4e10669cb23f/scripts/jtl_listener_service.py#L178

Also check the logs of the run in which some of the samples were not reported. My guess is that the run was terminated before all the samples were uploaded to the listener service.

jorgetamayo21 commented 1 month ago

Indeed, the log recorded 5103684 samples that were not uploaded.

(image)

ludeknovy commented 1 month ago

Alright. Maybe try removing the timeouts from the join calls (https://www.gevent.org/api/gevent.greenlet.html#gevent.Greenlet.join). In theory it should then wait for the background task to finish.
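To illustrate why the timeout matters, here is a minimal sketch. It is not the code from jtl_listener_service.py: it uses the standard library's threading.Thread, whose join(timeout) is analogous to gevent.Greenlet.join(timeout) in that it returns once the timeout elapses even if the background task is still running. The function names, sample count and delays are made up for the illustration.

```python
import threading
import time


def upload_backlog(samples, uploaded, delay=0.02):
    """Hypothetical background uploader: 'sends' one sample every `delay` seconds."""
    for sample in samples:
        time.sleep(delay)  # stands in for a slow network call
        uploaded.append(sample)


def uploaded_when_join_returns(join_timeout, sample_count=25):
    """Return how many samples had been uploaded when join() returned."""
    uploaded = []
    worker = threading.Thread(
        target=upload_backlog, args=(list(range(sample_count)), uploaded)
    )
    worker.start()
    # With a timeout, join() gives up waiting once the timeout elapses;
    # with None it blocks until the worker has finished.
    worker.join(timeout=join_timeout)
    count = len(uploaded)
    worker.join()  # let the worker finish so nothing is left running
    return count


if __name__ == "__main__":
    print("with 0.05 s timeout:", uploaded_when_join_returns(0.05))
    print("without timeout:    ", uploaded_when_join_returns(None))
```

With the short timeout the caller moves on while most of the backlog is still pending, which matches the symptom of samples missing from the end of the run; without a timeout it blocks until every sample has been handed over.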

ludeknovy commented 1 month ago

Also, be sure that the DB and listener service have enough resources to handle the load.

jorgetamayo21 commented 1 month ago

I removed the timeout from all of them.

(image)

Additionally, I doubled the resources of the VM for JTL-Reporter, leaving it with 16 virtual CPUs (8 cores) and 64 GB of RAM.

On Wednesday we will run the same test, and I will let you know whether it worked so that the issue can be closed. Thank you so much!

jorgetamayo21 commented 1 month ago

I just finished a test of more than 3 hours, 17 thousand threads, generating approximately 53 million requests.

(image)

This is what the log showed when I stopped the test:

(image)

52 minutes have already passed and it still appears as processing.

(image)

The CPU of the "master" machine is at 10%; I suppose it is still sending the data to jtl-reporter.

Screenshot from 2024-05-24 09-30-48

I will keep reporting on all this. What I am missing is a "progress" indicator while the data is being sent.
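A progress indicator of this kind could be as simple as periodically logging how much of the backlog has been sent. A minimal sketch, assuming the uploader knows the total number of collected samples and how many are still queued (format_progress and both of its parameters are hypothetical, not part of the existing listener script):

```python
def format_progress(total, remaining):
    """Build a one-line progress message from the backlog size."""
    done = total - remaining
    # Guard against division by zero when nothing was collected.
    percent = 100.0 * done / total if total else 100.0
    return f"uploaded {done}/{total} samples ({percent:.1f}%)"


if __name__ == "__main__":
    # Example: 200 samples collected, 50 still waiting in the queue.
    print(format_progress(200, 50))
```

Calling something like this from the upload loop every few seconds would make it possible to tell a slow upload from a stalled one.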

jorgetamayo21 commented 1 month ago

It just finished sending to jtl-reporter!

(image)

jtl-reporter keeps showing me "processing".

(image)

The jtl-reporter VM instance is still working.

(image)

jorgetamayo21 commented 1 month ago

The jtl-reporter VM instance seems to have stopped working: there is not much CPU activity and the memory usage dropped sharply :(

Screenshot from 2024-05-24 10-00-28

This is what the log of the database container shows:

sudo docker logs jtl-reporter-db

(image)

This is what the log of the backend container shows:

sudo docker logs jtlreporter_be_1

(image)

Everything indicates that it ran out of memory while processing what it received. (I was using version v4.10.1; I will update it now and test again on Monday.)

But the process of sending to jtl-reporter worked fine, although it took more than an hour, so this issue would be resolved.

(EDIT) I finally solved the delay in sending results by using gevent.queue and multiple gevent.spawn calls to process the queue.
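The general pattern described in this edit (a shared queue drained by several concurrent workers) can be sketched roughly as follows. This is not the author's actual change: to stay dependency-free it swaps in the standard library's queue.Queue and threading.Thread for gevent.queue and gevent.spawn, and upload_all, send_batch, worker_count and batch_size are hypothetical names.

```python
import queue
import threading


def upload_all(samples, send_batch, worker_count=4, batch_size=500):
    """Split samples into batches and send them with several workers."""
    pending = queue.Queue()
    for start in range(0, len(samples), batch_size):
        pending.put(samples[start:start + batch_size])

    def worker():
        # Every batch is queued before the workers start, so an empty
        # queue means there is nothing left to send.
        while True:
            try:
                batch = pending.get_nowait()
            except queue.Empty:
                return
            send_batch(batch)

    workers = [threading.Thread(target=worker) for _ in range(worker_count)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()  # no timeout: wait until every batch has been sent


if __name__ == "__main__":
    sent = []
    # list.extend stands in for the real HTTP call to the listener service.
    upload_all(list(range(2350)), sent.extend, worker_count=4, batch_size=500)
    print(len(sent), "samples sent")
```

Several workers help when each send spends most of its time waiting on the network. They do not help if the listener service or the database is the bottleneck.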

ludeknovy commented 1 month ago

Hi @jorgetamayo21, thanks for the detailed report. Would you care to share the edited Python script that uploads the samples to the app using multiple gevent greenlets to make it faster?

And yes, please try again with v4.10.2. Hopefully the memory issue is fixed in that version.

jorgetamayo21 commented 1 month ago

I made the pull request so you can look at it in detail.

https://github.com/ludeknovy/jtl-reporter/pull/283

ludeknovy commented 1 month ago

Thanks @jorgetamayo21

stale[bot] commented 1 week ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.