shlinkio / shlink

The definitive self-hosted URL shortener
https://shlink.io
MIT License
3.2k stars 257 forks source link

Make visit:process check for existing instance before starting #270

Closed jakkuh closed 5 years ago

jakkuh commented 5 years ago

With the new GeoLite database it still seems to process IPs at a rate of about 1 per second.

It definitely seems a bit faster, but it also seems like the process is being speed limited intentionally.

With this is mind, it still seems that just making visit:process check for an existing process of itself running before starting would be the proper move to alleviate any issues.

jakkuh commented 5 years ago

(ie. you could set the cron job to every 5 minutes, and if it was already running it wouldn't cause any conflicts)

acelaya commented 5 years ago

That thing you said about the process looking as if it was being limited intentionally, made me think, and I think I know the issue.

The ORM shlink is using tends to exponentially degrade performance when working with huge datasets (and the amount of visits you are handling qualifies as huge). It also consumes more and more memory, until a potential out of memory error is thrown, so it fits what you have been experiencing lately.

I have faced this problem in the past, and it has a relatively simple solution, so I will consider this a bug and try to provide a fix as soon as possible.

Regarding the other thing you mentioned about checking if the process is already in progress, I think it should be possible to provide some locking system that prevents concurrent executions, so I will create another ticket for that too.

Thanks for all your feedback and your patience :)

acelaya commented 5 years ago

I have managed to get an important improvement with this implementation https://github.com/shlinkio/shlink/pull/275

I have done some benchmarks, and tried to process 3000 visit locations.

With current implementation it took more than 8 minutes, and the amount of memory gets increased in time. Also, the more visits to be processed, the more memory ot be consumed, since all the visits are loaded into memory and then processed.

With the new implementation it took less than 2'5 minutes. Also, since it now lazy loads every visit when it is going to be processed and then it flushes it from memory, increasing the amount of visits to process should not increase the amount of memory to be consumed, only the time in a linear way.

I will also implement the locking system, but this at least should prevent the process or the server to be killed due to high amounts of memory being consumed or memory leaks.

acelaya commented 5 years ago

And this adds locking to the command https://github.com/shlinkio/shlink/pull/276

I'll probably release the new version later today or tomorrow.

acelaya commented 5 years ago

@tivyhosting I have just released v1.14.1 https://github.com/shlinkio/shlink/releases/tag/v1.14.1

It should have a better performance, but let me know if you find anything out of place.