opencaching / opencaching-pl

The source code of Opencaching.PL (and some other domains)
https://opencaching.pl/
GNU General Public License v3.0
22 stars 33 forks source link

Limits on cache_visits2 table #2305

Closed andrixnet closed 2 years ago

andrixnet commented 2 years ago

I just discovered that the cache_visits2 table seems to grow indefinitely, like a log without rotation.

Some sort of limit/rotation should be placed on it's growth.

If this is used to count how many times the cache page was visited, some strategy should be put in place such that it's growth can be limited.

rapotek commented 2 years ago

This is strange. Number of rows in cache_visits2 should not exceed number_of_caches * (number_of_users + 2) in worst case, but in real life it should be even a lesser number. Some too old rows are deleted occasionally (randomly).

andrixnet commented 2 years ago

In OCRO case the number is slightly lower then number_of_caches * (number_of_users + 2), considering only users with is_active_flag=1

table has 425386 entries, formular yields 449500.

Questions:

I don't understand the real purpose. This looks like an odd kind of logging each cache from which IP was visited. There is no reference to user accounts.

There are "C" types without IP address. There are "U" types with erroneous IP address U types have all 0 count. What's the relevance of visit_date for "C" types?

Why log each IP and date? Why not simply update some counters? I see some caches have many more records in this table then there are users in the system (not just +2),

P.S. I noticed this doing a database backup using CSV export. Asides from the NUTS data (which is large), this is the table that took longest to export, thus indeed, is LARGE.

On 2021-10-18 3:43 PM, rapotek wrote:

This is strange. Number of rows in |cache_visits2| should not exceed number_of_caches * (number_of_users + 2) in worst case, but in real life it should be even a lesser number. Some too old rows are deleted occasionally (randomly).

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/opencaching/opencaching-pl/issues/2305#issuecomment-945725311, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZZ36IIQQ5JQAWED2E43MLUHQI73ANCNFSM5GGOZ3UA. Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

rapotek commented 2 years ago

OK, I looked more closely.

So there is one thing I didn't notice before that is registering visits for not logged-in (anonymous) users when coordinates are shown.

andrixnet commented 2 years ago

I see. I can understand the counters. Still what I don't understand is the point of logging each IP. Especially of non logged-in users. Then the rowcount gets an upper limit on the order of no_of_caches * 2^32 ...

Btw, by this logic, do the following affect these counters? They all mean providing the entire cache listing.

On 2021-10-21 1:12 PM, rapotek wrote:

OK, I looked more closely.

  • In fact there can be more rows than the upper limit defined above because in "usual" case the cache is viewed by a logged in user and the user's id is stored then, but if there is no current user id, that means a cache is viewed by someone who has not logged in, the client IP is stored then. Run a select counting "valid" and "invalid" IPs and then you can compare numbers of logged-in visits against anonymous visits.
  • 'C' means total visits for a particular cache while 'U' means last visit made by a particular user (or IP). The 'C' and 'U' are updated when the last visit of a cache for a current user was before given period counting from now.

So there is one thing I didn't notice before that is registering visits for not logged in (anonymous) users.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/opencaching/opencaching-pl/issues/2305#issuecomment-948461902, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZZ36JZPMCUXEA6RXBTFDTUH7RSFANCNFSM5GGOZ3UA. Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

rapotek commented 2 years ago

Regarding your question: in general counter change is invoked by ViewCacheController, so when you read cache data directly from DB or use another approach not involving mentioned controller, the counters shouldn't be changed.

The config settings suggests that opencaching.ro has coordinates visible for non logged-in visitors, so "IP visits" would be counted too. You could change settings to hide coordinates and see the effect, but I think you have them visible for other purposes. On each attempt of increasing the counter, regardless of last visit period of current cache, there is a 1/1000 chance that "old" rows will be deleted. You could check how many "old" rows there are in database (search by "U" type and visit_date < NOW() - 604800).

Anyway, this functionality could be improved by f.ex. introduce a separate config settings for counting anonymous visits, a config setting for unique visit period (now it is hardcoded) and maybe a cron job to systematically remove "old" rows instead of a randomly invoked action as it is now.

andrixnet commented 2 years ago

There are records going back to 2017. So mentioned improvements would be welcomed.

On 2021-10-21 7:52 PM, rapotek wrote:

Regarding your question: in general counter change is invoked by |ViewCacheController|, so when you read cache data directly from DB or use another approach not involving mentioned controller, the counters shouldn't be changed.

The config settings suggests that opencaching.ro has coordinates visible for non logged-in visitors, so "IP visits" would be counted too. You could change settings to hide coordinates and see the effect, but I think you have them visible for other purposes. On each attempt of increasing the counter, regardless of last visit period of current cache, there is a 1/1000 chance that "old" rows will be deleted. You could check how many "old" rows there are in database (search by "U" |type| and |visit_date < NOW() - 604800|).

Anyway, this functionality could be improved by f.ex. introduce a separate config settings for counting anonymous visits, a config setting for unique visit period (now it is hardcoded) and maybe a cron job to systematically remove "old" rows instead of a randomly invoked action as it is now.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/opencaching/opencaching-pl/issues/2305#issuecomment-948799015, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZZ36M7T2OGD4OUW47VWVTUIBAOHANCNFSM5GGOZ3UA. Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.