Closed j-norwood-young closed 5 years ago
Those requests are weirdly big. Considering the request that's currently being generated:
{"action":"timespent","timespent":{"seconds":30,"unload":false},"system":{"property_token":"1a8feb16-3e30-4f9b-bf74-20037ea8505a","time":"2019-07-23T11:55:38.399Z"},"user":{"id":"92363","browser_id":"a1a0cb38-4c7a-49f5-b1d6-3246c5f4ae73","subscriber":true,"url":"https://dennikn.sk/","referer":"","adblock":false,"window_height":1050,"window_width":1920,"cookies":true,"websockets":true,"source":{},"remp_session_id":"72c73e09-ed47-4259-b35f-4599d400ba41","remp_pageview_id":"dfa61927-d5c7-4084-bbf2-7a367fe30cf0"}}
It's 516 bytes uncompressed, and developer tools report it as 171B sent over the network (probably gzip compression). Could you share your requests so we can check why they're so big?
As for why it's designed this way:
Designing things this way was a simplicity tradeoff: we didn't want Tracker to contain logic or maintain information about the data/pageviews being tracked. Both would be necessary if we wanted to use websockets or calculate time spent server-side. Tracker is supposed to be a dumb validator which just checks whether the data looks OK and passes it to Kafka. Any restart or load balancing would also cause issues in those scenarios.
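To illustrate what "stateless validator" means here, a minimal sketch in JS (not the Tracker's actual code). The field names are taken from the timespent payload above, but which of them are mandatory is an assumption, and `publish` is a hypothetical callback standing in for whatever hands the event to Kafka:

```javascript
// Hypothetical stateless check: no per-pageview state is kept between calls.
// Field names come from the example payload; the "required" rules are assumptions.
function validateTimespent(event) {
    const errors = [];
    if (event.action !== "timespent") {
        errors.push("action must be 'timespent'");
    }
    const seconds = event.timespent && event.timespent.seconds;
    if (!Number.isInteger(seconds) || seconds < 0) {
        errors.push("timespent.seconds must be a non-negative integer");
    }
    if (!event.system || !event.system.property_token) {
        errors.push("system.property_token is required");
    }
    if (!event.system || Number.isNaN(Date.parse(event.system.time))) {
        errors.push("system.time must be a parseable timestamp");
    }
    if (!event.user || !event.user.remp_pageview_id) {
        errors.push("user.remp_pageview_id is required");
    }
    return errors;
}

// `publish` is a placeholder for the real producer; it is only called
// when the event passes validation.
function handle(event, publish) {
    const errors = validateTimespent(event);
    if (errors.length === 0) {
        publish(event);
    }
    return errors;
}
```

Because nothing is remembered between calls, any instance behind a load balancer can handle any request, and a restart loses nothing.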
Given all of the above, the only feasible solution here is to make the interval configurable.
By the way, internally it uses a logarithmic function that lengthens the interval the longer your page stays open. After an hour, the update is sent only once every 90 seconds. https://github.com/remp2020/remp/blob/master/Beam/resources/assets/js/remplib.js#L736
The implemented configuration will therefore change the initial interval, and the log
function will stay in place to keep the interval rising over time.
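A rough sketch of how a configurable initial interval can combine with a logarithmic back-off. This is NOT the formula from remplib.js: `nextInterval`, `countUpdates` and the `growth` constant are made up for illustration, with `growth` picked so that a 5-second start ends up near 90 seconds after an hour.

```javascript
// Hypothetical back-off, not the remplib.js implementation.
// The interval starts at `initialInterval` seconds and is scaled up by the
// natural log of the minutes the page has been open.
function nextInterval(elapsedSeconds, initialInterval, growth = 4.1) {
    return Math.round(initialInterval * (1 + growth * Math.log(1 + elapsedSeconds / 60)));
}

// Count how many updates would be sent while the page stays open.
function countUpdates(openSeconds, initialInterval) {
    let elapsed = 0;
    let updates = 0;
    while (true) {
        const wait = nextInterval(elapsed, initialInterval);
        if (elapsed + wait > openSeconds) {
            break;
        }
        elapsed += wait;
        updates += 1;
    }
    return updates;
}

console.log(nextInterval(0, 5));     // first interval equals the configured one
console.log(nextInterval(3600, 5));  // interval after an hour, 5 s start
console.log(countUpdates(3600, 5));  // updates in the first hour, 5 s start
console.log(countUpdates(3600, 10)); // updates in the first hour, 10 s start
```

A fixed 5-second interval would send 720 updates in an hour, so any growing schedule of this shape sends fewer, and raising the initial interval reduces the count further.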
Here's an example request payload:
{"article":{"id":"368362","author_id":"Marianne Merten","tags":[],"variants":{}},"action":"load","system":{"property_token":"5478a41d-bac1-4679-8a53-4201bb4294f8","time":"2019-07-23T12:15:08.298Z"},"user":{"id":"18510","browser_id":"c8ff2969-ce3f-447c-a579-9bb5e08cd720","url":"https://www.dailymaverick.co.za/article/2019-07-23-the-never-ending-story-of-eskom-bailouts-mboweni-introduces-special-bill-of-billions-more/","referer":"https://www.dailymaverick.co.za/","user_agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:69.0) Gecko/20100101 Firefox/69.0","adblock":false,"window_height":777,"window_width":1280,"cookies":true,"websockets":true,"source":{},"remp_session_id":"3e5db5a9-6bf6-4be2-8fa9-999be6067d8c","remp_pageview_id":"7bdf244c-012f-4ecf-ab38-63380064eef0"}}
That's 782 chars, according to wc.
Response header is 239B, so 1021B per request and response.
Firefox consistently reports around 1.3kb of what it calls "Traffic", which leaves about 400B mysteriously unaccounted for.
Chrome reports around 340B per request.
curl says: "upload completely sent off: 781 out of 781 bytes" (weirdly, a byte short of the character count, but that could be a changed second digit or something.)
Of course the ISP doesn't care about payload size. It just cares about total traffic, which includes DNS lookups, frame headers, checksums, etc.
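For anyone wanting to reproduce the payload numbers without wc or curl, here is a small sketch that measures the UTF-8 size of a JSON body. `payloadBytes` is a hypothetical helper and `sample` is a trimmed-down version of the timespent payload above. It only covers the body, not HTTP headers or lower-level overhead. (A one-byte gap between wc and curl is often just a trailing newline, though that is only a guess here.)

```javascript
// Byte length of the JSON body as it would be sent, UTF-8 encoded.
// TextEncoder is available in browsers and in current Node.js versions.
function payloadBytes(payload) {
    return new TextEncoder().encode(JSON.stringify(payload)).length;
}

// Trimmed-down sample using fields from the timespent payload.
const sample = {
    action: "timespent",
    timespent: { seconds: 30, unload: false },
    user: { id: "92363", url: "https://dennikn.sk/" },
};

console.log(payloadBytes(sample)); // size of the JSON body only, no headers
```

For ASCII-only payloads this matches the character count. Non-ASCII characters in URLs or author names take more than one byte each.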
The technicalities don't really matter. The issue really is that some markets are much more sensitive to bandwidth usage than others, due to income/bandwidth cost inequalities. (In SA, a full day's wages for a domestic worker will not even buy you 200MB out-of-bundle data.)
A configurable interval will help alleviate this issue. I'd be happy if I could start at a 10s interval instead of 5, giving up some granularity in favour of less impact on our users. In Europe, it's much less of an issue.
Glad to hear about the logarithmic function!
My bad here, I was counting only payload and completely forgot the headers :). Anyway, I understand the point about the traffic limitations, we'll make the configuration happen.
One more reason why timespent needs to be sent by the frontend (for anyone reading this in the future): the timespent timer is paused when the user switches to a different tab and resumed when they come back. This behavior is only observable if the frontend JS library handles it; a server-side calculation couldn't account for it.
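A minimal sketch of that pause/resume behaviour, assuming the standard Page Visibility API (`document.visibilityState` and the `visibilitychange` event). `VisibleTimer` is a hypothetical class for illustration, not the remplib.js implementation:

```javascript
// Hypothetical timer that only counts time while the tab is visible.
// `now` is injectable so the logic can be exercised without a browser.
class VisibleTimer {
    constructor(now = () => Date.now()) {
        this.now = now;
        this.totalMs = 0;       // time accumulated in finished visible periods
        this.startedAt = null;  // start of the current visible period, if any
    }

    resume() {
        if (this.startedAt === null) {
            this.startedAt = this.now();
        }
    }

    pause() {
        if (this.startedAt !== null) {
            this.totalMs += this.now() - this.startedAt;
            this.startedAt = null;
        }
    }

    seconds() {
        const running = this.startedAt === null ? 0 : this.now() - this.startedAt;
        return Math.floor((this.totalMs + running) / 1000);
    }
}

const timer = new VisibleTimer();
timer.resume();

// Only wire up the browser event when a document exists.
if (typeof document !== "undefined") {
    document.addEventListener("visibilitychange", () => {
        if (document.visibilityState === "hidden") {
            timer.pause();
        } else {
            timer.resume();
        }
    });
}
```

The server only ever sees the requests, so it has no way to tell a hidden tab from a visible one unless the client reports it like this.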
Hey. It's in master and will be in a tagged version soon. The JS snippet was changed from:
timeSpentEnabled: true // defaults to false
to:
timeSpent: {
    enabled: true, // defaults to false
    interval: 20 // defaults to 5
}
The BEAM client fires every 5 seconds, using about 1.3kb per request. If a page is open for an hour, that would use up about 8mb of data. If someone leaves a page open for 24 hours in South Africa, it would cost a day's wages.
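A quick back-of-the-envelope check of these figures, assuming a fixed 5-second interval (ignoring any back-off) and reading "1.3kb" as 1300 bytes per request. `trafficBytes` is just a helper for this calculation. Under those assumptions, one hour comes to 720 requests and 936,000 bytes, i.e. roughly 0.94 MB or about 7.5 megabits, and 24 hours to about 22.5 MB.

```javascript
const BYTES_PER_REQUEST = 1300; // "about 1.3kb", read here as kilobytes (assumption)
const INTERVAL_SECONDS = 5;     // fixed interval, no back-off (assumption)

// Requests and total bytes for a page left open for `openSeconds`.
function trafficBytes(openSeconds, intervalSeconds = INTERVAL_SECONDS, bytesPerRequest = BYTES_PER_REQUEST) {
    const requests = Math.floor(openSeconds / intervalSeconds);
    return { requests, bytes: requests * bytesPerRequest };
}

const hour = trafficBytes(3600);
const day = trafficBytes(24 * 3600);

console.log(hour.requests, (hour.bytes / 1e6).toFixed(2) + " MB", ((hour.bytes * 8) / 1e6).toFixed(1) + " Mbit");
console.log(day.requests, (day.bytes / 1e6).toFixed(2) + " MB");
```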
Possible solutions: