Closed: sherlockbonez closed this issue 6 months ago
Did you mean to include the // double slashes? I'm not sure Apple has ever done that before. There has not been a single download to an OP3-measured show in the last year with those.
Maybe someone trying to spoof?
Do you have all the HTTP headers from one of the requests?
Also take a look at the IP addresses and see whether they belong to cloud providers.
Where exactly does the data come from? So far I have only seen the // double slashes in the access logs from AIS (AdsWizz).
Yes, these are from AIS session and access logs. This is what we see when testing user agent strings that have a double // compared to those that have only a single /.
The pattern in devices.json should capture the iPhone for the user agent strings tested below and categorize them as "Apple iPhone", but this is only the case for the single / entry.
"name": "Apple iPhone",
"pattern": "iphone|iOS|iPhone|CFNetwork| ios |phone;ios",
"category": "mobile",
Here we try and run a test against the double //
--checkUserAgent 'AppleCoreMedia//1.0.0.20G81 (iPhone; U; CPU OS 16_6_1 like Mac OS X; en_us)'
Loaded UserAgent patterns from /etc/user-agents/bots.json
Loaded UserAgent patterns from /etc/user-agents/apps.json
Loaded UserAgent patterns from /etc/user-agents/libraries.json
Loaded UserAgent patterns from /etc/user-agents/browsers.json
Loaded UserAgent patterns from /etc/user-agents/devices.json
Loaded UserAgent patterns from /etc/user-agents/referrers.json
User Agent was not found in database
No match
Here we try the test for the single /
--checkUserAgent 'AppleCoreMedia/1.0.0.20G81 (iPhone; U; CPU OS 16_6_1 like Mac OS X; en_us)'
Loaded UserAgent patterns from /etc/user-agents/bots.json
Loaded UserAgent patterns from /etc/user-agents/apps.json
Loaded UserAgent patterns from /etc/user-agents/libraries.json
Loaded UserAgent patterns from /etc/user-agents/browsers.json
Loaded UserAgent patterns from /etc/user-agents/devices.json
Loaded UserAgent patterns from /etc/user-agents/referrers.json
{
  "name" : "AppleCoreMedia",
  "type" : "library",
  "device_name" : "Apple iPhone",
  "device_category" : "mobile",
  "referrer_name" : null,
  "referrer_category" : null,
  "is_bot" : false
}
Given the above test, the user agent with the single / matches the library record. The device patterns are only enhancements, per the directions: https://github.com/opawg/user-agents-v2/tree/3f3a7e75270c5f7807de64e80013d3e0a1cf14bc#quick-start
The devices file is only used if the user agent first matches one of: bots, apps, libraries, or browsers.
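The two-stage lookup described above can be sketched roughly as follows. This is only an illustration, not the project's or anyone's actual implementation: it uses the two patterns quoted in this thread, assumes patterns are applied as case-sensitive regular expression searches, and the helper names `first_match` and `classify` are made up for the example.

```python
import re

# Only the two patterns quoted in this thread; the real files hold many more.
LIBRARIES = [{"name": "AppleCoreMedia", "pattern": r"^AppleCoreMedia/1"}]
DEVICES = [{
    "name": "Apple iPhone",
    "pattern": r"iphone|iOS|iPhone|CFNetwork| ios |phone;ios",
    "category": "mobile",
}]


def first_match(entries, user_agent):
    """Return the first entry whose pattern is found in user_agent, else None."""
    for entry in entries:
        if re.search(entry["pattern"], user_agent):
            return entry
    return None


def classify(user_agent):
    # Stage 1: a primary list must match (only libraries in this sketch).
    library = first_match(LIBRARIES, user_agent)
    if library is None:
        return None  # the device patterns are never consulted
    # Stage 2: device patterns only enrich an existing primary match.
    device = first_match(DEVICES, user_agent)
    return {
        "name": library["name"],
        "type": "library",
        "device_name": device["name"] if device else None,
        "device_category": device["category"] if device else None,
    }


single = "AppleCoreMedia/1.0.0.20G81 (iPhone; U; CPU OS 16_6_1 like Mac OS X; en_us)"
double = "AppleCoreMedia//1.0.0.20G81 (iPhone; U; CPU OS 16_6_1 like Mac OS X; en_us)"
print(classify(single))
print(classify(double))
```

With the double-slash string the first stage fails, so the iPhone device pattern never gets a chance to match, which is consistent with the "No match" result in the test above.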
The pattern actually being matched is from the libraries.json here: https://github.com/opawg/user-agents-v2/blob/3f3a7e75270c5f7807de64e80013d3e0a1cf14bc/src/libraries.json#L23
"pattern": "^AppleCoreMedia/1",
So by default it only matches the single-slash version of the AppleCoreMedia user agent strings. Our code and parsing logic work; it's just the pattern that's missing. Patterns need to account for double forward slashes. Reviewing our AdsWizz access and session logs, it appears that all user agent strings contain // where a single / would normally be found. Here are some examples:
"AppleCoreMedia//1.0.0.20H115 (iPhone; U; CPU OS 16_7_2 like Mac OS X; es_xl)"
"Mozilla//5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit//537.36 (KHTML, like Gecko) Chrome//81.0.4044.113 Safari//537.36"
"Roku//DVP-12.5 (12.5.0.4178-91)"
"Dalvik//2.1.0 (Linux; U; Android 13; SM-G770F Build//TP1A.220624.014)"
"Echo//1.0(APNG)"
"AppleCoreMedia//1.0.0.21K69 (Apple TV; U; CPU OS 17_1 like Mac OS X; en_us)"
Could we add an optional second slash to the pattern matches? As an example, for AppleCoreMedia entries:
^AppleCoreMedia//?1
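To make the proposal concrete, here is a small check of what the current pattern and the proposed pattern would each match, using sample strings from this thread. It assumes the patterns are evaluated as ordinary regular expressions; the variable names are only for illustration.

```python
import re

current = re.compile(r"^AppleCoreMedia/1")     # pattern quoted from libraries.json
proposed = re.compile(r"^AppleCoreMedia//?1")  # second slash made optional

samples = [
    "AppleCoreMedia/1.0.0.20G81 (iPhone; U; CPU OS 16_6_1 like Mac OS X; en_us)",
    "AppleCoreMedia//1.0.0.20G81 (iPhone; U; CPU OS 16_6_1 like Mac OS X; en_us)",
]

for ua in samples:
    # Print whether each pattern matches, followed by the string tested.
    print(bool(current.search(ua)), bool(proposed.search(ua)), ua)
```

The proposed pattern accepts both forms, while the current one accepts only the single-slash form. Every affected entry would need the same change for this approach to work.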
The actual requests, i.e. the clients, send the user agent without //. AIS is the problem here, because it stores the user agent in the logs in a modified form.
If we do not get a hit, we simply replace each duplicated // with / in the log data and check again.
Yes, I think keeping this project focused on the HTTP User-Agent header value is what we want to do here. If your system is adding or escaping slashes after the fact, you can get back to the actual value using a method similar to what @knoxmic suggests.
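A minimal sketch of that fallback might look like the following. It assumes the logging layer doubles every slash, which is what the samples in this thread suggest but is not confirmed. The function names are hypothetical, and the single library pattern stands in for the full pattern lists.

```python
import re

# Stand-in for the full pattern lists; this one is quoted from libraries.json.
LIBRARY_PATTERN = re.compile(r"^AppleCoreMedia/1")


def collapse_double_slashes(logged_user_agent):
    """Turn each '//' written by the logging layer back into a single '/'."""
    return logged_user_agent.replace("//", "/")


def matches_with_fallback(logged_user_agent):
    # First try the value exactly as it appears in the log.
    if LIBRARY_PATTERN.search(logged_user_agent):
        return True
    # No hit: undo the slash doubling and check again.
    return bool(LIBRARY_PATTERN.search(collapse_double_slashes(logged_user_agent)))


logged = "AppleCoreMedia//1.0.0.21K69 (Apple TV; U; CPU OS 17_1 like Mac OS X; en_us)"
print(collapse_double_slashes(logged))
print(matches_with_fallback(logged))
```

Because the replacement runs only after a failed first attempt, user agents that were logged unmodified are not affected. If a system doubled only some slashes, a legitimate // in the original header (for example inside a URL) would be collapsed incorrectly, so it is worth confirming how the logs are written before relying on this.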
We are seeing an uptick in AppleCoreMedia user agents for iPhone, iPad, and Apple TV. These aren't included in the OPAWG2 list and are therefore missing device and application categorization.
iPhone
AppleCoreMedia//1.0.0.21B91 (iPhone; U; CPU OS 17_1_1 like Mac OS X; en_us)
AppleCoreMedia//1.0.0.20G81 (iPhone; U; CPU OS 16_6_1 like Mac OS X; en_us)
AppleCoreMedia//1.0.0.19H370 (iPhone; U; CPU OS 15_8 like Mac OS X; en_us)
AppleCoreMedia//1.0.0.20H115 (iPhone; U; CPU OS 16_7_2 like Mac OS X; en_us)
AppleCoreMedia//1.0.0.20B101 (iPhone; U; CPU OS 16_1_1 like Mac OS X; en_us)
AppleCoreMedia//1.0.0.21B91 (iPhone; U; CPU OS 17_1_1 like Mac OS X; en_ca)
AppleCoreMedia//1.0.0.20D67 (iPhone; U; CPU OS 16_3_1 like Mac OS X; en_us)
AppleCoreMedia//1.0.0.20G75 (iPhone; U; CPU OS 16_6 like Mac OS X; en_us)
iPad
AppleCoreMedia//1.0.0.21B91 (iPad; U; CPU OS 17_1_1 like Mac OS X; en_us)
Apple TV
AppleCoreMedia//1.0.0.21K69 (Apple TV; U; CPU OS 17_1 like Mac OS X; en_us)
For the iPhone and iPad entries, is there any indication of whether they were accessed via an app or through the Safari browser?