matomo-org / device-detector

The Universal Device Detection library will parse any User Agent and detect the browser, operating system, device used (desktop, tablet, mobile, tv, cars, console, etc.), brand and model.
http://devicedetector.net
GNU Lesser General Public License v3.0

List of all possible returned values? #7044

Closed: sv3t0sl4v closed this issue 2 years ago

sv3t0sl4v commented 2 years ago

Say I would like to record the detections in a database for analytical purposes. I would like to see all available returned values and their lengths, so I can plan the database structure. Is this listed somewhere?

liviuconcioiu commented 2 years ago

The length of each return value is not listed anywhere. For user agents, I suggest a TEXT column plus another column holding a hash of the string, so you can put an index on the hash and avoid duplicates. For the other return values, use VARCHAR(255).
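A minimal sketch of the hashed user-agent idea, assuming a TEXT column for the raw string plus a fixed-width CHAR(64) column for its SHA-256 hex digest (the column sizes, table name and the `userAgentKey()` helper are my assumptions, not something the library prescribes). MySQL can only index a prefix of a TEXT column, whereas the digest always has the same length and can carry a UNIQUE index.

```php
<?php
// Hypothetical helper: returns a fixed-length key for a User-Agent string
// that can back a UNIQUE index, e.g. a CHAR(64) column next to a TEXT column.
function userAgentKey(string $userAgent): string
{
    // SHA-256 as lowercase hex is always 64 characters, whatever the UA length.
    return hash('sha256', $userAgent);
}

$ua = 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0';
$key = userAgentKey($ua);

// With a hypothetical table user_agents(ua_hash CHAR(64) UNIQUE, ua TEXT),
// "INSERT IGNORE INTO user_agents (ua_hash, ua) VALUES (?, ?)" skips duplicates.
echo strlen($key), PHP_EOL; // 64
```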

sv3t0sl4v commented 2 years ago

I do not believe this to be wise. What I would rather do is have separate columns for:

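<?php
// The header may be missing (bots, CLI), so fall back to an empty string;
// language tags are case-insensitive, so normalise to lower case.
$lang = strtolower(substr($_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? '', 0, 2));
$language = null; // stays null when no known code matches
if ($lang) {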
    if ($lang === 'ab') {
        $language = 'Abkhazian';
    } elseif ($lang === 'aa') {
        $language = 'Afar';
    } elseif ($lang === 'af') {
        $language = 'Afrikaans';
    } elseif ($lang === 'ak') {
        $language = 'Akan';
    } elseif ($lang === 'sq') {
        $language = 'Albanian';
    } elseif ($lang === 'am') {
        $language = 'Amharic';
    } elseif ($lang === 'ar') {
        $language = 'Arabic';
    } elseif ($lang === 'an') {
        $language = 'Aragonese';
    } elseif ($lang === 'hy') {
        $language = 'Armenian';
    } elseif ($lang === 'as') {
        $language = 'Assamese';
    } elseif ($lang === 'av') {
        $language = 'Avaric';
    } elseif ($lang === 'ae') {
        $language = 'Avestan';
    } elseif ($lang === 'ay') {
        $language = 'Aymara';
    } elseif ($lang === 'az') {
        $language = 'Azerbaijani';
    } elseif ($lang === 'bm') {
        $language = 'Bambara';
    } elseif ($lang === 'ba') {
        $language = 'Bashkir';
    } elseif ($lang === 'eu') {
        $language = 'Basque';
    } elseif ($lang === 'be') {
        $language = 'Belarusian';
    } elseif ($lang === 'bn') {
        $language = 'Bengali';
    } elseif ($lang === 'bh') {
        $language = 'Bihari';
    } elseif ($lang === 'bi') {
        $language = 'Bislama';
    } elseif ($lang === 'bs') {
        $language = 'Bosnian';
    } elseif ($lang === 'br') {
        $language = 'Breton';
    } elseif ($lang === 'bg') {
        $language = 'Bulgarian';
    } elseif ($lang === 'my') {
        $language = 'Burmese';
    } elseif ($lang === 'ca') {
        $language = 'Catalan';
    } elseif ($lang === 'ch') {
        $language = 'Chamorro';
    } elseif ($lang === 'ce') {
        $language = 'Chechen';
    } elseif ($lang === 'ny') {
        $language = 'Nyanja';
    } elseif ($lang === 'zh') {
        $language = 'Chinese';
    } elseif ($lang === 'cv') {
        $language = 'Chuvash';
    } elseif ($lang === 'kw') {
        $language = 'Cornish';
    } elseif ($lang === 'co') {
        $language = 'Corsican';
    } elseif ($lang === 'cr') {
        $language = 'Cree';
    } elseif ($lang === 'hr') {
        $language = 'Croatian';
    } elseif ($lang === 'cs') {
        $language = 'Czech';
    } elseif ($lang === 'da') {
        $language = 'Danish';
    } elseif ($lang === 'dv') {
        $language = 'Dhivehi';
    } elseif ($lang === 'nl') {
        $language = 'Dutch';
    } elseif ($lang === 'dz') {
        $language = 'Dzongkha';
    } elseif ($lang === 'en') {
        $language = 'English';
    } elseif ($lang === 'eo') {
        $language = 'Esperanto';
    } elseif ($lang === 'et') {
        $language = 'Estonian';
    } elseif ($lang === 'ee') {
        $language = 'Ewe';
    } elseif ($lang === 'fo') {
        $language = 'Faroese';
    } elseif ($lang === 'fj') {
        $language = 'Fijian';
    } elseif ($lang === 'fi') {
        $language = 'Finnish';
    } elseif ($lang === 'fr') {
        $language = 'French';
    } elseif ($lang === 'ff') {
        $language = 'Fulah';
    } elseif ($lang === 'gl') {
        $language = 'Galician';
    } elseif ($lang === 'gd') {
        $language = 'Scottish Gaelic';
    } elseif ($lang === 'gv') {
        $language = 'Manx';
    } elseif ($lang === 'ka') {
        $language = 'Georgian';
    } elseif ($lang === 'de') {
        $language = 'German';
    } elseif ($lang === 'el') {
        $language = 'Greek';
    } elseif ($lang === 'kl') {
        $language = 'Greenlandic';
    } elseif ($lang === 'gn') {
        $language = 'Guarani';
    } elseif ($lang === 'gu') {
        $language = 'Gujarati';
    } elseif ($lang === 'ht') {
        $language = 'Haitian Creole';
    } elseif ($lang === 'ha') {
        $language = 'Hausa';
    } elseif ($lang === 'he') {
        $language = 'Hebrew';
    } elseif ($lang === 'hz') {
        $language = 'Herero';
    } elseif ($lang === 'hi') {
        $language = 'Hindi';
    } elseif ($lang === 'ho') {
        $language = 'Hiri Motu';
    } elseif ($lang === 'hu') {
        $language = 'Hungarian';
    } elseif ($lang === 'is') {
        $language = 'Icelandic';
    } elseif ($lang === 'io') {
        $language = 'Ido';
    } elseif ($lang === 'ig') {
        $language = 'Igbo';
    } elseif ($lang === 'id' || $lang ==='in') {
        $language = 'Indonesian';
    } elseif ($lang === 'ia') {
        $language = 'Interlingua';
    } elseif ($lang === 'ie') {
        $language = 'Interlingue';
    } elseif ($lang === 'iu') {
        $language = 'Inuktitut';
    } elseif ($lang === 'ik') {
        $language = 'Inupiak';
    } elseif ($lang === 'ga') {
        $language = 'Irish';
    } elseif ($lang === 'it') {
        $language = 'Italian';
    } elseif ($lang === 'ja') {
        $language = 'Japanese';
    } elseif ($lang === 'jv') {
        $language = 'Javanese';
    } elseif ($lang === 'kn') {
        $language = 'Kannada';
    } elseif ($lang === 'kr') {
        $language = 'Kanuri';
    } elseif ($lang === 'ks') {
        $language = 'Kashmiri';
    } elseif ($lang === 'kk') {
        $language = 'Kazakh';
    } elseif ($lang === 'km') {
        $language = 'Khmer';
    } elseif ($lang === 'ki') {
        $language = 'Kikuyu';
    } elseif ($lang === 'rw') {
        $language = 'Kinyarwanda';
    } elseif ($lang === 'rn') {
        $language = 'Kirundi';
    } elseif ($lang === 'ky') {
        $language = 'Kyrgyz';
    } elseif ($lang === 'kv') {
        $language = 'Komi';
    } elseif ($lang === 'kg') {
        $language = 'Kongo';
    } elseif ($lang === 'ko') {
        $language = 'Korean';
    } elseif ($lang === 'ku') {
        $language = 'Kurdish';
    } elseif ($lang === 'kj') {
        $language = 'Kwanyama';
    } elseif ($lang === 'lo') {
        $language = 'Lao';
    } elseif ($lang === 'la') {
        $language = 'Latin';
    } elseif ($lang === 'lv') {
        $language = 'Latvian';
    } elseif ($lang === 'li') {
        $language = 'Limburgish';
    } elseif ($lang === 'ln') {
        $language = 'Lingala';
    } elseif ($lang === 'lt') {
        $language = 'Lithuanian';
    } elseif ($lang === 'lu') {
        $language = 'Luba-Katanga';
    } elseif ($lang === 'lg') {
        $language = 'Luganda';
    } elseif ($lang === 'lb') {
        $language = 'Luxembourgish';
    } elseif ($lang === 'mk') {
        $language = 'Macedonian';
    } elseif ($lang === 'mg') {
        $language = 'Malagasy';
    } elseif ($lang === 'ms') {
        $language = 'Malay';
    } elseif ($lang === 'ml') {
        $language = 'Malayalam';
    } elseif ($lang === 'mt') {
        $language = 'Maltese';
    } elseif ($lang === 'mi') {
        $language = 'Maori';
    } elseif ($lang === 'mr') {
        $language = 'Marathi';
    } elseif ($lang === 'mh') {
        $language = 'Marshallese';
    } elseif ($lang === 'mo') {
        $language = 'Moldavian';
    } elseif ($lang === 'mn') {
        $language = 'Mongolian';
    } elseif ($lang === 'na') {
        $language = 'Nauru';
    } elseif ($lang === 'nv') {
        $language = 'Navajo';
    } elseif ($lang === 'ng') {
        $language = 'Ndonga';
    } elseif ($lang === 'nd') {
        $language = 'Northern Ndebele';
    } elseif ($lang === 'ne') {
        $language = 'Nepali';
    } elseif ($lang === 'no') {
        $language = 'Norwegian';
    } elseif ($lang === 'nb') {
        $language = 'Norwegian Bokmål';
    } elseif ($lang === 'nn') {
        $language = 'Norwegian Nynorsk';
    } elseif ($lang === 'ii') {
        $language = 'Nuosu';
    } elseif ($lang === 'oc') {
        $language = 'Occitan';
    } elseif ($lang === 'oj') {
        $language = 'Ojibwe';
    } elseif ($lang === 'cu') {
        $language = 'Old Bulgarian';
    } elseif ($lang === 'or') {
        $language = 'Oriya';
    } elseif ($lang === 'om') {
        $language = 'Oromo';
    } elseif ($lang === 'os') {
        $language = 'Ossetian';
    } elseif ($lang === 'pi') {
        $language = 'Pāli';
    } elseif ($lang === 'ps') {
        $language = 'Pashto';
    } elseif ($lang === 'fa') {
        $language = 'Farsi';
    } elseif ($lang === 'pl') {
        $language = 'Polish';
    } elseif ($lang === 'pt') {
        $language = 'Portuguese';
    } elseif ($lang === 'pa') {
        $language = 'Punjabi';
    } elseif ($lang === 'qu') {
        $language = 'Quechua';
    } elseif ($lang === 'rm') {
        $language = 'Romansh';
    } elseif ($lang === 'ro') {
        $language = 'Romanian';
    } elseif ($lang === 'ru') {
        $language = 'Russian';
    } elseif ($lang === 'se') {
        $language = 'Sami';
    } elseif ($lang === 'sm') {
        $language = 'Samoan';
    } elseif ($lang === 'sg') {
        $language = 'Sango';
    } elseif ($lang === 'sa') {
        $language = 'Sanskrit';
    } elseif ($lang === 'sr') {
        $language = 'Serbian';
    } elseif ($lang === 'sh') {
        $language = 'Serbo-Croatian';
    } elseif ($lang === 'st') {
        $language = 'Sesotho';
    } elseif ($lang === 'tn') {
        $language = 'Setswana';
    } elseif ($lang === 'sn') {
        $language = 'Shona';
    } elseif ($lang === 'sd') {
        $language = 'Sindhi';
    } elseif ($lang === 'si') {
        $language = 'Sinhalese';
    } elseif ($lang === 'ss') {
        $language = 'Siswati';
    } elseif ($lang === 'sk') {
        $language = 'Slovak';
    } elseif ($lang === 'sl') {
        $language = 'Slovenian';
    } elseif ($lang === 'so') {
        $language = 'Somali';
    } elseif ($lang === 'nr') {
        $language = 'Southern Ndebele';
    } elseif ($lang === 'es') {
        $language = 'Spanish';
    } elseif ($lang === 'su') {
        $language = 'Sundanese';
    } elseif ($lang === 'sw') {
        $language = 'Swahili';
    } elseif ($lang === 'sv') {
        $language = 'Swedish';
    } elseif ($lang === 'tl') {
        $language = 'Tagalog';
    } elseif ($lang === 'ty') {
        $language = 'Tahitian';
    } elseif ($lang === 'tg') {
        $language = 'Tajik';
    } elseif ($lang === 'ta') {
        $language = 'Tamil';
    } elseif ($lang === 'tt') {
        $language = 'Tatar';
    } elseif ($lang === 'te') {
        $language = 'Telugu';
    } elseif ($lang === 'th') {
        $language = 'Thai';
    } elseif ($lang === 'bo') {
        $language = 'Tibetan';
    } elseif ($lang === 'ti') {
        $language = 'Tigrinya';
    } elseif ($lang === 'to') {
        $language = 'Tonga';
    } elseif ($lang === 'ts') {
        $language = 'Tsonga';
    } elseif ($lang === 'tr') {
        $language = 'Turkish';
    } elseif ($lang === 'tk') {
        $language = 'Turkmen';
    } elseif ($lang === 'tw') {
        $language = 'Twi';
    } elseif ($lang === 'ug') {
        $language = 'Uyghur';
    } elseif ($lang === 'uk') {
        $language = 'Ukrainian';
    } elseif ($lang === 'ur') {
        $language = 'Urdu';
    } elseif ($lang === 'uz') {
        $language = 'Uzbek';
    } elseif ($lang === 've') {
        $language = 'Venda';
    } elseif ($lang === 'vi') {
        $language = 'Vietnamese';
    } elseif ($lang === 'vo') {
        $language = 'Volapük';
    } elseif ($lang === 'wa') {
        $language = 'Walloon';
    } elseif ($lang === 'cy') {
        $language = 'Welsh';
    } elseif ($lang === 'wo') {
        $language = 'Wolof';
    } elseif ($lang === 'fy') {
        $language = 'Western Frisian';
    } elseif ($lang === 'xh') {
        $language = 'Xhosa';
    } elseif ($lang === 'yi' || $lang === 'ji') {
        $language = 'Yiddish';
    } elseif ($lang === 'yo') {
        $language = 'Yoruba';
    } elseif ($lang === 'za') {
        $language = 'Zhuang, Chuang';
    } elseif ($lang === 'zu') {
        $language = 'Zulu';
    } else {
        $language = NULL;
    }
}
$langDetect = $language ?? null;
?>

On top of that, indexing in MySQL would not matter that much here; it may even slow down writes. I would rather stream the data to ClickHouse for faster reads and then delete the partitions that have already been streamed. MySQL is fast at writes, while ClickHouse is fast at reads, so that is where I would focus on indexes. The main problem is the OS: for example, Ubuntu is detected as "Ubuntu" in Firefox but as "GNU/Linux" in Chrome. I imagine the same happens with other Linux distros. Please advise on how to approach this, if you are more familiar with the returned values.
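One way to make "Ubuntu" (Firefox) and "GNU/Linux" (Chrome) comparable is to store a coarser OS family next to the exact name. The library appears to offer such a family lookup itself (`OperatingSystem::getOsFamily()` in its parser; check the signature in your version). The self-contained sketch below only imitates the idea with a small hand-written map; the map, its entries and `osFamily()` are hypothetical, not the library's data.

```php
<?php
// Hypothetical, hand-written map: exact OS name => family used for grouping.
// Not the library's own list; extend it (or use the library's family lookup).
const OS_FAMILY = [
    'Ubuntu'    => 'GNU/Linux',
    'Debian'    => 'GNU/Linux',
    'Fedora'    => 'GNU/Linux',
    'GNU/Linux' => 'GNU/Linux',
    'Windows'   => 'Windows',
];

// Returns the family for an exact OS name, or null when the name is unknown.
function osFamily(string $osName): ?string
{
    return OS_FAMILY[$osName] ?? null;
}

// Store both columns: os_name keeps the detail, os_family puts
// Firefox-on-Ubuntu and Chrome-on-Ubuntu into the same bucket.
foreach (['Ubuntu', 'GNU/Linux', 'Plan 9'] as $name) {
    echo $name, ' => ', osFamily($name) ?? 'unknown', PHP_EOL;
}
```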

liviuconcioiu commented 2 years ago

For brands, see the $deviceBrands array: https://github.com/matomo-org/device-detector/blob/master/Parser/Device/AbstractDeviceParser.php#L83
For OS, see the $operatingSystems array: https://github.com/matomo-org/device-detector/blob/master/Parser/OperatingSystem.php#L42
For browsers, see the $availableBrowsers array: https://github.com/matomo-org/device-detector/blob/master/Parser/Client/Browser.php#L40

For models, the length varies from empty to arbitrarily long, so VARCHAR(255) is OK for this.
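To turn those arrays into column sizes, one can measure the longest code and the longest name. A minimal sketch, assuming you copy one of the linked arrays into a PHP array of short code => name; the three sample entries and the `maxLengths()` helper are illustrative only.

```php
<?php
// Returns the longest key and value length (in bytes) of a code => name map.
// strlen() counts bytes; for UTF-8 names that is an upper bound on the
// character count, so it is a safe size for VARCHAR(n).
function maxLengths(array $codeToName): array
{
    $maxCode = 0;
    $maxName = 0;
    foreach ($codeToName as $code => $name) {
        $maxCode = max($maxCode, strlen((string) $code));
        $maxName = max($maxName, strlen($name));
    }
    return ['code' => $maxCode, 'name' => $maxName];
}

// Illustrative excerpt only; use the full array from the linked parser file.
$sample = [
    'UBT' => 'Ubuntu',
    'LIN' => 'GNU/Linux',
    'WIN' => 'Windows',
];

echo json_encode(maxLengths($sample)), PHP_EOL; // {"code":3,"name":9}
```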

sv3t0sl4v commented 2 years ago

This will help. Thanks!

sv3t0sl4v commented 2 years ago

From these lists it appears that browser names are 26 characters at most, and OS names 20 characters at most. Now the dilemma:

Ah! Looks like it will be the short codes, then.

sanchezzzhak commented 2 years ago

In my experience, create a separate table (one entity) for each property: os.name, client.name, device.brand, device.model.

CREATE TABLE public.os (
    id uuid NOT NULL,
    "name" varchar(255) NOT NULL,
    CONSTRAINT os_pkey PRIMARY KEY (id)
);
CREATE UNIQUE INDEX "idx_os-name" ON public.os USING btree (name);

CREATE TABLE public.device_model (
    id uuid NOT NULL,
    "name" varchar(255) NULL DEFAULT NULL::character varying,
    CONSTRAINT device_model_pkey PRIMARY KEY (id)
);
CREATE UNIQUE INDEX "idx_device_model-name" ON public.device_model USING btree (name);

-- etc.

ClickHouse

CREATE TABLE stat
(
    `event_date` Date,
    `time` UInt32,
    `user_id` UUID,
    `event_type` UInt8,  -- 1 = click, 2 = skip, 3 = impression
    `is_unique` UInt8,
    `player_id` UUID,
    `advertising_id` Nullable(UUID),
    `ad_type` UInt8,
    `traffic_id` Nullable(UUID),
    `ua_id` Nullable(UUID),
    `os_id` Nullable(UUID),
    `os_version` FixedString(12),
    `browser_id` Nullable(UUID),
    `browser_version` FixedString(12),
    `device_type` UInt8,
    `model_id` Nullable(UUID),
    `brand_id` Nullable(UUID),
    `country_id` Nullable(UUID),
    `ipv4` Nullable(IPv4),
    `ipv6` Nullable(IPv6),
    `domain_id` Nullable(UUID),
    `income` Decimal(9,6)
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date,
 user_id,
 player_id)
SETTINGS index_granularity = 8192

I will also add that in MySQL, PostgreSQL and ClickHouse a variable-length field does not take up the full 255 characters on disk if the value is, say, 10 characters long.
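Since the schema above keys every dimension table by `uuid`, one option (not from this thread) for avoiding lookups and ClickHouse JOINs at insert time is a name-based UUID: the same name always yields the same id, so the writer can compute `os_id`, `brand_id`, etc. without querying the dimension table first. A minimal sketch of an RFC 4122 version-3 (MD5, name-based) UUID; `nameUuidV3()` is a hypothetical helper, and the DNS namespace is used only as an example.

```php
<?php
// Hypothetical helper: RFC 4122 version-3 (MD5, name-based) UUID.
// Same namespace + name always gives the same UUID.
function nameUuidV3(string $namespaceUuid, string $name): string
{
    // Namespace UUID as 16 raw bytes, followed by the name bytes.
    $nsBytes = hex2bin(str_replace('-', '', $namespaceUuid));
    $hash = md5($nsBytes . $name, true); // 16 raw bytes

    // Version 3 in the high nibble of byte 6,
    // RFC 4122 variant (binary 10xx) in the top bits of byte 8.
    $hash[6] = chr((ord($hash[6]) & 0x0f) | 0x30);
    $hash[8] = chr((ord($hash[8]) & 0x3f) | 0x80);

    $hex = bin2hex($hash);
    return sprintf(
        '%s-%s-%s-%s-%s',
        substr($hex, 0, 8),
        substr($hex, 8, 4),
        substr($hex, 12, 4),
        substr($hex, 16, 4),
        substr($hex, 20, 12)
    );
}

// Standard DNS namespace from RFC 4122, used here only as an example.
const NS_DNS = '6ba7b810-9dad-11d1-80b4-00c04fd430c8';

echo nameUuidV3(NS_DNS, 'python.org'), PHP_EOL;
// 6fa459ea-ee8a-3ca4-894e-db77e160355e (matches Python's uuid.uuid3 example)
```

In practice you would pick your own namespace UUID per dimension (OS, brand, model), so that an identical string in two dimensions does not share an id.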

sv3t0sl4v commented 2 years ago

I think it would be better to keep them in a file, with if/elseif/else statements that compare the code using "===" to get the full string. This should be the fastest option. ClickHouse is not very good at JOIN queries, and I am not sure storing them in MySQL tables is a good idea, because the reads would be done by ClickHouse. The real question is whether, when inserting into the MySQL clicks table, it is faster to get the name by short_name from the database or directly from a file; I guess that depends on the database's compression. Hmm...
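For a code-to-name lookup kept in a file, a PHP array keyed by the short code is usually simpler than a long if/elseif chain: PHP arrays are hash tables, so the average lookup cost does not grow with the number of entries, whereas the chain tests branches one by one. A minimal sketch; the map is an illustrative excerpt of the language codes from the earlier snippet, and `languageName()` is a hypothetical helper.

```php
<?php
// Illustrative excerpt: ISO 639-1 code => language name.
const LANGUAGES = [
    'de' => 'German',
    'en' => 'English',
    'fr' => 'French',
    'id' => 'Indonesian',
    'in' => 'Indonesian', // deprecated alias kept from the original chain
    'yi' => 'Yiddish',
    'ji' => 'Yiddish',    // deprecated alias kept from the original chain
];

// Hypothetical helper: returns the name for the first two letters of an
// Accept-Language value, or null when missing or unknown.
function languageName(?string $acceptLanguage): ?string
{
    if ($acceptLanguage === null || $acceptLanguage === '') {
        return null;
    }
    $code = strtolower(substr($acceptLanguage, 0, 2));
    return LANGUAGES[$code] ?? null;
}

$header = $_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? null;
var_dump(languageName($header));
var_dump(languageName('en-US,en;q=0.9')); // string(7) "English"
```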

sv3t0sl4v commented 2 years ago


Does the device type have short codes like these? E.g. 1 for desktop, 2 for smartphone, 3 for tablet, 4 for feature phone, 5 for TV, etc.

liviuconcioiu commented 2 years ago

https://github.com/matomo-org/device-detector/blob/master/Parser/Device/AbstractDeviceParser.php#L39-L52

sv3t0sl4v commented 2 years ago


Is there a function that returns this short value instead? I am checking the code now but have not found one yet.

liviuconcioiu commented 2 years ago

$dd->getDevice()

sv3t0sl4v commented 2 years ago

Thanks! This is great! Exactly what I need!
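Since `$dd->getDevice()` returns the numeric device type (the constants linked above), it fits an unsigned 8-bit column such as `device_type UInt8` in the ClickHouse table earlier in the thread. A minimal sketch of the conversion, assuming `getDevice()` returns `null` when no device type is detected, and using 255 as a hypothetical "unknown" sentinel because that column is not Nullable; `deviceTypeColumn()` is a hypothetical helper.

```php
<?php
// Hypothetical sentinel for "no device type detected", since a plain
// UInt8 / TINYINT UNSIGNED column cannot hold NULL.
const DEVICE_TYPE_UNKNOWN = 255;

// Converts the value of $dd->getDevice() (assumed int|null) to a UInt8 value.
function deviceTypeColumn(?int $deviceType): int
{
    if ($deviceType === null) {
        return DEVICE_TYPE_UNKNOWN;
    }
    if ($deviceType < 0 || $deviceType >= DEVICE_TYPE_UNKNOWN) {
        throw new RangeException("Device type $deviceType does not fit the column");
    }
    return $deviceType;
}

// With the library installed this would be: deviceTypeColumn($dd->getDevice());
var_dump(deviceTypeColumn(null)); // int(255)
var_dump(deviceTypeColumn(2));    // int(2)
```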

sv3t0sl4v commented 2 years ago


Is something similar to $dd->getDevice() available for bots, i.e. bot types identified by an ID? I cannot find anything like that in the bot detection code yet.

liviuconcioiu commented 2 years ago

No, only the names listed here: https://github.com/matomo-org/device-detector/blob/master/regexes/bots.yml
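Since bots only come with names, one way to still store a compact id is a small dictionary that assigns a stable integer to each bot name the first time it is seen: the same dimension-table idea discussed above, kept in memory. A minimal sketch; `BotDictionary` and the example names are hypothetical, and in production the mapping would have to be persisted (e.g. in a `bot` table) so ids survive restarts.

```php
<?php
// Hypothetical in-memory dictionary: bot name => small integer id.
final class BotDictionary
{
    /** @var array<string, int> */
    private array $ids = [];

    // Returns the existing id for $name, or assigns the next free one.
    public function idFor(string $name): int
    {
        if (!isset($this->ids[$name])) {
            $this->ids[$name] = count($this->ids) + 1; // ids start at 1
        }
        return $this->ids[$name];
    }

    /** @return array<int, string> id => name, e.g. to fill a lookup table */
    public function names(): array
    {
        return array_flip($this->ids);
    }
}

$bots = new BotDictionary();
echo $bots->idFor('Googlebot'), PHP_EOL; // 1
echo $bots->idFor('Bingbot'), PHP_EOL;   // 2
echo $bots->idFor('Googlebot'), PHP_EOL; // 1
```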

sv3t0sl4v commented 2 years ago

OK. Thanks!