owid / etl

A compute graph for loading and transforming OWID's data
https://docs.owid.io/projects/etl
MIT License

AI Epoch dataset - small edits 💄 #2795

Closed: veronikasamborska1994 closed this pull request 3 months ago

veronikasamborska1994 commented 3 months ago

Small changes to metadata, plus grouping domains with fewer than 20 systems under "Other".
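The grouping rule can be illustrated with a minimal standard-library sketch. It assumes each system carries a list of domains and counts a system once toward every domain it lists (the updated metadata in the data-diff notes that a system can operate in more than one domain). The function name `group_rare_domains` and its parameters are hypothetical, not part of the etl codebase. The threshold is left as a parameter because this description says 20 while the updated metadata text in the data-diff says 10.

```python
from collections import Counter


def group_rare_domains(systems, min_systems=20, other_label="Other"):
    """Relabel domains listed by fewer than `min_systems` systems as `other_label`.

    Hypothetical helper: `systems` maps a system name to its list of domains.
    A system with several domains counts once toward each of them.
    """
    # Count how many distinct systems list each domain.
    counts = Counter(
        domain for domains in systems.values() for domain in set(domains)
    )
    rare = {domain for domain, n in counts.items() if n < min_systems}

    grouped = {}
    for name, domains in systems.items():
        # Swap rare domains for the catch-all label, then drop duplicates
        # (e.g. two rare domains both becoming "Other") while keeping order.
        relabeled = [other_label if d in rare else d for d in domains]
        grouped[name] = list(dict.fromkeys(relabeled))
    return grouped
```

For example, with `min_systems=2`, a system listed under `["Language", "Driving"]` becomes `["Language", "Other"]` when "Driving" appears in only one system. Because rare domains are folded into "Other", the counts for "Other" can only grow, which matches the direction of the changed "Other" rows in the data-diff.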

owidbot commented 3 months ago
Quick links (staging server): Site Admin Wizard

Login: `ssh owid@staging-site-ai-epoch-edits`

chart-diff: ✅ No charts for review.
data-diff: ❌ Found differences

```diff
= Dataset garden/artificial_intelligence/2024-06-03/epoch
  = Table epoch
    ~ Column organization_categorization (changed data)
      ~ Changed values: 172 / 837 (20.55%)
          days_since_1949  system                          organization_categorization -  organization_categorization +
          25189            Tacotron 2                      Academia and industry          Academia and industry collaboration
          25241            ENAS                            Academia and industry          Academia and industry collaboration
          25382            ShuffleNet v2                   Academia and industry          Academia and industry collaboration
          26742            Segatron-XL large, M=384 + HCP  Academia and industry          Academia and industry collaboration
          27092            PaLM-E                          Academia and industry          Academia and industry collaboration
= Dataset garden/artificial_intelligence/2024-06-03/epoch_aggregates_affiliation
  = Table epoch_aggregates_affiliation
    ~ Dim year
      + + New values: 66 / 264 (25.00%)
          organization_categorization          year
          Academia and industry collaboration  1987
          Academia and industry collaboration  1988
          Academia and industry collaboration  2004
          Academia and industry collaboration  2014
          Academia and industry collaboration  2022
      - - Removed values: 66 / 264 (25.00%)
          organization_categorization  year
          Academia and industry        1987
          Academia and industry        1988
          Academia and industry        2004
          Academia and industry        2014
          Academia and industry        2022
    ~ Dim organization_categorization
      + + New values: 66 / 264 (25.00%)
          year  organization_categorization
          1987  Academia and industry collaboration
          1988  Academia and industry collaboration
          2004  Academia and industry collaboration
          2014  Academia and industry collaboration
          2022  Academia and industry collaboration
      - - Removed values: 66 / 264 (25.00%)
          year  organization_categorization
          1987  Academia and industry
          1988  Academia and industry
          2004  Academia and industry
          2014  Academia and industry
          2022  Academia and industry
    ~ Column cumulative_count (changed metadata, new data, changed data)
      - - description_short: Describes the sector where the authors of an AI system have their primary affiliations.
      + + description_short: |-
      + +   Describes the sector where the authors of a notable AI system have their primary affiliations. The 2024 data is incomplete and was last updated 12 June 2024.
      + + New values: 66 / 264 (25.00%)
          year  organization_categorization          cumulative_count
          1987  Academia and industry collaboration  0.0
          1988  Academia and industry collaboration  0.0
          2004  Academia and industry collaboration  5.0
          2014  Academia and industry collaboration  24.0
          2022  Academia and industry collaboration  142.0
      - - Removed values: 66 / 264 (25.00%)
          year  organization_categorization  cumulative_count
          1987  Academia and industry        0.0
          1988  Academia and industry        0.0
          2004  Academia and industry        5.0
          2014  Academia and industry        24.0
          2022  Academia and industry        142.0
    ~ Column yearly_count (changed metadata, new data, changed data)
      - - description_short: Describes the sector where the authors of an AI system have their primary affiliations.
      + + description_short: |-
      + +   Describes the sector where the authors of a notable AI system have their primary affiliations. The 2024 data is incomplete and was last updated 12 June 2024.
      + + New values: 66 / 264 (25.00%)
          year  organization_categorization          yearly_count
          1987  Academia and industry collaboration  0.0
          1988  Academia and industry collaboration  0.0
          2004  Academia and industry collaboration  0.0
          2014  Academia and industry collaboration  7.0
          2022  Academia and industry collaboration  20.0
      - - Removed values: 66 / 264 (25.00%)
          year  organization_categorization  yearly_count
          1987  Academia and industry        0.0
          1988  Academia and industry        0.0
          2004  Academia and industry        0.0
          2014  Academia and industry        7.0
          2022  Academia and industry        20.0
= Dataset garden/artificial_intelligence/2024-06-03/epoch_aggregates_countries
  = Table epoch_aggregates_countries
    ~ Column cumulative_count (changed metadata)
      - - Total annual number of notable machine learning models attributed to the location of researchers’ affiliated institutions. The 2024 data is incomplete and was last updated 11 June 2024.
      + + Refers to the location of the primary organization with which the authors of a notable AI system are affiliated. An AI system can have multiple authors, each potentially affiliated with different institutions, thus contributing to the count for multiple countries. The 2024 data is incomplete and was last updated 12 June 2024.
    ~ Column yearly_count (changed metadata)
      - - Total annual number of notable machine learning models attributed to the location of researchers’ affiliated institutions. The 2024 data is incomplete and was last updated 11 June 2024.
      + + Refers to the location of the primary organization with which the authors of a notable AI system are affiliated. An AI system can have multiple authors, each potentially affiliated with different institutions, thus contributing to the count for multiple countries. The 2024 data is incomplete and was last updated 12 June 2024.
= Dataset garden/artificial_intelligence/2024-06-03/epoch_aggregates_domain
  = Table epoch_aggregates_domain
    ~ Dim year
      - - Removed values: 462 / 792 (58.33%)
          domain       year
          3D modeling  1955
          Driving      1965
          Driving      1973
          Search       1973
          Driving      1991
    ~ Dim domain
      - - Removed values: 462 / 792 (58.33%)
          year  domain
          1955  3D modeling
          1965  Driving
          1973  Driving
          1973  Search
          1991  Driving
    ~ Column cumulative_count (changed metadata, changed data)
      - - Refers to the specific area, application, or field in which an AI system is designed to operate. The 2024 data is incomplete and was last updated 11 June 2024.
      + + Describes the specific area, application, or field in which an AI system is designed to operate. An AI system can operate in more than one domain, thus contributing to the count for multiple domains. The 2024 data is incomplete and was last updated 12 June 2024.
      - - The count of notable AI systems per domain is derived by tallying the instances of machine learning models classified under each domain category. It's important to note that a single machine learning model can fall under multiple domains. The classification into domains is determined by the specific area, application, or field that the AI system is primarily designed to operate within.
      + + The count of notable AI systems per domain is derived by tallying the instances of machine learning models classified under each domain category. It's important to note that a single machine learning model can fall under multiple domains. The classification into domains is determined by the specific area, application, or field that the AI system is primarily designed to operate within. System domains with less than 10 systems are grouped under "Other."
      ? ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
      - - Removed values: 462 / 792 (58.33%)
          year  domain       cumulative_count
          1955  3D modeling  0
          1965  Driving      0
          1973  Driving      0
          1973  Search       0
          1991  Driving      2
      ~ Changed values: 57 / 792 (7.20%)
          year  domain  cumulative_count -  cumulative_count +
          1977  Other   10                  12.0
          1995  Other   20                  30.0
          2002  Other   22                  32.0
          2003  Other   22                  32.0
          2009  Other   30                  43.0
    ~ Column yearly_count (changed metadata, changed data)
      - - Refers to the specific area, application, or field in which an AI system is designed to operate. The 2024 data is incomplete and was last updated 11 June 2024.
      + + Describes the specific area, application, or field in which an AI system is designed to operate. An AI system can operate in more than one domain, thus contributing to the count for multiple domains. The 2024 data is incomplete and was last updated 12 June 2024.
      - - The count of notable AI systems per domain is derived by tallying the instances of machine learning models classified under each domain category. It's important to note that a single machine learning model can fall under multiple domains. The classification into domains is determined by the specific area, application, or field that the AI system is primarily designed to operate within.
      + + The count of notable AI systems per domain is derived by tallying the instances of machine learning models classified under each domain category. It's important to note that a single machine learning model can fall under multiple domains. The classification into domains is determined by the specific area, application, or field that the AI system is primarily designed to operate within. System domains with less than 10 systems are grouped under "Other."
      ? ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
      - - Removed values: 462 / 792 (58.33%)
          year  domain       yearly_count
          1955  3D modeling  0
          1965  Driving      0
          1973  Driving      0
          1973  Search       0
          1991  Driving      0
      ~ Changed values: 21 / 792 (2.65%)
          year  domain  yearly_count -  yearly_count +
          1969  Other   0               1.0
          2006  Other   2               3.0
          2012  Other   1               2.0
          2016  Other   1               2.0
          2024  Other   0               1.0
= Dataset garden/artificial_intelligence/2024-06-03/epoch_aggregates_organizations
  = Table epoch_aggregates_organizations
    ~ Column cumulative_count (changed metadata)
      - - Total annual number of notable machine learning models developed by various organizations. The 2024 data is incomplete and was last updated 11 June 2024.
      + + Refers to the primary organization affiliated with the authors of a notable AI system. The 2024 data is incomplete and was last updated 12 June 2024.
    ~ Column yearly_count (changed metadata)
      - - Total annual number of notable machine learning models developed by various organizations. The 2024 data is incomplete and was last updated 11 June 2024.
      + + Refers to the primary organization affiliated with the authors of a notable AI system. The 2024 data is incomplete and was last updated 12 June 2024.

Legend: +New ~Modified -Removed =Identical
Hint: Run this locally with etl diff REMOTE data/ --include yourdataset --verbose --snippet
```

Automatically updated datasets matching `weekly_wildfires|excess_mortality|covid|fluid|flunet|country_profile|garden/ihme_gbd/2019/gbd_risk` are not included
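The aggregate tables in the data-diff report a `yearly_count` and a `cumulative_count` per category (affiliation, country, domain, organization). The following minimal standard-library sketch shows one way the two columns could relate, assuming `cumulative_count` is the running sum of `yearly_count` over consecutive years and that a system with several categories (for example, authors in several countries) counts once toward each, as the updated metadata states. The helper `yearly_and_cumulative` is hypothetical; the actual garden step likely operates on tables rather than plain lists.

```python
from collections import Counter
from itertools import accumulate


def yearly_and_cumulative(records, years):
    """Return {category: (yearly_counts, cumulative_counts)} aligned to `years`.

    Hypothetical helper: each record is (year, categories). A record with
    several categories counts once toward each of them.
    """
    counts = Counter()
    for year, categories in records:
        for category in set(categories):
            counts[(category, year)] += 1

    result = {}
    for category in sorted({category for category, _ in counts}):
        # Counter returns 0 for missing keys, so years without any system
        # get an explicit zero and the running total stays defined.
        yearly = [counts[(category, year)] for year in years]
        result[category] = (yearly, list(accumulate(yearly)))
    return result
```

Filling empty years with zero keeps `cumulative_count` flat across gaps instead of leaving holes, which is consistent with the zero-valued early-year rows that appear in the diff.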

Edited: 2024-06-12 12:17:59 UTC Execution time: 13.64 seconds