Closed by lizgzil 1 year ago
The following plots use those 610 SICs, and proportions are out of the 409,310 job adverts.
The proportion of job adverts per SIC has a long tail: 90% of the job adverts come from just 91 SICs.
The 47 SICs that account for 80% of the job adverts are shown here:
The data behind this plot (SIC: proportion of job adverts):
{'70100 - Activities of head offices': 0.12711392343211747,
'78109 - Other activities of employment placement agencies': 0.07677799223082749,
'73110 - Advertising agencies': 0.05648286140089419,
'74990 - Non-trading company': 0.05536146197258801,
'82990 - Other business support service activities n.e.c.': 0.05345092961325157,
'78200 - Temporary employment agency activities': 0.03531308787960226,
'86101 - Hospital activities': 0.03395226112237668,
'86900 - Other human health activities': 0.030654027509711464,
'41100 - Development of building projects': 0.029845349490606143,
'49410 - Freight transport by road': 0.020329334734064647,
'70229 - Management consultancy activities other than financial management': 0.0191908333536928,
'77110 - Renting and leasing of cars and light motor vehicles': 0.016681732672057855,
'64205 - Activities of financial services holding companies': 0.014639270968214801,
'62012 - Business and domestic software development': 0.013598495028218222,
'85590 - Other education n.e.c.': 0.012814248369206714,
'96090 - Other service activities n.e.c.': 0.012301189807236568,
'47300 - Retail sale of automotive fuel in specialised stores': 0.01222789572695512,
'64209 - Activities of other holding companies n.e.c.': 0.010744912169260462,
'88100 - Social work activities without accommodation for the elderly and disabled': 0.010251398695365371,
'47770 - Retail sale of watches and jewellery in specialised stores': 0.010207422247196502,
'61900 - Other telecommunications activities': 0.009249712931518897,
'74909 - Other professional, scientific and technical activities n.e.c.': 0.009078693410862183,
'71129 - Other engineering activities': 0.009051818914758985,
'87300 - Residential care activities for the elderly and disabled': 0.009010285602599496,
'62090 - Other information technology service activities': 0.007146172827441304,
'47190 - Other retail sale in non-specialised stores': 0.007028902298990985,
'68320 - Management of real estate on a fee or contract basis': 0.006889643546456231,
'43999 - Other specialised construction activities n.e.c.': 0.006850553370306125,
'64191 - Banks': 0.006738169113874569,
'68310 - Real estate agencies': 0.005993012631013169,
'45111 - Sale of new cars and light motor vehicles': 0.005636314773643449,
'87900 - Other residential care activities n.e.c.': 0.005531259925240038,
'85530 - Driving school activities': 0.0048178642105005985,
'55100 - Hotels and similar accommodation': 0.004812977938481835,
'56102 - Unlicensed restaurants and cafes': 0.0048032053944443084,
'88990 - Other social work activities without accommodation n.e.c.': 0.004681048593975227,
'62020 - Information technology consultancy activities': 0.004649287825853265,
'85600 - Educational support services': 0.004480711441205932,
'64999 - Financial intermediation not elsewhere classified': 0.004346338960689942,
'52103 - Operation of warehousing and storage facilities for land transport activities': 0.004124013583836212,
'87100 - Residential nursing care facilities': 0.004018958735432802,
'56101 - Licensed restaurants': 0.003921233295057536,
'45310 - Wholesale trade of motor vehicle parts and accessories': 0.003845496078766705,
'47710 - Retail sale of clothing in specialised stores': 0.0037551000464195843,
'66220 - Activities of insurance agents and brokers': 0.003664704014072463,
'68209 - Other letting and operating of own or leased real estate': 0.0034985707654345117,
'81100 - Combined facilities support activities': 0.0032909042046370724}
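The coverage figures quoted above (47 SICs for 80%, 91 SICs for 90%) are the kind of number a cumulative sum over the sorted proportions gives. A minimal sketch, assuming the proportions are held in a dict shaped like the one above; the function name `n_sics_for_coverage` and the toy data are hypothetical and only illustrate the calculation:

```python
def n_sics_for_coverage(sic_props, target):
    """Return the smallest number of SICs, taken in descending order of
    proportion, whose cumulative share of job adverts reaches `target`.

    Returns None if the proportions provided never reach `target`.
    """
    cumulative = 0.0
    ranked = sorted(sic_props.values(), reverse=True)
    for n, prop in enumerate(ranked, start=1):
        cumulative += prop
        if cumulative >= target:
            return n
    return None


# Toy proportions for illustration only (not the real job advert data)
toy_props = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
print(n_sics_for_coverage(toy_props, 0.8))  # 3
```

Run against the full 610-SIC mapping rather than this 47-entry excerpt, the same function should give the counts for any coverage threshold.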
Dataset: outputs/data/green_industries/all_ojo_most_common_sic.csv
S3: s3://prinz-green-jobs/outputs/data/green_industries/all_ojo_most_common_sic.csv
Consider filtering these top 35 to include only the more solid SICs, e.g. remove the "others".
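One way to remove the catch-all categories is a simple text match on the SIC description. This is a rough sketch, not a decided approach: the pattern `CATCH_ALL` and the function `drop_catch_all_sics` are hypothetical, and which SICs count as "solid" is a judgement call that a regex will only approximate.

```python
import re

# Hypothetical pattern for catch-all SIC descriptions. "other than" is
# excluded because it qualifies a specific activity rather than marking
# a residual category.
CATCH_ALL = re.compile(
    r"\bother\b(?!\s+than)|n\.e\.c\.|not elsewhere classified",
    re.IGNORECASE,
)


def drop_catch_all_sics(sic_props):
    """Keep only SICs whose description does not look like a catch-all."""
    return {
        sic: prop
        for sic, prop in sic_props.items()
        if not CATCH_ALL.search(sic)
    }


# A few entries copied from the data above
sample = {
    "70100 - Activities of head offices": 0.12711392343211747,
    "78109 - Other activities of employment placement agencies": 0.07677799223082749,
    "82990 - Other business support service activities n.e.c.": 0.05345092961325157,
    "70229 - Management consultancy activities other than financial management": 0.0191908333536928,
    "64191 - Banks": 0.006738169113874569,
    "64999 - Financial intermediation not elsewhere classified": 0.004346338960689942,
}

for sic in drop_catch_all_sics(sample):
    print(sic)
```

Vague codes that do not contain "other" (for example "74990 - Non-trading company") would need to be added to an explicit exclusion list.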
Find the distribution of SICs as well as we can.
This may inform a classifier approach in which we train a model to predict the most common SICs.
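If the classifier route is taken, one possible first step is to restrict the label set to the most common SICs and fold the long tail into a single fallback class. The sketch below only shows that label preparation; the function name `top_k_labels`, the `"OTHER"` fallback label and the toy input are assumptions for illustration, not part of the existing pipeline.

```python
from collections import Counter


def top_k_labels(sic_per_advert, k, fallback="OTHER"):
    """Map each advert's SIC to itself if it is among the k most common
    SICs, otherwise to a single fallback label."""
    counts = Counter(sic_per_advert)
    keep = {sic for sic, _ in counts.most_common(k)}
    return [sic if sic in keep else fallback for sic in sic_per_advert]


# Toy example: one SIC code per job advert (illustrative only)
adverts = ["70100", "78109", "70100", "73110", "70100", "78109"]
labels = top_k_labels(adverts, k=2)
print(labels)
# ['70100', '78109', '70100', 'OTHER', '70100', '78109']
```

The value of `k` could be chosen from the coverage figures above, e.g. the number of SICs needed to cover 80% of job adverts.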