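This repository contains the code required to measure the greenness of job adverts at the skill-, occupation- and sector-level. At its highest level, this codebase contains the algorithms required to extract, map and join job adverts to Standard Occupational Classification (SOC) codes, Standard Industrial Classification (SIC) codes and the ESCO Green Skills taxonomy.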
At the job advert level, this can be summarised with the following visual:
To extract green measures at the job advert level, use the GreenMeasures class, which extracts measures at the skill-, occupation- and sector-level. The following code snippet shows how to extract measures from a single job advert:
from dap_prinz_green_jobs.pipeline.green_measures.green_measures import GreenMeasures

job_ad = {
    'id': 1,
    'job_title': 'Senior Sustainability Consultant',
    'job_text': 'You will work as part of a peer group of specialists and project managers, supported by a strong and diverse team of consultants and senior leaders. We are a organisation that is part of the architecture sector and is focused on the build environment. The role requires strong skills in sustainability reporting and knowledge of climate change. It also requires a sound understanding of qualitative/quantitative analysis and excellent report writing and communication skills.',
}

gm = GreenMeasures()  # instantiate the class
measures = gm.get_green_measures(job_ad)  # extract measures at all levels of granularity
>> {
    'SKILL MEASURES': {1: {
        'NUM_ORIG_ENTS': 6,
        'NUM_SPLIT_ENTS': 7,
        'ENTS': [
            (['sustainability reporting'], 'SKILL'),
            (['knowledge of climate change'], 'SKILL'),
            (['understanding of qualitative quantitative analysis'], 'SKILL'),
            (['report writing'], 'SKILL'),
            (['communication skills'], 'SKILL'),
            (['work as part of a peer group of specialists and',
              'peer group of specialists and project managers'],
             'MULTISKILL')],
        'GREEN_ENTS': [
            ('sustainability reporting',
             ('green',
              1.0,
              ('sustainability',
               'b1b118c4-3291-484e-b64d-6d51fd5da8b3',
               0.7539591423676308))),
            ('knowledge of climate change',
             ('green',
              0.976,
              ('nature of climate change impact',
               '1565b401-1754-4b07-8f1a-eb5869e64d95',
               0.7173026456439915)))],
        'PROP_GREEN': 0.2857142857142857,
        'BENEFITS': None}},
    'INDUSTRY MEASURES': {1: {
        'SIC': '711',
        'SIC_name': 'Architectural and engineering activities and related technical consultancy',
        'SIC_confidence': 0.73,
        'SIC_method': 'closest distance',
        'company_description': 'We are a organisation that is part of the architecture sector and is focused on the build environment.',
        'INDUSTRY TOTAL GHG EMISSIONS': 297.9,
        'INDUSTRY GHG PER UNIT EMISSIONS': 0.02,
        'INDUSTRY PROP HOURS GREEN TASKS': 11.4,
        'INDUSTRY PROP WORKERS GREEN TASKS': 50.2,
        'INDUSTRY PROP WORKERS 20PERC GREEN TASKS': 26.6,
        'INDUSTRY GHG EMISSIONS PER EMPLOYEE': 0.7,
        'INDUSTRY CARBON DIOXIDE EMISSIONS PER EMPLOYEE': 1709.4}},
    'OCCUPATION MEASURES': {1: {
        'GREEN CATEGORY': 'Green New & Emerging',
        'GREEN/NOT GREEN': 'Green',
        'GREEN TIMESHARE': 62.5,
        'GREEN TOPICS': 55,
        'SOC': {
            'SOC_2020_EXT': '2152/05',
            'SOC_2020': '2152',
            'SOC_2010': '2142',
            'name': ['Environment professionals',
                     'Environmental and geo-environmental engineers',
                     'Sustainability officers',
                     'Environmental scientists',
                     'Energy managers']}}}}
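In the sample output, PROP_GREEN equals the number of GREEN_ENTS divided by NUM_SPLIT_ENTS (2 / 7). The sketch below reads the green skill matches out of one advert's SKILL MEASURES entry and recomputes that ratio. Treat it as an illustration only: that this ratio is how the package defines PROP_GREEN is an assumption inferred from the sample, the field names given to the nested tuples are guesses, and `summarise_green_skills` is a hypothetical helper that is not part of the package.

```python
# Abbreviated copy of the 'SKILL MEASURES' entry for job advert 1 in the sample output.
skill_measures = {
    "NUM_ORIG_ENTS": 6,
    "NUM_SPLIT_ENTS": 7,
    "GREEN_ENTS": [
        ("sustainability reporting",
         ("green", 1.0,
          ("sustainability",
           "b1b118c4-3291-484e-b64d-6d51fd5da8b3",
           0.7539591423676308))),
        ("knowledge of climate change",
         ("green", 0.976,
          ("nature of climate change impact",
           "1565b401-1754-4b07-8f1a-eb5869e64d95",
           0.7173026456439915))),
    ],
    "PROP_GREEN": 0.2857142857142857,
}


def summarise_green_skills(skill_entry):
    """Hypothetical helper: list green skill matches and recompute the green share.

    The meaning attached to each tuple position (label, score, matched name,
    matched id, match score) is an assumption based on the sample output.
    """
    matches = []
    for ent_text, (label, score, (matched_name, matched_id, match_score)) in skill_entry["GREEN_ENTS"]:
        matches.append(
            {"skill": ent_text, "label": label, "matched_name": matched_name, "matched_id": matched_id}
        )
    total = skill_entry["NUM_SPLIT_ENTS"]
    # Assumed definition: share of split entities that were matched as green.
    share = len(matches) / total if total else 0.0
    return matches, share


matches, share = summarise_green_skills(skill_measures)
print([m["matched_name"] for m in matches])
print(share)
```

For this advert the recomputed share agrees with the reported PROP_GREEN value.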
You can also pass a list of job adverts to the get_green_measures method to extract measures from multiple job adverts at once.
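The output is nested by measure type and then by job advert id, so for a batch it can be convenient to regroup it into one record per advert. The sketch below does this on a small hand-written dictionary rather than on real output. It assumes that passing a list returns the same structure keyed by each advert's id. `regroup_by_advert` is a hypothetical helper that is not part of the package, and the values for advert 2 are made-up placeholders.

```python
from collections import defaultdict


def regroup_by_advert(measures):
    """Hypothetical helper: turn {measure_type: {job_id: values}} into {job_id: {measure_type: values}}."""
    by_advert = defaultdict(dict)
    for measure_type, per_advert in measures.items():
        for job_id, values in per_advert.items():
            by_advert[job_id][measure_type] = values
    return dict(by_advert)


# Abbreviated stand-in for gm.get_green_measures([job_ad_1, job_ad_2]).
# Advert 1 reuses values from the sample output; advert 2 is a made-up placeholder.
sample_measures = {
    "SKILL MEASURES": {1: {"PROP_GREEN": 0.2857142857142857}, 2: {"PROP_GREEN": 0.0}},
    "INDUSTRY MEASURES": {1: {"SIC": "711"}, 2: {"SIC": None}},
    "OCCUPATION MEASURES": {1: {"GREEN/NOT GREEN": "Green"}, 2: {"GREEN/NOT GREEN": None}},
}

records = regroup_by_advert(sample_measures)
for job_id, record in records.items():
    print(job_id, record["INDUSTRY MEASURES"]["SIC"], record["SKILL MEASURES"]["PROP_GREEN"])
```

Each record then holds all three measure types for one advert, which is easier to load into a table.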
If you would like to extract a single measure (e.g. green skills only), or only SOC codes, SIC codes or skills, please refer to the detailed READMEs in the dap_prinz_green_jobs/pipeline/green_measures/ directory.
Core to the codebase are the following directories:
- dap_prinz_green_jobs/pipeline/green_measures/: This directory contains the code and methodological summaries required to extract, map and join job adverts to:
    - occupations/: Standard Occupational Classification (SOC) codes and associated greenness datasets;
    - industries/: Standard Industrial Classification (SIC) codes and associated greenness datasets; and
    - skills/: the European Skills, Competences, Qualifications and Occupations (ESCO) Green Skills taxonomy.
- dap_prinz_green_jobs/pipeline/ojo_application: This directory contains the code required to apply the algorithms to different samples of scraped online job adverts from the Open Jobs Observatory (OJO). Code in this directory requires access to Nesta's private S3 bucket and is not available to the public.
- dap_prinz_green_jobs/analysis/: This directory contains the analysis code that powers the Green Jobs Explorer (link to follow), our demo tool to explore and learn more about green jobs and skills.
If you would like to explore the data via a front end, we've built a demo tool for researchers (link to follow) to explore and download data on green jobs and skills.
- Install direnv and conda.
- Run make install to configure the development environment, including pre-commit.
- Run python -m spacy download en_core_web_sm to download the spaCy model.
- Run conda install -c pytorch faiss-cpu=1.7.4 mkl=2021 blas=1.0=mkl in order to install faiss and its associated dependencies.

Technical and working style guidelines
Project based on Nesta's data science project template (Read the docs here).