nestauk / ojo_local_indicators

A project that builds region-specific indicators of skill demands from the Open Jobs Observatory.

Running greenjobs model inside ojo_local_indicators #2

Open izzyStewart opened 2 years ago

izzyStewart commented 2 years ago

Currently not able to run the greenjobs algorithm inside the ojo_local_indicators local repository.

Steps taken:

  1. Cloned git@github.com:nestauk/grjobs.git
  2. Ran make install inside the grjobs repo
  3. Ran pip install https://github.com/nestauk/grjobs/archive/refs/heads/dev.zip inside ojo_local_indicators
  4. Successfully ran import grjobs as grj inside a notebook in ojo_local_indicators
  5. Tried running from grjobs.pipeline.green_classifier import load_model in the same notebook and got the message:
TypeError                                 Traceback (most recent call last)
/var/folders/41/cjzn4l0n62d58h20h6v2sz0h0000gn/T/ipykernel_1341/2291304684.py in <module>
----> 1 from grjobs.pipeline.green_classifier import load_model
      2 model = load_model('best_model')
      3 model.predict([{'job_title_raw': 'job title', 'description': 'wastewater management, forest management, environmental consulting and in-house business activities that include waste and recycling. Our methodology identifies both critical roles (e.g. a renewable energy engineer) and general roles (e.g. an accountant for a green energy company) within these sectors.'},
      4               {'job_title_raw': 'renewable energy engineer', 'description': '(e.g. an accountant for a green energy company) within these sectors.'},
      5               {'job_title_raw': 'energy consultant', 'description': 'ambient atmosphere, climate, environment'}])

~/opt/anaconda3/envs/ojo_local_indicators/lib/python3.8/site-packages/grjobs/pipeline/green_classifier.py in <module>
     27 # Load config file
     28 grjobs_config = get_yaml_config(Path(str(PROJECT_DIR) + "/grjobs/config/base.yaml"))
---> 29 green_list_path = str(PROJECT_DIR) + grjobs_config["GREEN_LIST_PATH"]
     30 
     31 # %%

TypeError: 'NoneType' object is not subscriptable
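
For reference, the error in the traceback above can be reproduced in isolation. This is only a minimal sketch: `load_config_or_none` is a hypothetical stand-in that assumes the config loader returns None when the file is missing, and it is not the actual grjobs implementation.

```python
from pathlib import Path
from typing import Optional


def load_config_or_none(path: Path) -> Optional[dict]:
    """Hypothetical stand-in for a config loader that returns None when the file is missing."""
    if not path.exists():
        return None
    # Naive "KEY: value" parsing, just enough for this illustration (not real YAML).
    pairs = (line.split(":", 1) for line in path.read_text().splitlines() if ":" in line)
    return {key.strip(): value.strip() for key, value in pairs}


def describe_failure(path: Path) -> str:
    """Return the error message raised by subscripting the loaded config, if any."""
    config = load_config_or_none(path)
    try:
        config["GREEN_LIST_PATH"]
    except TypeError as err:
        return str(err)
    return "no error"


# The path below is made up and is assumed not to exist.
print(describe_failure(Path("does/not/exist/base.yaml")))
```

If the file is absent, the subscript is applied to None, which would explain a TypeError at the first line that indexes the config.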

jaklinger commented 2 years ago

Looks like the installed grjobs package doesn't have access to the grjobs config directory (note that get_yaml_config returns None if the config file in grjobs doesn't exist), so grjobs_config is None by the time line 29 tries to index it. One solution, I think, is to add:

package_data={'': ['*yaml*']},
include_package_data=True,

to https://github.com/nestauk/grjobs/blob/dev/setup.py#L21
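
One way to check this diagnosis, before and after changing setup.py, is to list the YAML files that actually ship with the installed package. The sketch below uses only the standard library. The helper name `packaged_yaml_files` is made up for illustration, and it assumes the config files use the `.yaml` extension, as base.yaml does in the traceback.

```python
import importlib.util
from pathlib import Path
from typing import List


def packaged_yaml_files(package_name: str) -> List[Path]:
    """List the .yaml files found inside an importable package's directories."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.submodule_search_locations is None:
        return []  # not installed, or a single-module distribution
    found: List[Path] = []
    for location in spec.submodule_search_locations:
        found.extend(sorted(Path(location).rglob("*.yaml")))
    return found


if __name__ == "__main__":
    # "grjobs" is the package discussed in this thread; nothing is printed
    # if it is not installed or ships no .yaml files.
    for path in packaged_yaml_files("grjobs"):
        print(path)
```

If nothing is printed in the ojo_local_indicators environment, the config files were probably not packaged. After the setup.py change and a reinstall, base.yaml would be expected to appear in the output.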