nestauk / ojd_daps_skills

Nesta's Skills Extractor Library
https://nestauk.github.io/ojd_daps_skills/

No model attached to package #224

Closed Socvest closed 3 months ago

Socvest commented 3 months ago

I have followed the advice from various past issues asking to install the dev version of this package but still get this error:

AttributeError: 'JobNER' object has no attribute 'nlp'

I tried downloading the model from s3 but it seems no access has been granted. I am getting this error:

fatal error: An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied

It's virtually unusable.

Is there a fix for this?
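The AttributeError above suggests that the `nlp` attribute is only set once the model files have actually been loaded. A minimal sketch of a guard that turns the bare AttributeError into a clearer message, assuming that is how `JobNER` behaves (the helper and exception names here are hypothetical, not part of ojd_daps_skills):

```python
class ModelNotLoadedError(RuntimeError):
    """Raised when a model wrapper has no loaded pipeline attached."""


def ensure_model_loaded(ner_object, attr="nlp"):
    """Return the loaded pipeline, or raise a clear error if it is missing.

    Assumption: the wrapper only gains the `attr` attribute after its
    model files have been found and loaded successfully.
    """
    pipeline = getattr(ner_object, attr, None)
    if pipeline is None:
        raise ModelNotLoadedError(
            f"{type(ner_object).__name__} has no '{attr}' attribute; "
            "the model files were probably not found or not downloaded."
        )
    return pipeline
```

Calling this straight after `load_model(...)` would surface a missing model immediately rather than at the first prediction.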

lizgzil commented 3 months ago

@Socvest - thanks for opening this issue. It should be fixed now - please can you let us know if you can download it?

Socvest commented 3 months ago

@lizgzil Thanks for your reply.

It's still not working. I have tried both the internal function:

from ojd_daps_skills.pipeline.skill_ner import ner_spacy 
JobNER = ner_spacy.JobNER()
JobNER.load_model(model_folder="escoe_extension/outputs/models/ner_model/20230808/", s3_download=True)

and the command line version:

aws s3 sync s3://open-jobs-lake/escoe_extension/outputs/models/ner_model/20230808/ outputs/models/ner_model/20230808/

Is there anything wrong on my end?

lizgzil commented 3 months ago

ah yes - I think you will need to download the files from this S3 location s3://open-jobs-indicators/escoe_extension/ojd_daps_skills_data_new.zip. The open-jobs-lake bucket was never meant to be for use outside of Nesta.

Socvest commented 3 months ago

Thank you for letting me know. open-jobs-lake is hard-coded as the S3 bucket name in the package, which explains why I don't have access to many of the package's functions.

Also, in ...\Lib\site-packages\ojd_daps_skills\__init__.py, I had to change PUBLIC_DATA_FOLDER_NAME to ojd_daps_skills_data_new before it would download the recommended files.

The hardcoded bucket name saved in this same file is bucket_name = "open-jobs-lake"

However, I am still getting the error:

fatal error: An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied

lizgzil commented 3 months ago

thank you - forgot to add the _new. We are currently planning a huge refactor of this repo - so it's all a little bit in flux at the moment!

Is that the error you get when running the command-line sync, e.g. aws s3 sync s3://open-jobs-indicators/escoe_extension/ojd_daps_skills_data_new.zip ... ?

Socvest commented 3 months ago

Ah okay, look forward to that.

Yeah, that's the error when I do that with this link:

aws s3 sync s3://open-jobs-indicators/escoe_extension/ojd_daps_skills_data_new.zip ...

lizgzil commented 3 months ago

Could you try again with your original code:

from ojd_daps_skills.pipeline.skill_ner import ner_spacy 
JobNER = ner_spacy.JobNER()
JobNER.load_model(model_folder="escoe_extension/outputs/models/ner_model/20230808/", s3_download=True)

Socvest commented 3 months ago

I did, and I still get the same error:

2024-06-24 12:58:45,921 - SkillsExtractor - INFO - Loading the model from outputs/models/ner_model/20230808/ (ner_spacy.py:510)
2024-06-24 12:58:45,929 - SkillsExtractor - WARNING - Model not found locally - you may need to download it from S3 (set s3_download to True) (ner_spacy.py:517)

lizgzil commented 3 months ago

sorry about this! Not sure what's going on.

I can see that the open-jobs-indicators/escoe_extension/ojd_daps_skills_data_new.zip permissions are 'read' for public access. So I'm not sure why

aws s3 sync s3://open-jobs-indicators/escoe_extension/ojd_daps_skills_data_new.zip escoe_extension/ojd_daps_skills_data_new.zip

doesn't work.

Where are you located? Wondering if it's to do with non-UK locations?
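One possible explanation, not confirmed anywhere in this thread: `aws s3 sync` works on prefixes and starts by listing the bucket (the ListObjectsV2 call named in the error), whereas public 'read' on a single object only permits fetching that object. If so, a plain HTTPS download of the object avoids the listing step. A minimal sketch that maps an s3:// URI to a virtual-hosted-style HTTPS URL (the function name is hypothetical, and the object must be publicly readable for the resulting URL to work):

```python
from urllib.parse import quote, urlparse


def s3_uri_to_public_url(s3_uri):
    """Convert s3://bucket/key into https://bucket.s3.amazonaws.com/key."""
    parsed = urlparse(s3_uri)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"Not a valid s3://bucket/key URI: {s3_uri!r}")
    # quote() leaves "/" intact, so nested keys keep their structure.
    return f"https://{parsed.netloc}.s3.amazonaws.com/{quote(key)}"
```

For the zip discussed here, the call would be `s3_uri_to_public_url("s3://open-jobs-indicators/escoe_extension/ojd_daps_skills_data_new.zip")`.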

Socvest commented 3 months ago

No problem.

Yeah in the UK.

I was able to get the files from the link here: https://github.com/nestauk/ojd_daps_skills/issues/175#issuecomment-2155536340

But I'm not sure about the model file. It doesn't seem to be in there.

lizgzil commented 3 months ago

ok - I really hope you can get the files from the link https://open-jobs-indicators.s3.amazonaws.com/escoe_extension/ojd_daps_skills_data_new.zip now (and the model should be in there now?).

then after unzipping and storing the folder in the parent dir, I hope:

from ojd_daps_skills.pipeline.skill_ner import ner_spacy 
JobNER = ner_spacy.JobNER()
JobNER.load_model(model_folder="ojd_daps_skills_data_new/outputs/models/ner_model/20230808/", s3_download=False)

will work?
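The download-and-unzip step described above can be sketched in Python using only the standard library. The helper names are hypothetical, and the sketch assumes the zip unpacks to a top-level ojd_daps_skills_data_new folder, as the model_folder path above implies:

```python
import urllib.request
import zipfile
from pathlib import Path
from urllib.parse import urlparse


def download_and_extract(url, dest_dir="."):
    """Download a zip archive from `url` and unpack it into `dest_dir`."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Name the local file after the last component of the URL path.
    zip_path = dest / Path(urlparse(url).path).name
    urllib.request.urlretrieve(url, str(zip_path))
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return dest


def model_folder_ready(model_folder):
    """Return True if the model folder exists and contains at least one entry."""
    folder = Path(model_folder)
    return folder.is_dir() and any(folder.iterdir())
```

Running `download_and_extract` with the zip URL above, then checking `model_folder_ready("ojd_daps_skills_data_new/outputs/models/ner_model/20230808/")`, gives a quick confirmation that the model files are in place before calling `load_model` with `s3_download=False`.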

Socvest commented 3 months ago

Wonderful! Works like a charm!

Thank you! :)

lizgzil commented 3 months ago

That's amazing news! Thanks for bringing it to our attention.