keithrozario / Klayers

Python Packages as AWS Lambda Layers

Not able to use spacy "/opt/en_core_web_sm-2.2.5" #448

Open willianfalbo opened 1 month ago

willianfalbo commented 1 month ago

Hey guys, I've been trying to use python38-spacy:42 and python38-spacy_model_en_small:1, but they are not working. Could you please help me?

Here is my template yaml file:

# template.yaml

AWSTemplateFormatVersion: "2010-09-09"
Transform: AWS::Serverless-2016-10-31

Resources:
  GetWordCounts:
    Type: AWS::Serverless::Function
    Properties:
      Handler: word-counts/app.lambda_handler
      Runtime: python3.8
      CodeUri: .
      Timeout: 30
      Layers:
        - arn:aws:lambda:us-east-1:770693421928:layer:Klayers-python38-spacy:42
        - arn:aws:lambda:us-east-1:770693421928:layer:Klayers-python38-spacy_model_en_small:1
      Events:
        ApiGateway:
          Type: Api
          Properties:
            Path: /word-counts
            Method: get

Here is a simple handler file for the requests:

# word-counts/app.py

import json
import spacy

nlp = spacy.load("/opt/en_core_web_sm-2.2.5")

def lambda_handler(event, context):
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }

To start the API locally, I run the following command:

sam local start-api --profile my-profile

Then, it fails when I do a GET request to that endpoint, like:

# GET http://localhost:3000/word-counts

/opt/python/spacy/util.py:717: UserWarning: [W094] Model 'en_core_web_sm' (2.2.5) specifies an under-constrained spaCy version requirement: >=2.2.2. This can lead to compatibility problems with older versions, or as new spaCy versions are released, because the model may say it's compatible when it's not. Consider changing the "spacy_version" in your meta.json to a version range, with a lower and upper pin. For example: >=3.0.6,<3.1.0
  warnings.warn(warn_msg)
[ERROR] OSError: [E053] Could not read config.cfg from /opt/en_core_web_sm-2.2.5/config.cfg
Traceback (most recent call last):
  File "/var/lang/lib/python3.8/imp.py", line 234, in load_module
    return load_source(name, filename, file)
  File "/var/lang/lib/python3.8/imp.py", line 171, in load_source
    module = _load(spec)
  File "<frozen importlib._bootstrap>", line 702, in _load
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/var/task/word-counts/app.py", line 4, in <module>
    nlp = spacy.load("/opt/en_core_web_sm-2.2.5")
  File "/opt/python/spacy/__init__.py", line 50, in load
    return util.load_model(
  File "/opt/python/spacy/util.py", line 326, in load_model
    return load_model_from_path(Path(name), **kwargs)
  File "/opt/python/spacy/util.py", line 390, in load_model_from_path
    config = load_config(config_path, overrides=dict_to_dot(config))
  File "/opt/python/spacy/util.py", line 547, in load_config
    raise IOError(Errors.E053.format(path=config_path, name="config.cfg"))
18 Oct 2024 02:08:37,264 [ERROR] (rapid) Init failed error=Runtime exited with error: exit status 1 InvokeID=
18 Oct 2024 02:08:37,268 [ERROR] (rapid) Invoke failed InvokeID=7a3bcb82-b685-49c2-80f5-8cf98d53bd1f error=Runtime exited with error: exit status 1
18 Oct 2024 02:08:37,268 [ERROR] (rapid) Invoke DONE failed: Sandbox.Failure

I would appreciate any help. Thanks

keithrozario commented 1 month ago

Sorry, this is a spaCy-specific issue, and it's been a while since I tried this.

Found a similar issue here: https://github.com/explosion/spaCy/issues/7453

It might fix your issue, and you'll probably have better luck asking your questions there.
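
For anyone hitting the same E053 error: it is raised when spaCy cannot find config.cfg inside the model directory. spaCy v3 pipelines ship a config.cfg, while v2 models such as en_core_web_sm-2.2.5 only ship a meta.json, and the W094 warning above suggests the spaCy layer is a 3.x release. So one thing worth checking is whether the model layer and the spaCy layer are from the same major version. Below is a minimal sketch of that check using only the standard library. `describe_model_dir` is a hypothetical helper name, not part of spaCy or Klayers, and the "v2"/"v3" labels are a heuristic based on which files are present.

```python
import json
from pathlib import Path


def describe_model_dir(model_dir):
    """Summarise what a spaCy model directory contains (hypothetical helper)."""
    path = Path(model_dir)
    meta_path = path / "meta.json"
    # meta.json is present in both v2 and v3 models; it records the model
    # version and the spaCy version range the model claims to support.
    meta = {}
    if meta_path.is_file():
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
    # config.cfg only exists in v3-style pipelines; its absence is what
    # triggers E053 when a v3 spaCy tries to load the directory.
    has_config = (path / "config.cfg").is_file()
    if has_config:
        layout = "v3"
    elif meta:
        layout = "v2"
    else:
        layout = "unknown"
    return {
        "exists": path.is_dir(),
        "has_config_cfg": has_config,
        "model_version": meta.get("version"),
        "spacy_version_req": meta.get("spacy_version"),
        "layout": layout,
    }
```

Calling `describe_model_dir("/opt/en_core_web_sm-2.2.5")` at the top of the handler and logging the result, together with `spacy.__version__`, should show whether the two layers disagree. If they do, the options are a model layer built for spaCy 3.x or a spaCy 2.x layer that matches the 2.2.5 model.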

Tip: For large packages like these (e.g. spaCy), it's probably better to use container images instead of Lambda layers. This project predates Lambda's support for container images, which is why we tried supporting them as layers for a while.
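
If you go the container route, the model is normally pip-installed into the image, so it can be loaded by package name instead of a /opt path. A rough sketch of a handler written that way follows, assuming en_core_web_sm is installed in the image alongside a matching spaCy version. The `loader` parameter, the `get_nlp` helper, and the `text` query parameter are illustrative choices, not anything from Klayers or SAM. Loading lazily inside the handler also means a broken model shows up as a 500 response rather than the "Init failed" crash in the log above.

```python
import json
from typing import Any, Callable, Optional

_nlp: Optional[Any] = None  # cached pipeline, reused across warm invocations


def _default_loader() -> Any:
    # Imported lazily so a missing or mismatched model does not kill init.
    import spacy
    # Assumes the model package is pip-installed in the container image.
    return spacy.load("en_core_web_sm")


def get_nlp(loader: Callable[[], Any] = _default_loader) -> Any:
    """Load the pipeline once and cache it in a module-level variable."""
    global _nlp
    if _nlp is None:
        _nlp = loader()
    return _nlp


def lambda_handler(event, context, loader=_default_loader):
    try:
        nlp = get_nlp(loader)
    except (OSError, ImportError) as exc:
        # spaCy raises OSError (e.g. E050/E053) when a model cannot be loaded.
        return {"statusCode": 500, "body": json.dumps({"error": str(exc)})}
    params = event.get("queryStringParameters") or {}
    doc = nlp(params.get("text", ""))
    return {"statusCode": 200, "body": json.dumps({"tokens": len(doc)})}
```

Lambda only passes `event` and `context`, so `loader` keeps its default in production; it is there so the handler can be exercised locally with a stand-in pipeline.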