huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Loading transformer on AWS Lambda throws OMP errno 38 #20345

Closed djdevdev closed 1 year ago

djdevdev commented 1 year ago

System Info

Apologies if this is the wrong place to post, but we're looking for pointers on tracking down what appears to be a transformers-related error.

We have trained a Spacy 3.3.1 transformer textcat model, which we're deploying as an AWS Python 3.9 Docker image to AWS Lambda. The model loads and infers correctly on the Linux development host (both from a test Python script and via AWS SAM local), but fails in the Lambda runtime with OpenMP runtime error no. 38 (see Lambda error output below).

A web search suggests this error occurs because Lambda doesn't fully support Python multiprocessing: specifically, it doesn't mount /dev/shm, which leads to the error (see links below). The Spacy team have confirmed that they do not directly invoke multiprocessing, but that transformers does (see https://github.com/explosion/spaCy/discussions/11836#discussioncomment-4193368).
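The missing shared-memory support described above can be probed directly. A minimal sketch (the helper name is ours, not from the original report): `multiprocessing.Lock()` allocates a POSIX shared-memory semaphore, which is exactly what fails on Lambda.

```python
import multiprocessing

def sem_lock_available():
    # multiprocessing.Lock() allocates a POSIX shared-memory semaphore
    # backed by /dev/shm. On AWS Lambda, where /dev/shm is not mounted,
    # this raises OSError errno 38 (ENOSYS, "Function not implemented").
    try:
        multiprocessing.Lock()
        return True
    except OSError:
        return False
```

Running this inside the Lambda runtime versus a dev host should confirm whether shared-memory semaphores are the culprit.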

Further testing revealed that loading a blank Spacy model inside the Lambda runtime works perfectly, but loading the transformer on Python 3.7 gives the error, as does the base transformer model spacy.load("en_core_web_trf"). We conclude that transformers is using multiprocessing incompatible with AWS Lambda.

A solution could be to disable transformer multiprocessing when loading the Spacy model. Any suggestions on how to disable OpenMP multiprocessing through a runtime setting? As a last resort, we may need to override multiprocessing.Pool/Queue with multiprocessing.Process/Pipe, which reportedly do work on Lambda (suggested in the links below).
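On the runtime-setting question, one commonly suggested mitigation (an assumption on our part, not confirmed for this particular stack) is to pin OpenMP and tokenizer threading to a single thread via environment variables before any heavy imports:

```python
import os

# Must run before importing spacy / transformers / torch, since the native
# libraries read these values at load time. OMP_NUM_THREADS is a standard
# OpenMP variable; TOKENIZERS_PARALLELISM is read by huggingface/tokenizers.
# Whether single-threaded OpenMP avoids the SHM2 allocation on Lambda is an
# assumption that needs verifying.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```

In a Lambda deployment these can also be set as function-level environment variables instead of in code, which guarantees they are in place before the interpreter starts.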

Lambda error output

OMP: Error #179: Function Can't open SHM2 failed:
OMP: System error #38: Function not implemented
OMP: Error #179: Function Can't open SHM2 failed:
OMP: System error #38: Function not implemented
START RequestId: XYZ Version: $LATEST
RequestId: XYZ Error: Runtime exited with error: signal: aborted
Runtime.ExitError
END RequestId: XYZ
REPORT RequestId: XYZ   Duration: 547.37 ms Billed Duration: 548 ms Memory Size: 3008 MB    Max Memory Used: 142 MB

Relevant links

https://aws.amazon.com/blogs/compute/parallel-processing-in-python-with-aws-lambda/
https://spacy.io/usage/processing-pipelines#multiprocessing
https://forum.opennmt.net/t/unable-to-create-ctranslate2-translator-in-aws-lambda/4922
https://stackoverflow.com/questions/34005930/multiprocessing-semlock-is-not-implemented-when-running-on-aws-lambda
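The Pool/Queue-to-Process/Pipe substitution suggested in the links above can be sketched as follows (a minimal illustration, not our actual workload): Process/Pipe avoids the shared-memory semaphores that Pool and Queue require, so it is reported to work inside Lambda.

```python
from multiprocessing import Process, Pipe

def _worker(conn):
    # The child sends its result back over the pipe instead of a Queue,
    # which would need the shared-memory semaphores Lambda lacks.
    conn.send("done")
    conn.close()

def run_task():
    parent_conn, child_conn = Pipe()
    p = Process(target=_worker, args=(child_conn,))
    p.start()
    result = parent_conn.recv()
    p.join()
    return result
```

Whether transformers' internal use of multiprocessing can actually be redirected this way is the open question of this issue.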

Who can help?

@LysandreJik

Information

Tasks

Reproduction

  1. Create conda environment.yml file with Spacy 3.3.1 (which installs transformers=4.18.0 as a dependency)

    channels:
    - defaults
    dependencies:
    - python=3.9.15
    - spacy-transformers=1.1.5
    - spacy-model-en_core_web_sm=3.3.0
    - spacy-model-en_core_web_trf=3.3.0
  2. Create Dockerfile (relevant extract shown below; the Miniconda install and PATH steps are filled in here to make the extract self-contained)

    FROM public.ecr.aws/lambda/python:3.9
    RUN wget http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
    RUN bash Miniconda3-latest-Linux-x86_64.sh -b -p /opt/conda
    ENV PATH=/opt/conda/bin:$PATH
    COPY environment.yml .
    RUN conda env update -n base -f environment.yml
  3. Create Lambda Python handler

    import spacy

    def lambda_handler(event, context):
        # Works in AWS Lambda Python 3.9 runtime
        nlp = spacy.load("en_core_web_sm")

        # Throws OMP errno 38 in AWS Lambda Python 3.9 runtime
        nlp = spacy.load("en_core_web_trf")

        return {
            "statusCode": 200
        }


Expected behavior

Lambda execution completes successfully and returns code 200.
sgugger commented 1 year ago

There is little we can do without knowing which specific code in Transformers you are running. Loading a model with Transformers in general does not use Python multiprocessing for instance, so it's a bit hard for us to know what you want us to fix without a clear reproducer (using Transformers only and not a third-party library).

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.