graviraja / MLOps-Basics


AWS Lambda Function: Test error #28

Open VirajBagal opened 3 years ago

VirajBagal commented 3 years ago

I am following the Week 8 blog post. When I deploy the container using Lambda and try to test it from the Test section, the execution fails with the log below. Can you please help with this? Does this function have internet access to download that model? (Sorry if the question is naive.)

es/transformers/file_utils.py", line 1518, in get_from_cache
os.makedirs(cache_dir, exist_ok=True)
File "/usr/lib/python3.6/os.py", line 210, in makedirs
makedirs(head, mode, exist_ok)
File "/usr/lib/python3.6/os.py", line 210, in makedirs
makedirs(head, mode, exist_ok)
File "/usr/lib/python3.6/os.py", line 210, in makedirs
makedirs(head, mode, exist_ok)
File "/usr/lib/python3.6/os.py", line 220, in makedirs
mkdir(name, mode)
OSError: [Errno 30] Read-only file system: '/home/sbx_user1051'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/uvicorn", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1137, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1062, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 763, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/uvicorn/main.py", line 425, in main
run(app, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/uvicorn/main.py", line 447, in run
server.run()
File "/usr/local/lib/python3.6/dist-packages/uvicorn/server.py", line 69, in run
return asyncio.get_event_loop().run_until_complete(self.serve(sockets=sockets))
File "/usr/lib/python3.6/asyncio/base_events.py", line 484, in run_until_complete
return future.result()
File "/usr/local/lib/python3.6/dist-packages/uvicorn/server.py", line 76, in serve
config.load()
File "/usr/local/lib/python3.6/dist-packages/uvicorn/config.py", line 448, in load
self.loaded_app = import_from_string(self.app)
File "/usr/local/lib/python3.6/dist-packages/uvicorn/importer.py", line 21, in import_from_string
module = importlib.import_module(module_str)
File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 994, in _gcd_import
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 678, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "./app.py", line 5, in <module>
predictor = ColaONNXPredictor("./models/model.onnx")
File "./inference_onnx.py", line 12, in __init__
self.processor = DataModule()
File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/core/datamodule.py", line 49, in __call__
obj = type.__call__(cls, *args, **kwargs)
File "./data.py", line 20, in __init__
self.tokenizer = AutoTokenizer.from_pretrained(model_name)
File "/usr/local/lib/python3.6/dist-packages/transformers/models/auto/tokenization_auto.py", line 534, in from_pretrained
config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/transformers/models/auto/configuration_auto.py", line 450, in from_pretrained
config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/transformers/configuration_utils.py", line 532, in get_config_dict
raise EnvironmentError(msg)
OSError: Can't load config for 'google/bert_uncased_L-2_H-128_A-2'. Make sure that:
- 'google/bert_uncased_L-2_H-128_A-2' is a correct model identifier listed on 'https://huggingface.co/models'
- or 'google/bert_uncased_L-2_H-128_A-2' is the correct path to a directory containing a config.json file
END RequestId: 95ab620c-bf63-46ab-8c02-27fb4099485b
REPORT RequestId: 95ab620c-bf63-46ab-8c02-27fb4099485b  Duration: 65041.97 ms   Billed Duration: 65042 ms   Memory Size: 1024 MB    Max Memory Used: 446 MB 
RequestId: 95ab620c-bf63-46ab-8c02-27fb4099485b Error: Runtime exited with error: exit status 1
Runtime.ExitError
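
For reference, the first traceback already points at the real failure: Lambda's filesystem is read-only everywhere except /tmp, so when the tokenizer is not baked into the image, transformers tries to create its download cache under the home directory and fails; the "Can't load config" message is a downstream symptom of that. A minimal sketch of the usual workaround, pointing the Hugging Face cache at /tmp before transformers loads anything (the cache path and handler below are illustrative assumptions, not code from the repo):

import os

# Lambda may only write under /tmp; set the cache location before any
# transformers import so the read-only OSError above cannot occur
os.environ["TRANSFORMERS_CACHE"] = "/tmp/hf_cache"

from transformers import AutoTokenizer

def lambda_handler(event, context):
    # model id copied from the traceback; downloading it at request time
    # still needs the function to have outbound internet access
    tokenizer = AutoTokenizer.from_pretrained("google/bert_uncased_L-2_H-128_A-2")
    text = event.get("sentence", "")
    return {"num_tokens": len(tokenizer(text)["input_ids"])}

Note the environment variable must be set before the first transformers import, or the default cache location is used.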
ravirajag commented 3 years ago

Make sure the transformers version is up to date. An old transformers version might not support that model.
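
For example, a quick sanity check inside the container (nothing repo-specific):

import transformers
# the tiny BERT checkpoints like google/bert_uncased_L-2_H-128_A-2 need a
# reasonably recent transformers release to resolve on the Hub
print(transformers.__version__)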

VirajBagal commented 2 years ago

This runs the Dockerfile, right? In the Dockerfile I have used the same transformers base image as you, i.e. FROM huggingface/transformers-pytorch-cpu:latest. The container builds successfully both locally and in GitHub Actions, but it somehow gives the above error when 'Test'ed in AWS Lambda.

VirajBagal commented 2 years ago

When I used the week_8 Docker image in the Lambda function and tested it, it worked. Its Dockerfile is the following:

FROM amazon/aws-lambda-python

ARG AWS_ACCESS_KEY_ID
ARG AWS_SECRET_ACCESS_KEY
ARG MODEL_DIR=./models
RUN mkdir $MODEL_DIR

# point the Hugging Face cache at a directory inside the image, so nothing
# needs to be downloaded or written at runtime
ENV TRANSFORMERS_CACHE=$MODEL_DIR \
    TRANSFORMERS_VERBOSITY=error

ENV AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \
    AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY

RUN yum install git -y && yum -y install gcc-c++
COPY requirements_inference.txt requirements_inference.txt
RUN pip install -r requirements_inference.txt --no-cache-dir
COPY ./ ./
ENV PYTHONPATH "${PYTHONPATH}:./"
ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8
RUN pip install "dvc[s3]"
# configuring remote server in dvc
RUN dvc init --no-scm
RUN dvc remote add -d model-store s3://models-dvc/trained_models/

# pulling the trained model
RUN dvc pull dvcfiles/trained_model.dvc

# run the handler once at build time so model/tokenizer downloads land in the cache
RUN python lambda_handler.py
RUN chmod -R 0755 $MODEL_DIR
CMD ["lambda_handler.lambda_handler"]

I was getting the error with the week_7 Docker image in the Lambda function Test. That Dockerfile was the following:

FROM huggingface/transformers-pytorch-cpu:latest

COPY ./ /app
WORKDIR /app

ARG AWS_ACCESS_KEY_ID
ARG AWS_SECRET_ACCESS_KEY

# these envs are experimental
ENV AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \
    AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY

# install requirements
RUN pip install "dvc[s3]"
RUN pip install -r requirements_inference.txt

# initialise dvc
RUN dvc init --no-scm
# configuring remote server in dvc
RUN dvc remote add -d model-store s3://models-dvc-viraj/trained_models/

RUN cat .dvc/config
# pulling the trained model
RUN dvc pull dvcfiles/trained_model.dvc

ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8

# running the application
EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
BouajilaHamza commented 2 months ago

I may not be understanding the Docker issue, but from the error it looks like the model does not exist on Hugging Face: either it was removed, or the path or model name is incorrect. To fix this, try changing the model path; search for a model on Hugging Face and use that.