pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
https://pymupdf.readthedocs.io
GNU Affero General Public License v3.0

Failed to import fitz on AWS lambda #430

Closed gavinLow8128 closed 4 years ago

gavinLow8128 commented 4 years ago

I am trying to build a serverless PDF-to-image function on AWS Lambda. The import statement is: import fitz

However, I get the following error when the Lambda function is triggered.

[ERROR] Runtime.ImportModuleError: Unable to import module 'lambda_function': cannot import name '_fitz' from partially initialized module 'fitz' (most likely due to a circular import) (/var/task/fitz/__init__.py)

Thank you very much! I am using Python 3.8 and PyMuPDF 1.16.10.

JorjMcKie commented 4 years ago

Hm, I have never seen this error before, and I also do not understand what you are actually trying to do. But the error looks like you are importing fitz from within the installation folder of PyMuPDF (where __init__.py lives). This can never work.

gavinLow8128 commented 4 years ago

Thank you for the quick response. I installed PyMuPDF by following the instructions in this guide: AWS Lambda Deployment Package in Python

It seems that PyMuPDF and my program (app.py, the Python script that imports fitz) end up in the same folder. Is that the main cause of the error? Please let me know if there is a solution.

Screenshot 2020-01-20 at 6 18 45 PM

Thank you very much

JorjMcKie commented 4 years ago

Hm, actually not. The above structure works if executed locally on your computer - just tried it out (of course, the dist-info folder is not required). The imported fitz is confirmed to be taken from the fitz subfolder next to app.py.

So the problem must be how AWS Lambda supports this type of thing. I am not a Lambda user, so I do not know anything about it.

Code – The code and dependencies of your function. For scripting languages, you can edit your function code in the embedded editor. To add libraries, or for languages that the editor doesn't support, upload a deployment package. If your deployment package is larger than 50 MB, choose Upload a file from Amazon S3.

This quotation from the AWS Lambda website seems to suggest that you must upload PyMuPDF as a deployment package. Did you do that?

JorjMcKie commented 4 years ago

Also have a look at this:

Note: For libraries that use extension modules written in C or C++, build your deployment package in an Amazon Linux environment. You can use the SAM CLI build command, which uses Docker, or build your deployment package on Amazon EC2 or AWS CodeBuild.

PyMuPDF falls under this category ...
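A quick way to see whether a packaged fitz folder fits the interpreter that will import it is to look for a compiled _fitz file whose name ends in one of the suffixes that interpreter accepts. The sketch below is only an illustration: the helper name loadable_extension is made up, and it assumes the binding lives in a file named _fitz plus a platform suffix, as the tracebacks in this thread suggest. Run it inside the target environment (for example from the Lambda handler), not on the build machine.

```python
import importlib.machinery
import pathlib


def loadable_extension(package_dir, module_name="_fitz"):
    """Return the compiled file this interpreter would load, or None.

    The import system only considers files named <module_name><suffix>
    for the suffixes listed in EXTENSION_SUFFIXES, so a binary built for
    another OS or Python version is simply not found.
    """
    for suffix in importlib.machinery.EXTENSION_SUFFIXES:
        candidate = pathlib.Path(package_dir) / (module_name + suffix)
        if candidate.is_file():
            return candidate
    return None


# Hypothetical usage inside the deployed function:
#   print(importlib.machinery.EXTENSION_SUFFIXES)
#   print(loadable_extension("/var/task/fitz"))
```

If this returns None for the folder that holds fitz/__init__.py, the "cannot import name '_fitz'" message is the expected outcome, because there is nothing for "from . import _fitz" to load.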

gavinLow8128 commented 4 years ago

The problem was solved by deploying the package through AWS CodeBuild. Thank you for your help!

ale-de-vries commented 4 years ago

@gavinLow8128 I'm running into the same issue. Could you share your steps for deploying the package through AWS CodeBuild? Thanks!

gavinLow8128 commented 4 years ago

@ale-de-vries

Step 1: Go to CodeCommit and "create repository", then upload your project to CodeCommit.

Step 2: Create a file called "buildspec.yml". Here is my buildspec.yml for your reference.

Screenshot 2020-04-08 at 2 17 43 AM

Step 3: Go to CodeBuild and "Create Build Project". You may watch this youtube video as a reference for how to complete the project configuration. https://www.youtube.com/watch?v=6YQFcd_z4gk

Step 4: After completing the configuration, start the build.

Step 5: If the project builds successfully, go to your S3 bucket. CodeBuild uploads the artifact file there. Find it and copy the "Object URL".

Step 6: Go to your Lambda function. Select "Upload a file from Amazon S3" for "Code entry type" and paste the "Object URL" into "Amazon S3 link URL". Then click "Save".

Screenshot 2020-04-08 at 2 37 38 AM

I hope it helps you. Please let me know if you have any other questions.

knightfall commented 4 years ago

Hi @gavinLow8128 ,

I tried your solution but I am still getting the same error as you. Any idea what I am doing wrong? Would you be able to share your folder structure?

jmac105 commented 4 years ago

For anyone else that ends up here with the same problem:

You can use lambda layers to provide the pymupdf dependency, instead of building it into the deployment package. There's already a maintained layer for it, see here: https://github.com/keithrozario/Klayers

That way you don't need to bother with building on amazon linux, just reference the arn of the layer you need in your lambda function creation (region specific) and import as normal.

anilomanwar commented 3 years ago

I am facing the same issue for IBM Cloud Function (similar to AWS Lambda)

I used the same build step, "pip install PyMuPDF -t .", during deployment and can see the folder structure mentioned in https://github.com/pymupdf/PyMuPDF/issues/430#issuecomment-576208987

My code contains:

import fitz

and I get the error below:

"2020-11-26T07:23:33.653912Z stderr: Traceback (most recent call last):",
"2020-11-26T07:23:33.653966Z stderr: File "exec.py", line 42, in <module>",
"2020-11-26T07:23:33.653971Z stderr: from main import main as main",
"2020-11-26T07:23:33.653976Z stderr: File "/action/1/bin/main__.py", line 30, in <module>",
"2020-11-26T07:23:33.653980Z stderr: import fitz",
"2020-11-26T07:23:33.653984Z stderr: File "/action/1/bin/fitz/__init__.py", line 3, in <module>",
"2020-11-26T07:23:33.653988Z stderr: from fitz.fitz import *",
"2020-11-26T07:23:33.653992Z stderr: File "/action/1/bin/fitz/fitz.py", line 17, in <module>",
"2020-11-26T07:23:33.653996Z stderr: from . import _fitz",
"2020-11-26T07:23:33.654001Z stderr: ImportError: cannot import name '_fitz' from 'fitz' (/action/1/bin/fitz/__init__.py)",
"2020-11-26T07:23:33.785866Z stderr: Command exited abruptly during initialization.",
"2020-11-26T07:23:33.786Z stderr: The action did not initialize or run as expected. Log data might be missing."

I tried several install commands, all with the same result:

pip install PyMuPDF
pip install PyMuPDF==1.16.10 -t .
pip install PyMuPDF==1.18.10 -t .

Other packages such as pypdf and pdfminer, installed the same way, work fine; only this one fails.

The build step reports no problems; the error only occurs at the import statement.

MisterMahuron commented 3 years ago

@jmac105 Any idea how this individual got it to work? I have PyMuPDF as a package in my layer, but I am still getting the exact same error as the person who opened this ticket. All other packages in my layer import correctly. Any help would be greatly appreciated. I also appreciate the ARN repo, but I am hoping to avoid using it if at all possible.

jmac105 commented 3 years ago

I'd recommend contacting the maintainer of the Klayers repo and asking them, but it looks like they use the Serverless Framework to build the layers. I do believe that you need to build your layer on the same OS it will run on in Lambda (Amazon Linux or Amazon Linux 2, depending on the Python version).

thematheusgomes commented 3 years ago

Hey guys,

I have the same issue when deploying with the Serverless Framework.

I tried to create a layer, but the issue is still the same.

This is the error I'm getting in the Lambda console:

image

amiantos commented 3 years ago

Just in case it helps anyone: I was using fitz in Lambda just fine for the past several months (I automate the build this way) under the Python 3.6 runtime. When I switched to the Python 3.8 runtime, I started getting this import error. I switched back to 3.6 and everything is working fine again.

carlosgaonad commented 3 years ago

Hi everyone

I had the same issue deploying my lambda.

I tried many ways to solve this problem, but the only solution was to use a virtual machine with Python 3.7 and install PyMuPDF there.

The next step was to download the library from the virtual machine and then create the zip file using that library.

image

This file, _fitz.cpython-37m-x86_64-linux-gnu.so, is probably too heavy, but it is necessary.

This method finally worked for me!!
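The file name above already says which runtime the binary was built for: CPython 3.7 on x86_64 Linux. As a rough sketch (the helper names are invented here, and the pattern only covers names of the form module.cpython-XY-platform.so), the name can be compared against the Lambda runtime you plan to select:

```python
import re

# Matches names such as "_fitz.cpython-37m-x86_64-linux-gnu.so".
_EXT_NAME = re.compile(
    r"^(?P<module>[^.]+)\.cpython-(?P<version>\d+)(?P<flags>[a-z]*)-(?P<platform>.+)\.so$"
)


def parse_extension_name(filename):
    """Split a CPython extension file name into its parts, or return None."""
    match = _EXT_NAME.match(filename)
    return match.groupdict() if match else None


def built_for(filename, major, minor):
    """True if the file name is tagged for CPython <major>.<minor>."""
    info = parse_extension_name(filename)
    return info is not None and info["version"] == f"{major}{minor}"
```

For the file above, built_for(name, 3, 7) holds while built_for(name, 3, 8) does not, which would fit the reports in this thread of the import breaking after a runtime switch.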

Konstantina-Paraskevopoulou commented 3 years ago

I had a similar problem when I was trying to import some tensorflow probability modules like below:

import tensorflow_probability as tfp
tfp = tfp.substrates.numpy
tfd = tfp.distributions

At least for me, I realized that the problem was not related to lambda but it was a Python circular import error. Have a look at this: https://stackabuse.com/python-circular-imports

Changing the position of the imports solved my issue. Basically, I switched the tfp and tfd lines:

import tensorflow_probability as tfp
tfd = tfp.distributions
tfp = tfp.substrates.numpy
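To make the difference between the two orderings visible without installing anything, here is a small stand-in. The fake_tfp object is purely hypothetical and only mimics the attribute names used in this comment; the point is that rebinding tfp before reading tfp.distributions changes which object tfd ends up pointing at.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the package; only the attribute names are real.
fake_tfp = SimpleNamespace(
    distributions="default distributions",
    substrates=SimpleNamespace(
        numpy=SimpleNamespace(distributions="NumPy-backed distributions")
    ),
)

# First ordering: tfp is rebound first, so tfd comes from the NumPy substrate.
tfp = fake_tfp
tfp = tfp.substrates.numpy
tfd_first = tfp.distributions

# Second ordering: tfd is read from the top-level package before tfp is rebound.
tfp = fake_tfp
tfd_second = tfp.distributions
tfp = tfp.substrates.numpy

print(tfd_first)
print(tfd_second)
```

So the two snippets are not equivalent: they import different submodules, which may be why only one of them ran into the import error.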

VanntheRed commented 3 years ago

I'm stuck on the same problem with a Python SFTP package that requires paramiko. I compiled a package and tested it on a Windows EC2 instance without issue. When I then tried to turn it into a Lambda to see if I could accomplish the task serverlessly, I got the same error about non-native packages. I'm trying to recreate the CodeCommit/CodeBuild solution, but I'm getting an error with the buildspec.yaml:

Phase context status code: YAML_FILE_ERROR Message: mapping values are not allowed in this context at line 2

My buildspec.yaml is:

version: 0.1

phases:
  install:
    runtime-versions:
      python: 3.8
  pre_build:
    commands:
  build:
    commands:

I'm not certain if the problem is the yaml (it passed a yaml parser) or the contents of my CodeCommit. All I have there is my python script and the yaml document. Does a download of the package need to be there?

TIA, VtR

Ricardomol commented 3 years ago

I'm getting this exact error as well:

Runtime.ImportModuleError: Unable to import module 'lambda': cannot import name '_fitz' from partially initialized module 'fitz' (most likely due to a circular import) (/var/task/fitz/__init__.py) 

I'm already using KLayers to generate the Layer.

This is what my zip file contains: Screenshot 2021-08-16 at 19 09 39

Anyone else who made it work with KLayers, give us more details please.

RodPienaar commented 2 years ago

If you use pip to install a Python package that contains compiled code locally, the package (wheel) that is downloaded may not be compatible with AWS Lambda (in my case it was built for macOS rather than Linux). If you deploy this locally installed package to your Lambda, it will cause the "fitz not found" error when the Lambda runs, even if your code works locally. This is the case for any binary package, not just fitz.

Lambdas need a Linux-compatible binary. As noted in some of the answers above, the solution is to package the binary and load it as a Lambda layer. This is easy to do in three steps that worked for me:

  1. Download the relevant binary (some AWS advice here). Basically, you need to unzip the relevant wheel file.
  2. Repackage the binary in a zip file with the right structure. See this Stack Overflow answer.
  3. Create a layer and upload the zip file using this AWS page.

MiniMarvin commented 2 years ago

jmac105's answer above (use the maintained Klayers Lambda layer, https://github.com/keithrozario/Klayers, instead of building PyMuPDF into the deployment package) seems to be the best one so far, and it currently works. For anyone still receiving the error

[ERROR] Runtime.ImportModuleError: Unable to import module 'lambda_function': cannot import name '_fitz' from partially initialized module 'fitz' (most likely due to a circular import) (/var/task/fitz/__init__.py)

while using this solution: the fix is to use the python3.8 Lambda environment. My function definition is as follows:

parse_pdf:
    runtime: python3.8
    handler: pdfparse/pdfparse/handler.pdf_parse
    layers:
      - arn:aws:lambda:${self:provider.region}:770693421928:layer:Klayers-p38-PyMUPDF:1

It works just fine after defining it this way

anai-s commented 2 years ago

RodPienaar's three steps above (download the Linux wheel, repackage it in a zip file with the right structure, upload it as a layer) work perfectly for me on a Mac. For those who need an example of how to install the package, you can try this:

pip install pyMUPDF --upgrade --only-binary=:all: --platform manylinux_2_17_x86_64 --python-version 38
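If you script the build, the same kind of cross-platform install can be assembled as an argument list and handed to subprocess. This is only a sketch: the function name and the default values are my own choices, and the flags are the ones that appear in this thread. As far as I know, pip only accepts --platform together with --target, so the sketch always passes one.

```python
import sys


def cross_install_command(package, target,
                          python_version="3.8",
                          platform="manylinux2014_x86_64"):
    """Build a pip command that fetches a Linux wheel into `target`.

    The defaults are illustrative; match them to your Lambda runtime
    and architecture.
    """
    return [
        sys.executable, "-m", "pip", "install",
        "--platform", platform,
        "--python-version", python_version,
        "--implementation", "cp",
        "--only-binary=:all:",   # never fall back to building from source
        "--upgrade",
        "--target", str(target),
        package,
    ]


# Hypothetical usage:
#   import subprocess
#   subprocess.run(cross_install_command("PyMuPDF", "build/python"), check=True)
```

Because --only-binary=:all: is set, pip should fail loudly when no matching wheel exists instead of silently compiling for the local machine.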

pschlank commented 1 year ago

Hey all, @anai-s answer is right on point. I ran this command in the terminal:

pip install \
  --platform manylinux2014_x86_64 \
  --target=/Users/schlank/Documents/Code/pythonlayers/upload/python \
  --implementation cp \
  --python-version 3.9 \
  --only-binary=:all: --upgrade \
  --ignore-installed \
  PyMuPDF

Some additional detail. Make sure you:

  1. Take note of your target directory
  2. It's key you put it inside a folder named "python"
  3. Compress the folder named "python" into a .zip
  4. Go to Lambdas > Layers in your AWS console
  5. Create a new layer
  6. Select the x86_64 architecture and python 3.9 compatibility
  7. Add your new layer ARN to your serverless YAML (or attach the layer to the Lambda itself in the console)

Then you're good to go.
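The zipping step in the list above can also be scripted. Below is a minimal sketch using only the standard library; zip_layer is a made-up name, and it assumes source_dir is the directory pip installed into. Every file is stored under a top-level "python/" folder, which is the layout the steps above ask for.

```python
import pathlib
import zipfile


def zip_layer(source_dir, zip_path):
    """Zip the contents of source_dir under a top-level 'python/' folder."""
    source = pathlib.Path(source_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                # Archive names always use forward slashes.
                arcname = "python/" + path.relative_to(source).as_posix()
                archive.write(path, arcname)
    return zip_path


# Hypothetical usage, with the pip target directory from the command above:
#   zip_layer("upload/python", "layer.zip")
```

Write the zip file outside source_dir, otherwise the archive may try to include itself.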

ecumene commented 1 year ago

For those trying to get fitz working on Lambda with the python3.9 runtime on an M1 Mac: try installing the packages with Docker. This worked for me:

I put them in a folder named requirements/python, then zip that up for the layer.

mkdir -p requirements/python;
docker run \
  -v "$(pwd)":/var/task "public.ecr.aws/sam/build-python3.9" \
  /bin/sh -c "yum install -y mysql-devel && \
  pip install -r requirements.txt  --only-binary=:all: --platform manylinux_2_17_x86_64 -t requirements/python; \
  exit";

RaviWittyBrains commented 10 months ago

I'm encountering an issue while attempting to add the Fitz library (PyMuPDF) to a Lambda layer. The error message I'm getting is:

{
  "errorMessage": "Unable to import module 'lambda_function': cannot import name '_fitz' from partially initialized module 'fitz' (most likely due to a circular import) (/opt/python/lib/python3.11/site-packages/fitz/__init__.py)",
  "errorType": "Runtime.ImportModuleError",
}

Function Logs
[ERROR] Runtime.ImportModuleError: Unable to import module 'lambda_function': cannot import name '_fitz' from partially initialized module 'fitz' (most likely due to a circular import) (/opt/python/lib/python3.11/site-packages/fitz/__init__.py)

I'm seeking guidance on successfully utilizing the Fitz library within a Lambda function. Below is the Lambda code snippet:

import fitz

def lambda_handler(event, context):
    try:
        print("Hello World")
    except Exception as e:
        print(f"Error in fileToTextract: {str(e)}")

Is there anyone who has successfully integrated this library into their Lambda and can offer advice on resolving this issue?
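One way to get more information than the bare Runtime.ImportModuleError is to move the import inside the handler, so the function still initializes and can report what it is running on. The sketch below is a debugging aid only, not a fix; the keys of the returned dictionary are arbitrary names chosen for this example.

```python
import importlib
import platform


def lambda_handler(event, context):
    """Report runtime details and whether fitz can be imported."""
    report = {
        "python_version": platform.python_version(),
        "machine": platform.machine(),   # e.g. x86_64 or aarch64
        "system": platform.system(),
    }
    try:
        fitz = importlib.import_module("fitz")
        report["import_ok"] = True
        report["fitz_file"] = getattr(fitz, "__file__", None)
    except Exception as exc:  # keep the function alive whatever the import raises
        report["import_ok"] = False
        report["import_error"] = f"{type(exc).__name__}: {exc}"
    print(report)
    return report
```

Comparing python_version and machine with the Python version and architecture the layer was built for usually narrows the problem down quickly.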

JorjMcKie commented 10 months ago

This may help you.

egill commented 9 months ago

Not sure it will help everyone, but this works for me with the latest PyMuPDF (1.3.23) and Python 3.12:

import fitz_old as fitz

I tracked my problems to the recent rebase in PyMuPDF, and using the pre-rebased version worked like a charm for me.
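If the same code has to run against installs with and without fitz_old, the choice can be made at runtime. A small sketch, assuming only that one of the two module names from this comment may be present (first_available is a made-up helper):

```python
import importlib
import importlib.util


def first_available(candidates=("fitz_old", "fitz")):
    """Return the first top-level module name that is installed, or None.

    find_spec only checks that the module can be located; it does not
    prove that its compiled extension will load.
    """
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


# Hypothetical usage:
#   name = first_available()
#   fitz = importlib.import_module(name) if name else None
```

This only helps with PyMuPDF releases that actually ship fitz_old; on other versions the helper simply falls through to fitz.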

It's not the best solution but works for now.

Elliotmrgn commented 7 months ago

Worked for me when I added it as a layer. As mentioned before, the key is to install a compatible version and use the correct path in your .zip file.

pip install \
--platform manylinux2014_x86_64 \
--target=./python/lib/python3.12/site-packages \
--implementation cp \
--python-version 3.12 \
--only-binary=:all: --upgrade \
pymupdf

Then zip the code to upload as a layer. The file structure will look like:

my_layer.zip
└── python/
    └── lib/
        └── python3.12/
            └── site-packages/
                └── fitz/
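Before uploading, the layout can be checked from the zip file itself. The helper below is a sketch with an invented name; it accepts the two layouts mentioned in this thread, python/ and python/lib/pythonX.Y/site-packages/, and returns the folder that contains the package, or None if the package sits somewhere else.

```python
import zipfile


def package_location(zip_file, package="fitz"):
    """Return the archive folder holding `package` if the layout looks right."""
    with zipfile.ZipFile(zip_file) as archive:
        names = archive.namelist()
    for name in names:
        parts = name.split("/")
        if package not in parts:
            continue
        prefix = parts[: parts.index(package)]
        flat = prefix == ["python"]
        versioned = (
            len(prefix) == 4
            and prefix[:2] == ["python", "lib"]
            and prefix[2].startswith("python")
            and prefix[3] == "site-packages"
        )
        if flat or versioned:
            return "/".join(prefix) + "/"
    return None


# Hypothetical usage:
#   print(package_location("my_layer.zip"))
```

A None result for a zip that does contain fitz usually means the package was zipped at the wrong level, for example with fitz/ at the top of the archive.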

Jacer7 commented 7 months ago

There is no need for a layer and no need for CodeBuild! This problem exists because the fitz library is written in C/C++ with a Python layer on top, so its binaries are built for a specific architecture and a specific OS. We therefore need to build our deployment_package.zip for exactly that target. Here is an article about it; I am sure it will solve all the pain I've seen above 😄 https://medium.com/@jayshwor.khadka/lambda-deployment-package-with-dependencies-and-local-built-distribution-wheels-with-different-affe82b982fa

jacksonkasi1 commented 2 months ago

Thanks, it works for me :)

Jacer7 commented 2 months ago

@jacksonkasi1 Glad the solution proposed worked for you :D !! 👍