pyca / cryptography

cryptography is a package designed to expose cryptographic primitives and recipes to Python developers.
https://cryptography.io

ImportError in Spark #6968

Closed: yang-shuaijun closed this issue 2 years ago

yang-shuaijun commented 2 years ago

Python: 3.7.10, pip: 20.2.2, cryptography: 36.0.2, Spark: 3.1.2-amzn-1

If I load cryptography via --py-files, this error is raised:

[hadoop@ip-172-31-21-108 ~]$ pyspark --py-files dependencies.zip
Python 3.7.10 (default, Jun  3 2021, 00:02:01) 
[GCC 7.3.1 20180712 (Red Hat 7.3.1-13)] on linux
Type "help", "copyright", "credits" or "license" for more information.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/spark/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/share/aws/emr/emrfs/lib/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/share/aws/redshift/jdbc/redshift-jdbc42-1.2.37.1061.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/03/16 10:45:46 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.1.2-amzn-1
      /_/

Using Python version 3.7.10 (default, Jun  3 2021 00:02:01)
Spark context Web UI available at http://ip-172-31-21-108.cn-north-1.compute.internal:4040
Spark context available as 'sc' (master = yarn, app id = application_1647323952412_0011).
SparkSession available as 'spark'.
>>> from cryptography.hazmat.primitives import padding
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/mnt/tmp/spark-66e602ba-ad79-4dae-af55-76c7cc225c89/userFiles-80a59a58-1cd4-4962-8115-504c1bab5097/dependencies.zip/cryptography/hazmat/primitives/padding.py", line 11, in <module>
ImportError: cannot import name 'check_ansix923_padding' from 'cryptography.hazmat.bindings._rust' (unknown location)

The dependencies.zip was bundled from /home/hadoop/.local/lib/python3.7/site-packages/.

But this method was able to load other submodules:

[hadoop@ip-172-31-21-108 ~]$ pyspark --py-files dependencies.zip 
Python 3.7.10 (default, Jun  3 2021, 00:02:01) 
[GCC 7.3.1 20180712 (Red Hat 7.3.1-13)] on linux
Type "help", "copyright", "credits" or "license" for more information.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/spark/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/share/aws/emr/emrfs/lib/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/share/aws/redshift/jdbc/redshift-jdbc42-1.2.37.1061.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/03/16 11:39:11 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.1.2-amzn-1
      /_/

Using Python version 3.7.10 (default, Jun  3 2021 00:02:01)
Spark context Web UI available at http://ip-172-31-21-108.cn-north-1.compute.internal:4040
Spark context available as 'sc' (master = yarn, app id = application_1647323952412_0014).
SparkSession available as 'spark'.
>>> from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
>>> 

If I install cryptography directly on the EMR platform, it works well:

[hadoop@ip-172-31-21-108 site-packages]$ pyspark 
Python 3.7.10 (default, Jun  3 2021, 00:02:01) 
[GCC 7.3.1 20180712 (Red Hat 7.3.1-13)] on linux
Type "help", "copyright", "credits" or "license" for more information.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/spark/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/share/aws/emr/emrfs/lib/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/share/aws/redshift/jdbc/redshift-jdbc42-1.2.37.1061.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/03/16 11:25:17 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.1.2-amzn-1
      /_/

Using Python version 3.7.10 (default, Jun  3 2021 00:02:01)
Spark context Web UI available at http://ip-172-31-21-108.cn-north-1.compute.internal:4040
Spark context available as 'sc' (master = yarn, app id = application_1647323952412_0013).
SparkSession available as 'spark'.
>>> from cryptography.hazmat.primitives import padding
>>> 
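As the diagnosis further down explains, the difference is that a direct install puts the compiled extension modules (.so files) on the real filesystem, where the normal import machinery can load them. On a multi-node cluster that install has to happen on every node; on EMR this is typically done with a bootstrap action. A sketch, with a hypothetical pin to the version above:

#!/bin/bash
# hypothetical EMR bootstrap action: install cryptography on every node
sudo python3 -m pip install cryptography==36.0.2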
alex commented 2 years ago

How is your dependencies.zip file built? Does it contain the native code extension modules?

yang-shuaijun commented 2 years ago

I directly bundled the files from /home/hadoop/.local/lib/python3.7/site-packages.
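Presumably the bundle was created along these lines (a hypothetical reconstruction; the exact command wasn't given):

[hadoop@ip-172-31-21-108 ~]$ cd /home/hadoop/.local/lib/python3.7/site-packages
[hadoop@ip-172-31-21-108 site-packages]$ zip -r ~/dependencies.zip .

A zip built this way does contain the compiled .so files, but containing them is not the same as Python being able to import them from inside the archive.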

alex commented 2 years ago

Looking closely at the exception message, I see that the path contains: /dependencies.zip/cryptography/hazmat/primitives/padding.py. This tells me that Spark loads Python files directly out of the .zip.

Unfortunately, extension modules cannot be loaded out of a .zip file, so this can't work. You'll need to find out whether Spark has some other way of specifying files that supports extension modules.
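For reference, Spark's Python packaging documentation describes shipping a packed virtual environment via --archives instead of --py-files; this does work with extension modules, because the archive is unpacked onto each node's local disk. A sketch, assuming venv-pack is available:

[hadoop@ip-172-31-21-108 ~]$ python3 -m venv pyspark_venv
[hadoop@ip-172-31-21-108 ~]$ source pyspark_venv/bin/activate
(pyspark_venv) $ pip install cryptography venv-pack
(pyspark_venv) $ venv-pack -o pyspark_venv.tar.gz
(pyspark_venv) $ export PYSPARK_DRIVER_PYTHON=python
(pyspark_venv) $ export PYSPARK_PYTHON=./environment/bin/python
(pyspark_venv) $ pyspark --archives pyspark_venv.tar.gz#environment

The #environment suffix names the directory the archive is unpacked into on each node, and PYSPARK_PYTHON points the executors at the Python interpreter inside it.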

yang-shuaijun commented 2 years ago

> Looking closely at the exception message, I see that the path contains: /dependencies.zip/cryptography/hazmat/primitives/padding.py. This tells me that Spark loads Python files directly out of the .zip.
>
> Unfortunately, extension modules cannot be loaded out of a .zip file, so this can't work. You'll need to find out whether Spark has some other way of specifying files that supports extension modules.

Spark unzips the zip file and then loads the Python modules, like this:

[hadoop@ip-172-31-21-108 ~]$ pyspark --py-files dependencies.zip 
(same startup output as above)
>>> from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
>>> 
alex commented 2 years ago

That import is pure Python; pure Python files can be imported directly from .zip files, but extension modules cannot.
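The difference is visible in the installed package itself: the bindings module is a compiled shared object, which Python's zipimport cannot load, while modules like ciphers are ordinary .py files. A quick check (illustrative; the exact filename varies by platform and version):

>>> from cryptography.hazmat.bindings import _rust
>>> _rust.__file__
'/home/hadoop/.local/lib/python3.7/site-packages/cryptography/hazmat/bindings/_rust.abi3.so'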

yang-shuaijun commented 2 years ago

> Cipher, algorithms, modes

Are Cipher, algorithms, and modes pure Python code? They don't seem to call other extension modules.

alex commented 2 years ago

Those imports are pure Python; actually using them to do encryption/decryption will call into extension modules.
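So in the zip-only setup, the failure would just move from import time to use time. A minimal sketch of where the native code gets exercised (the exact failure point may vary by version):

>>> import os
>>> from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
>>> key, iv = os.urandom(32), os.urandom(16)
>>> enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()  # constructing the cipher context calls into the native bindings
>>> ct = enc.update(b"sixteen byte msg") + enc.finalize()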


yang-shuaijun commented 2 years ago

Thank you for your detailed analysis. cryptography is excellent; I use it to mask sensitive data.