intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc
Apache License 2.0

[ppml] cannot resolve '`name`' given input columns KMS encrypted data #5699

Open Le-Zheng opened 2 years ago

Le-Zheng commented 2 years ago

Error when running the SimpleQuery example with bigdl-ppml-spark_3.1.2-2.1.0-20220907.120744-222-jar-with-dependencies.jar:

Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve '`name`' given input columns: [?b??cI?-??(?e??@??U0??Sw?:5$E?p'??Y??>??       , ???mU?1?'???u?Y?_?;&???#<?w?????1Gn??q;%?+???5??;?];
'Project ['name]
+- Relation[?b??cI?-??(?e??@??U0??Sw?:5$E?p'??Y??>??    #16,???mU?1?'???u?Y?_?;&???#<?w?????1Gn??q;%?+???5??;?#17] csv


Le-Zheng commented 2 years ago

Sample spark-submit command:

/opt/spark/bin/spark-submit \
--master ${RUNTIME_SPARK_MASTER} \
--deploy-mode cluster \
--name simplequery \
--conf spark.driver.memory=20g \
--conf spark.executor.cores=16 \
--conf spark.executor.memory=20g \
--conf spark.executor.instances=1 \
--conf spark.cores.max=16 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
--conf spark.kubernetes.executor.deleteOnTermination=false \
--conf spark.network.timeout=10000000 \
--conf spark.executor.heartbeatInterval=10000000 \
--conf spark.python.use.daemon=false \
--conf spark.python.worker.reuse=false \
--conf spark.kubernetes.executor.podTemplateFile=/ppml/trusted-big-data-ml/spark-executor-template.yaml \
--conf spark.kubernetes.driver.podTemplateFile=/ppml/trusted-big-data-ml/spark-executor-template.yaml \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName=nfsvolumeclaim \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path=/bigdl2.0/data \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName=nfsvolumeclaim \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path=/bigdl2.0/data \
--conf spark.authenticate=true \
--conf spark.authenticate.secret=intel@123 \
--conf spark.kubernetes.executor.secretKeyRef.SPARK_AUTHENTICATE_SECRET="spark-secret:secret" \
--conf spark.kubernetes.driver.secretKeyRef.SPARK_AUTHENTICATE_SECRET="spark-secret:secret" \
--conf spark.authenticate.enableSaslEncryption=true \
--conf spark.network.crypto.enabled=true --conf spark.network.crypto.keyLength=128 \
--conf spark.network.crypto.keyFactoryAlgorithm=PBKDF2WithHmacSHA1 \
--conf spark.io.encryption.enabled=true \
--conf spark.io.encryption.keySizeBits=128 \
--conf spark.io.encryption.keygen.algorithm=HmacSHA1 \
--conf spark.ssl.enabled=true \
--conf spark.ssl.port=8043 \
--conf spark.ssl.keyPassword=$secure_password \
--conf spark.ssl.keyStore=/bigdl2.0/data/keystore.jks \
--conf spark.ssl.keyStorePassword=$secure_password \
--conf spark.ssl.keyStoreType=JKS \
--conf spark.ssl.trustStore=/bigdl2.0/data/keystore.jks \
--conf spark.ssl.trustStorePassword=intel@123 \
--conf spark.ssl.trustStoreType=JKS \
--class com.intel.analytics.bigdl.ppml.examples.SimpleQuerySparkExample \
--conf spark.driver.extraClassPath=local:///bigdl2.0/data/ppml/bigdl-ppml-spark_3.1.2-2.1.0-20220612.193825-116-jar-with-dependencies.jar \
--conf spark.executor.extraClassPath=local:///bigdl2.0/data/ppml/bigdl-ppml-spark_3.1.2-2.1.0-20220612.193825-116-jar-with-dependencies.jar \
--jars local:///bigdl2.0/data/ppml/bigdl-ppml-spark_3.1.2-2.1.0-20220612.193825-116-jar-with-dependencies.jar \
local:///bigdl2.0/data/ppml/bigdl-ppml-spark_3.1.2-2.1.0-20220612.193825-116-jar-with-dependencies.jar \
--inputPath /bigdl2.0/data/ppml/people/encrypted \
--outputPath /bigdl2.0/data/ppml/people/people_encrypted_output \
--inputPartitionNum 16 \
--outputPartitionNum 16 \
--inputEncryptModeValue AES/CBC/PKCS5Padding \
--outputEncryptModeValue AES/CBC/PKCS5Padding \
--primaryKeyPath /bigdl2.0/data/ppml/20line_data_keys/primaryKey \
--dataKeyPath /bigdl2.0/data/ppml/20line_data_keys/dataKey \
--kmsType SimpleKeyManagementService \
--simpleAPPID 165172133285

When we replace bigdl-ppml-spark_3.1.2-2.1.0-20220612.193825-116-jar-with-dependencies.jar with bigdl-ppml-spark_3.1.2-2.1.0-20220907.120744-222-jar-with-dependencies.jar, the above issue occurs.

ShanSimu commented 2 years ago

I just got the same error. Here is my script:

rm -rf /ppml/trusted-big-data-ml/work/data/shansimu/simplequery/people_encrypted_output && \
export mode=client && \
secure_password=`openssl rsautl -inkey /ppml/trusted-big-data-ml/work/password/key.txt -decrypt </ppml/trusted-big-data-ml/work/password/output.bin` && \
export TF_MKL_ALLOC_MAX_BYTES=10737418240 && \
export SPARK_LOCAL_IP=$LOCAL_IP && \
./clean.sh
gramine-argv-serializer bash -c "/opt/jdk8/bin/java \
  -cp '/ppml/trusted-big-data-ml/work/data/shansimu/ppml-e2e-examples/spark-encrypt-io/target/spark-encrypt-io-0.3.0-SNAPSHOT.jar:/ppml/trusted-big-data-ml/work/spark-3.1.2/examples/jars/scopt_2.12-3.7.1.jar:/ppml/trusted-big-data-ml/work/spark-3.1.2/conf/:/ppml/trusted-big-data-ml/work/spark-3.1.2/jars/*:/ppml/trusted-big-data-ml/work/bigdl-2.1.0-SNAPSHOT/jars/*' \
    -Xmx8g \
    org.apache.spark.deploy.SparkSubmit \
    --master $RUNTIME_SPARK_MASTER \
    --deploy-mode cluster \
    --name spark-simplequery-sgx \
    --conf spark.driver.host=$LOCAL_IP \
    --conf spark.driver.port=54321 \
    --conf spark.driver.memory=32g \
    --conf spark.executor.cores=8 \
    --conf spark.executor.memory=32g \
    --conf spark.executor.instances=2 \
    --conf spark.cores.max=32 \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf spark.kubernetes.container.image=$RUNTIME_K8S_SPARK_IMAGE \
    --conf spark.kubernetes.driver.podTemplateFile=/ppml/trusted-big-data-ml/spark-driver-template.yaml \
    --conf spark.kubernetes.executor.podTemplateFile=/ppml/trusted-big-data-ml/spark-executor-template.yaml \
    --conf spark.kubernetes.executor.deleteOnTermination=false \
    --conf spark.network.timeout=10000000 \
    --conf spark.executor.heartbeatInterval=10000000 \
    --conf spark.python.use.daemon=false \
    --conf spark.python.worker.reuse=false \
    --conf spark.kubernetes.sgx.enabled=true \
    --conf spark.kubernetes.sgx.driver.mem=64g \
    --conf spark.kubernetes.sgx.driver.jvm.mem=12g \
    --conf spark.kubernetes.sgx.executor.mem=64g \
    --conf spark.kubernetes.sgx.executor.jvm.mem=12g \
    --conf spark.kubernetes.sgx.log.level=error \
    --conf spark.authenticate=true \
    --conf spark.authenticate.secret=$secure_password \
    --conf spark.kubernetes.executor.secretKeyRef.SPARK_AUTHENTICATE_SECRET="spark-secret:secret" \
    --conf spark.kubernetes.driver.secretKeyRef.SPARK_AUTHENTICATE_SECRET="spark-secret:secret" \
    --conf spark.authenticate.enableSaslEncryption=true \
    --conf spark.network.crypto.enabled=true \
    --conf spark.network.crypto.keyLength=128 \
    --conf spark.network.crypto.keyFactoryAlgorithm=PBKDF2WithHmacSHA1 \
    --conf spark.io.encryption.enabled=true \
    --conf spark.io.encryption.keySizeBits=128 \
    --conf spark.io.encryption.keygen.algorithm=HmacSHA1 \
    --conf spark.ssl.enabled=true \
    --conf spark.ssl.port=8043 \
    --conf spark.ssl.keyPassword=$secure_password \
    --conf spark.ssl.keyStore=/ppml/trusted-big-data-ml/work/keys/keystore.jks \
    --conf spark.ssl.keyStorePassword=$secure_password \
    --conf spark.ssl.keyStoreType=JKS \
    --conf spark.ssl.trustStore=/ppml/trusted-big-data-ml/work/keys/keystore.jks \
    --conf spark.ssl.trustStorePassword=$secure_password \
    --conf spark.ssl.trustStoreType=JKS \
    --conf spark.driver.extraClassPath=/ppml/trusted-big-data-ml/work/bigdl-2.1.0-SNAPSHOT/jars/*:/ppml/trusted-big-data-ml/work/spark-3.1.2/examples/jars/* \
    --conf spark.executor.extraClassPath=/ppml/trusted-big-data-ml/work/bigdl-2.1.0-SNAPSHOT/jars/*:/ppml/trusted-big-data-ml/work/spark-3.1.2/examples/jars/* \
    --class com.intel.analytics.bigdl.ppml.examples.SimpleQuerySparkExample \
    --verbose \
    --jars local:///ppml/trusted-big-data-ml/work/bigdl-2.1.0-SNAPSHOT/jars/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT.jar \
    local:///ppml/trusted-big-data-ml/work/bigdl-2.1.0-SNAPSHOT/jars/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT.jar \
    --inputPath /ppml/trusted-big-data-ml/work/data/shansimu/simplequery/people_encrypted \
    --outputPath /ppml/trusted-big-data-ml/work/data/shansimu/simplequery/people_encrypted_output \
    --inputPartitionNum 8 \
    --outputPartitionNum 8 \
    --inputEncryptModeValue AES/CBC/PKCS5Padding \
    --outputEncryptModeValue AES/CBC/PKCS5Padding \
    --primaryKeyPath /ppml/trusted-big-data-ml/work/data/shansimu/simplequery/keys/primaryKey \
    --dataKeyPath /ppml/trusted-big-data-ml/work/data/shansimu/simplequery/keys/dataKey \
    --kmsType SimpleKeyManagementService \
    --simpleAPPID 947536384638 \
    --simpleAPPKEY 884926981201" > /ppml/trusted-big-data-ml/secured_argvs
./init.sh
gramine-sgx bash 2>&1 | tee query-client-simple.log
ShanSimu commented 2 years ago

This may be due to the jar package path. Here is my earlier script, which did not produce this error:

rm -rf /ppml/trusted-big-data-ml/work/data/shansimu/simplequery/people_encrypted_output && \
export mode=client && \
secure_password=`openssl rsautl -inkey /ppml/trusted-big-data-ml/work/password/key.txt -decrypt </ppml/trusted-big-data-ml/work/password/output.bin` && \
export TF_MKL_ALLOC_MAX_BYTES=10737418240 && \
export SPARK_LOCAL_IP=$LOCAL_IP && \
./clean.sh
gramine-argv-serializer bash -c "/opt/jdk8/bin/java \
  -cp '/ppml/trusted-big-data-ml/work/data/shansimu/ppml-e2e-examples/spark-encrypt-io/target/spark-encrypt-io-0.3.0-SNAPSHOT.jar:/ppml/trusted-big-data-ml/work/spark-3.1.2/examples/jars/scopt_2.12-3.7.1.jar:/ppml/trusted-big-data-ml/work/spark-3.1.2/conf/:/ppml/trusted-big-data-ml/work/spark-3.1.2/jars/*:/ppml/trusted-big-data-ml/work/bigdl-2.1.0-SNAPSHOT/jars/*' \
    -Xmx8g \
    org.apache.spark.deploy.SparkSubmit \
    --master $RUNTIME_SPARK_MASTER \
    --deploy-mode client \
    --name spark-simplequery-sgx \
    --conf spark.driver.host=$LOCAL_IP \
    --conf spark.driver.port=54321 \
    --conf spark.driver.memory=32g \
    --conf spark.executor.cores=8 \
    --conf spark.executor.memory=32g \
    --conf spark.executor.instances=2 \
    --conf spark.cores.max=32 \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf spark.kubernetes.container.image=$RUNTIME_K8S_SPARK_IMAGE \
    --conf spark.kubernetes.driver.podTemplateFile=/ppml/trusted-big-data-ml/spark-driver-template.yaml \
    --conf spark.kubernetes.executor.podTemplateFile=/ppml/trusted-big-data-ml/spark-executor-template.yaml \
    --conf spark.kubernetes.executor.deleteOnTermination=false \
    --conf spark.network.timeout=10000000 \
    --conf spark.executor.heartbeatInterval=10000000 \
    --conf spark.python.use.daemon=false \
    --conf spark.python.worker.reuse=false \
    --conf spark.kubernetes.sgx.enabled=true \
    --conf spark.kubernetes.sgx.executor.mem=64g \
    --conf spark.kubernetes.sgx.executor.jvm.mem=12g \
    --conf spark.kubernetes.sgx.log.level=error \
    --conf spark.authenticate=true \
    --conf spark.authenticate.secret=$secure_password \
    --conf spark.kubernetes.executor.secretKeyRef.SPARK_AUTHENTICATE_SECRET="spark-secret:secret" \
    --conf spark.kubernetes.driver.secretKeyRef.SPARK_AUTHENTICATE_SECRET="spark-secret:secret" \
    --conf spark.authenticate.enableSaslEncryption=true \
    --conf spark.network.crypto.enabled=true \
    --conf spark.network.crypto.keyLength=128 \
    --conf spark.network.crypto.keyFactoryAlgorithm=PBKDF2WithHmacSHA1 \
    --conf spark.io.encryption.enabled=true \
    --conf spark.io.encryption.keySizeBits=128 \
    --conf spark.io.encryption.keygen.algorithm=HmacSHA1 \
    --conf spark.ssl.enabled=true \
    --conf spark.ssl.port=8043 \
    --conf spark.ssl.keyPassword=$secure_password \
    --conf spark.ssl.keyStore=/ppml/trusted-big-data-ml/work/keys/keystore.jks \
    --conf spark.ssl.keyStorePassword=$secure_password \
    --conf spark.ssl.keyStoreType=JKS \
    --conf spark.ssl.trustStore=/ppml/trusted-big-data-ml/work/keys/keystore.jks \
    --conf spark.ssl.trustStorePassword=$secure_password \
    --conf spark.ssl.trustStoreType=JKS \
    --class com.intel.analytics.bigdl.ppml.examples.SimpleQuerySparkExample \
    --verbose \
    --jars local:///ppml/trusted-big-data-ml/work/data/shansimu/ppml-e2e-examples/spark-encrypt-io/target/spark-encrypt-io-0.3.0-SNAPSHOT.jar \
    local:///ppml/trusted-big-data-ml/work/data/shansimu/ppml-e2e-examples/spark-encrypt-io/target/spark-encrypt-io-0.3.0-SNAPSHOT.jar \
    --inputPath /ppml/trusted-big-data-ml/work/data/shansimu/simplequery/people_encrypted \
    --outputPath /ppml/trusted-big-data-ml/work/data/shansimu/simplequery/people_encrypted_output \
    --inputPartitionNum 8 \
    --outputPartitionNum 8 \
    --inputEncryptModeValue AES/CBC/PKCS5Padding \
    --outputEncryptModeValue AES/CBC/PKCS5Padding \
    --primaryKeyPath /ppml/trusted-big-data-ml/work/data/shansimu/simplequery/keys/primaryKey \
    --dataKeyPath /ppml/trusted-big-data-ml/work/data/shansimu/simplequery/keys/dataKey \
    --kmsType SimpleKeyManagementService \
    --simpleAPPID 947536384638 \
    --simpleAPPKEY 884926981201" > /ppml/trusted-big-data-ml/secured_argvs
./init.sh
gramine-sgx bash 2>&1 | tee spark-simplequery-sgx-driver-on-sgx.log
PatrickkZ commented 2 years ago

It seems the encrypted file and the encryption keys do not match. Please try generating a new encrypted file with your current primaryKey and dataKey.
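To illustrate why the ciphertext and the keys must be generated together, here is a minimal shell sketch. Note that openssl here is only a stand-in for the BigDL PPML KMS tooling, and all paths and file contents are placeholder assumptions, not the actual PPML workflow:

```shell
#!/bin/sh
# Sketch only: openssl stands in for the BigDL PPML KMS tooling.
# WORK, the CSV contents, and the key file name are placeholders.
WORK=${WORK:-/tmp/kms-demo}
mkdir -p "$WORK"
printf 'name,age\nalice,30\n' > "$WORK/people.csv"
# Generate a fresh data key (analogous to regenerating dataKey)
openssl rand -hex 16 > "$WORK/dataKey"
# Encrypt the CSV with the *current* key so ciphertext and key match
openssl enc -aes-128-cbc -K "$(cat "$WORK/dataKey")" \
  -iv 00000000000000000000000000000000 \
  -in "$WORK/people.csv" -out "$WORK/people.csv.cbc"
# Decrypting with the same key recovers the plaintext; a stale key would not
openssl enc -d -aes-128-cbc -K "$(cat "$WORK/dataKey")" \
  -iv 00000000000000000000000000000000 \
  -in "$WORK/people.csv.cbc" | head -1    # → name,age
```

The point is that re-encrypting the input with whatever primaryKey/dataKey the job is configured to use keeps both sides in sync; decrypting with a key from an older run produces garbage, which is exactly what the unreadable column names in the AnalysisException look like.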

ShanSimu commented 2 years ago

Thanks @PatrickkZ, it turned out to be a problem with my script.

PatrickkZ commented 2 years ago

This error occurs because the encrypted file is not being decrypted. The encrypted file name now has to end with .cbc; that extension is what triggers the decryption process. So try renaming your encrypted file, for example from people.csv to people.csv.cbc. Note, however, that the input file name expected by SimpleQuerySparkExample is fixed to people.csv, so you also need to modify SimpleQuerySparkExample's code accordingly.
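The rename step can be sketched as follows; the /tmp directory and the file contents are placeholders standing in for the real encrypted data location:

```shell
#!/bin/sh
# Sketch: give the encrypted CSV a .cbc suffix so the decryption path is triggered.
# DATA_DIR and the file contents are placeholders, not the real PPML layout.
DATA_DIR=${DATA_DIR:-/tmp/ppml-demo}
mkdir -p "$DATA_DIR"
printf 'ciphertext-stand-in' > "$DATA_DIR/people.csv"
mv "$DATA_DIR/people.csv" "$DATA_DIR/people.csv.cbc"
ls "$DATA_DIR"    # → people.csv.cbc
```

After the rename, the example's hard-coded input name also has to be updated to match, since it still looks for people.csv.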