Closed astronaut0131 closed 2 years ago
@astronaut0131 thanks for your report.
Can you list the pods you have running, especially the Intel k8s device plugin and NFD (plus their versions)?
Are you using the in-tree SGX driver? Which k8s version, and which host OS?
Can you provide the output of ls -l /dev/sgx* on the host?
All related plugins are at the latest version. I followed the instructions in https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/cmd/sgx_plugin/README.md#deploying-as-a-daemonset
sgx plugin status
$ kubectl get SgxDevicePlugin
NAME DESIRED READY NODE SELECTOR AGE
sgxdeviceplugin-sample 1 1 {"intel.feature.node.kubernetes.io/sgx":"true"} 3m31s
node info
kubectl describe node zhenhui-control-plane | grep sgx.intel.com
sgx.intel.com/capable=true
nfd.node.kubernetes.io/extended-resources: sgx.intel.com/epc
sgx.intel.com/enclave: 110
sgx.intel.com/epc: 4261412864
sgx.intel.com/provision: 110
sgx.intel.com/enclave: 110
sgx.intel.com/epc: 4261412864
sgx.intel.com/provision: 110
sgx.intel.com/enclave 1 1
sgx.intel.com/epc 512Ki 512Ki
sgx.intel.com/provision 0 0
host os
$ uname -r
5.13.0-41-generic
$ ls -l /dev/sgx*
crw-rw-rw- 1 root root 10, 125 Jul 28 17:50 /dev/sgx_enclave
crw-rw---- 1 root sgx_prv 10, 126 Jul 28 17:50 /dev/sgx_provision
/dev/sgx:
total 0
lrwxrwxrwx 1 root root 14 Jul 28 17:50 enclave -> ../sgx_enclave
lrwxrwxrwx 1 root root 16 Jul 28 17:50 provision -> ../sgx_provision
k8s version
kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.3", GitCommit:"aef86a93758dc3cb2c658dd9657ab4ad4afc21cb", GitTreeState:"clean", BuildDate:"2022-07-13T14:29:09Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"linux/amd64"}
BTW, I can successfully start an enclave directly on the host OS, so I think the hardware is working correctly; the problem only occurs in a container. I suspect it has something to do with Kind, which I'm using to build the cluster.
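One way to narrow down a "works on the host, fails in the container" case is to run the same device check in both places and compare the results. The following is only a minimal sketch: the paths are the ones from the ls -l /dev/sgx* output above, and the names describe_device and report are hypothetical helpers, not part of any SGX tooling.

```python
import os
import stat

# Paths taken from the `ls -l /dev/sgx*` output above; adjust if your
# driver exposes different nodes (this list is an assumption).
CANDIDATE_PATHS = [
    "/dev/sgx_enclave",
    "/dev/sgx_provision",
    "/dev/sgx/enclave",
    "/dev/sgx/provision",
]


def describe_device(path):
    """Return a short status string for one device path."""
    if not os.path.lexists(path):
        return "missing"
    if os.path.islink(path):
        target = os.readlink(path)
        # os.path.exists follows the link, so False here means it is dangling.
        if not os.path.exists(path):
            return f"dangling symlink -> {target}"
        return f"symlink -> {target}"
    mode = os.stat(path).st_mode
    kind = "char device" if stat.S_ISCHR(mode) else "not a char device"
    access = "rw ok" if os.access(path, os.R_OK | os.W_OK) else "no rw access"
    return f"{kind}, {access}"


def report(paths=CANDIDATE_PATHS):
    """Map each candidate path to its status string."""
    return {path: describe_device(path) for path in paths}


if __name__ == "__main__":
    for path, status in report().items():
        print(f"{path}: {status}")
```

If a path shows up on the host but is reported as missing or as a dangling symlink inside the container, the SGX runtime libraries in the image may be looking for a device node that was not mounted into the pod.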
@astronaut0131 did you get your issue resolved, or not?
@poussa Not yet. I tried a real k8s cluster instead of Kind, but the same error still occurs.
@poussa @avalluri I finally found that the problem is related to the SDK version in the tcs-issuer Dockerfile:
# git diff Dockerfile
diff --git a/Dockerfile b/Dockerfile
index 77d681f..0c8c43e 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -140,8 +140,6 @@ RUN mkdir -p /usr/local/share/package-licenses \
###
FROM ubuntu:focal as runtime
-ARG SDK_VERSION="2.15.100.3"
-ARG DCAP_VERSION="1.12.100.3"
RUN apt-get update \
&& apt-get install -y wget gnupg \
@@ -152,16 +150,16 @@ RUN apt-get update \
&& apt-get remove -y wget gnupg && apt-get autoremove -y \
&& bash -c 'set -o pipefail; apt-get install --no-install-recommends -y \
libprotobuf17 \
- libsgx-enclave-common=${SDK_VERSION}-focal1 \
- libsgx-epid=${SDK_VERSION}-focal1 \
- libsgx-quote-ex=${SDK_VERSION}-focal1 \
- libsgx-urts=${SDK_VERSION}-focal1 \
- libsgx-ae-epid=${SDK_VERSION}-focal1 \
- libsgx-ae-qe3=${DCAP_VERSION}-focal1 \
- libsgx-dcap-ql=${DCAP_VERSION}-focal1 \
- libsgx-pce-logic=${DCAP_VERSION}-focal1 \
- libsgx-qe3-logic=${DCAP_VERSION}-focal1 \
- libsgx-dcap-default-qpl=${DCAP_VERSION}-focal1 \
+ libsgx-enclave-common \
+ libsgx-epid \
+ libsgx-quote-ex \
+ libsgx-urts \
+ libsgx-ae-epid \
+ libsgx-ae-qe3 \
+ libsgx-dcap-ql \
+ libsgx-pce-logic \
I changed it like this and the error is gone.
It looks like the original SDK version has a problem opening /dev/sgx_enclave. Would you consider changing the version here?
@astronaut0131 Good to hear that you figured out the issue. I guess you are using v1.24 intel-device-plugins, which dropped support for creating the /dev/sgx_* device links that are used by the <=v2.15 SGX SDK.
Dependency upgrades are planned and will be part of the next release.
@astronaut0131 This PR updates to the latest SDK and is supposed to fix your issue. If possible, can you give it a try?
@avalluri Sorry for the late reply, I was out of office last week. The latest version gives the following error:
$ kubectl logs tcs-controller-6b64fcd89-fk76q -n tcs-issuer
Defaulted container "tcs-issuer" out of: tcs-issuer, init (init)
flag provided but not defined: -use-random-nonce
Usage of /tcs-issuer:
-cert-manager-issuer
Run it as issuer for cert-manager. (default true)
-csr-full-cert-chain
Return full certificate chain in Kubernetes CSR certificate.
-health-probe-bind-address string
The address the probe endpoint binds to. (default ":8081")
-key-wrap-mechanism string
CA private key wrapping mechanism to use. One of: 'aes_gcm' or 'ads_key_pad_wrap' (default "aes_key_wrap_pad")
-kubeconfig string
Paths to a kubeconfig. Only required if out-of-cluster.
-leader-elect
Enable leader election for controller manager. Enabling this will ensure there is only one active controller manager.
-log-flush-frequency duration
Maximum number of seconds between log flushes (default 5s)
-metrics-bind-address string
The address the metric endpoint binds to. (default ":8080")
-so-pin string
PKCS11 token so/admin pin.
-token-label string
PKCS11 label to use for the operator token. (default "SgxOperator")
-user-pin string
PKCS11 token user pin.
-zap-devel
Development Mode defaults(encoder=consoleEncoder,logLevel=Debug,stackTraceLevel=Warn). Production Mode defaults(encoder=jsonEncoder,logLevel=Info,stackTraceLevel=Error)
-zap-encoder value
Zap log encoding (one of 'json' or 'console')
-zap-log-level value
Zap Level to configure the verbosity of logging. Can be one of 'debug', 'info', 'error', or any integer value > 0 which corresponds to custom debug levels of increasing verbosity
-zap-stacktrace-level value
Zap Level at and above which stacktraces are captured (one of 'info', 'error', 'panic').
-zap-time-encoding value
Zap time encoding (one of 'epoch', 'millis', 'nano', 'iso8601', 'rfc3339' or 'rfc3339nano'). Defaults to 'epoch'.
It looks like this error is related to https://github.com/intel/trusted-certificate-issuer/commit/570560f3cd655e57dee1716731bc3c67ebad688f. Maybe you forgot to change the YAML configurations accordingly?
@astronaut0131 Thanks for trying this out.
Looks like this error is related to https://github.com/intel/trusted-certificate-issuer/commit/570560f3cd655e57dee1716731bc3c67ebad688f, maybe you forgot to change the yaml configurations accordingly?
The commit you mentioned removed the use-random-nonce argument, which was not intentional, hence this error. I have now fixed this in #62. Can you please try either cherry-picking the commit(s) or removing the argument from your deployment?
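For the second option (removing the argument from the deployment), the change amounts to dropping one entry from the container's args list. The sketch below only illustrates that filtering; the sample args list is made up, drop_flag is a hypothetical helper, and the exact spelling of the flag in your deployment YAML may differ.

```python
def drop_flag(args, name):
    """Return a copy of args without the flag `name`.

    Matches the flag with one or two leading dashes, with or without
    an `=value` suffix. A value passed as a separate list item is not
    handled here.
    """
    kept = []
    for arg in args:
        if arg.startswith("-"):
            flag = arg.lstrip("-").split("=", 1)[0]
            if flag == name:
                continue  # skip the unsupported flag
        kept.append(arg)
    return kept


if __name__ == "__main__":
    # Hypothetical args list; the real one lives in the deployment manifest.
    args = ["--leader-elect", "--use-random-nonce=true", "--metrics-bind-address=:8080"]
    print(drop_flag(args, "use-random-nonce"))
```

The same edit can of course be made by hand in the manifest; the point is only that every other argument stays untouched.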
I've tried it and it works well. I also found some small problems, for which I opened a pull request: https://github.com/intel/trusted-certificate-issuer/pull/63.
I'm trying to deploy tcs-issuer in a k8s cluster, but got the following error:
I think all the prerequisites are working correctly.