kurtosis-tech / kurtosis

A platform for packaging and launching ephemeral backend stacks with a focus on approachability for the average developer.
https://docs.kurtosistech.com/
Apache License 2.0
376 stars 53 forks source link

Unable to create new enclaves in em ui after upgrading engine #1768

Open Dartoxian opened 12 months ago

Dartoxian commented 12 months ago

What's your CLI version?

0.85.18

Description & steps to reproduce

I am unable to create new enclaves in the em ui after I upgrade my local engine. These are the steps that I take:

  1. sudo apt update && sudo apt install --only-upgrade kurtosis-cli
  2. kurtosis enclave ls (just to trigger starting the engine with the new version and check it works)
  3. Open the web ui, try to create a new enclave. The run button will spin, and eventually just stop spinning. No error is shown, but no enclave is created either.

I can work around this issue by manually creating an enclave on the cli (kurtosis run github.com/kurtosis-tech/postgres-package) - this seems to wake something up in my local instance and after this the web ui can be used to create enclaves.

I run kurtosis on Ubuntu 22.

This issue was also seen on 0.85.10.

Desired behavior

After upgrading my engine I should be able to launch enclaves in the emui.

Additionally, the emui should show an error (rather than no message) when it does not create an enclave.

What is the severity of this bug?

Papercut; this bug is frustrating, but I have a workaround.

What area of the product does this pertain to?

Frontend: the Graphical User Interface, including the Enclave Manager

leoporoli commented 12 months ago

This is the error shown in the UI: Could not create an enclave, got: ConnectError: [unknown] NetworkError when attempting to fetch resource.

This is the request that didn't receive a response from the EM server

curl 'http://localhost:8081/kurtosis_enclave_manager.KurtosisEnclaveManagerServer/CreateEnclave' -X POST -H 'User-Agent: connect-es/0.13.2' -H 'Accept: */*' -H 'Accept-Language: en-GB,en;q=0.5' -H 'Accept-Encoding: gzip, deflate, br' -H 'Referer: http://localhost:9711/' -H 'connect-protocol-version: 1' -H 'content-type: application/json' -H 'Origin: http://localhost:9711/' -H 'Connection: keep-alive' -H 'Sec-Fetch-Dest: empty' -H 'Sec-Fetch-Mode: cors' -H 'Sec-Fetch-Site: same-site' -H 'Pragma: no-cache' -H 'Cache-Control: no-cache' --data-raw '{"enclaveName":"","apiContainerVersionTag":"","apiContainerLogLevel":"info","mode":"PRODUCTION"}'

The engine server starts creating the enclave but for some reason, it doesn't finish.

These are the logs from the engine' GRPC server:

DEBU[2023-11-14T15:30:13Z][engine_connect_server_service.go:CreateEnclave] args: enclave_name:""  api_container_version_tag:""  api_container_log_level:"info"  mode:PRODUCTION 
DEBU[2023-11-14T15:30:13Z][docker_kurtosis_backend_enclave_functions.go:CreateEnclave] Creating Docker network for enclave 'be040fdc5efd47d7b8de96ebd25a9dae'... 
DEBU[2023-11-14T15:30:13Z][docker_kurtosis_backend_enclave_functions.go:CreateEnclave] Docker network 'be040fdc5efd47d7b8de96ebd25a9dae' created successfully with ID 'b1cf10d6ef4ea4090a016215c52612df76e0b86d096010138889d8b73ae0079d' 
DEBU[2023-11-14T15:30:13Z][docker_manager.go:FetchImage] Fetching image 'alpine:3.17' with image download mode: missing 
DEBU[2023-11-14T15:30:13Z][docker_manager.go:getContainerHostConfig] Binds: [kurtosis-logs-collector-vol--be040fdc5efd47d7b8de96ebd25a9dae:/fluent-bit/etc] 
DEBU[2023-11-14T15:30:13Z][docker_manager.go:CreateAndStartContainer] Created container with ID 'e8fabbbb801e3137adb25bff3f94322745b6ea3802b997ba4530118f94a7ac75' from image 'alpine:3.17' 
DEBU[2023-11-14T15:30:14Z][fluentbit_configuration_creator.go:createFluentbitConfigFileInVolume] The Fluentbit config file with content '
[SERVICE]
    log_level debug
    http_server On
    http_listen 0.0.0.0
    http_port 9712
    storage.path /fluent-bit/etc/storage/
[INPUT]
    name forward
    listen 0.0.0.0
    port 9713
    storage.type  filesystem
[OUTPUT]
    name forward
    match *
    host 172.17.0.2
    port 9714
' was successfully added into the volume 
DEBU[2023-11-14T15:30:14Z][docker_manager.go:FetchImage] Fetching image 'fluent/fluent-bit:1.9.7' with image download mode: missing 
INFO[2023-11-14T15:30:14Z][docker_manager.go:pullImage] Pulling image 'fluent/fluent-bit:1.9.7'      
DEBU[2023-11-14T15:30:23Z][log_file_manager.go:func2] Creating log file paths...                   
DEBU[2023-11-14T15:30:23Z][log_file_manager.go:func2] Successfully created log file paths.  

I tried to understand what was happening based on the logs from the engine I shared. Something that is strange for me is that the Engine GRPC server receives the request, even when it starts creating the enclave but for some reason, it doesn’t show anything after this line: Pulling image 'fluent/fluent-bit:1.9.7' doesn’t show any error and if something fails while it’s fetching the image we should see an error (I checked in the code it should be An error occurred fetching image '%v') So, the FE calls the EM, the EM calls the Engine, and the engine starts creating the enclave but then something happens (I’m not sure where I think that probably in the EM) and the Engine doesn’t show any error, the enclave is not created, and the FE shows an error (Could not create an enclave, got: ConnectError: [unknown] NetworkError when attempting to fetch resource.) because it doesn’t get a response from the EM.

leoporoli commented 12 months ago

I create a PR for injecting the Logrus logger inside the CORS handler on the EM HTTP server. We can release this and try again to see the errors from the server side in order to have more context.

leoporoli commented 11 months ago

These are the logs after injecting Logrus in the CORS handlers

INFO[2023-11-16T12:22:41Z][cors.go:logf] Handler: Preflight request                   
INFO[2023-11-16T12:22:41Z][cors.go:logf]   Preflight response headers: map[Access-Control-Allow-Headers:[Connect-Protocol-Version, Content-Type, User-Agent] Access-Control-Allow-Methods:[POST] Access-Control-Allow-Origin:[*] Vary:[Origin Access-Control-Request-Method Access-Control-Request-Headers]] 
INFO[2023-11-16T12:22:41Z][cors.go:logf] Handler: Actual request                      
INFO[2023-11-16T12:22:41Z][cors.go:logf]   Actual response added headers: map[Access-Control-Allow-Origin:[*] Vary:[Origin]] 
DEBU[2023-11-16T12:22:41Z][engine_connect_server_service.go:CreateEnclave] args: enclave_name:"" api_container_version_tag:"" api_container_log_level:"info" mode:PRODUCTION 
DEBU[2023-11-16T12:22:41Z][docker_kurtosis_backend_enclave_functions.go:CreateEnclave] Creating Docker network for enclave '33fbcb1d1e974f8abe1d2d97c515b3c5'... 
DEBU[2023-11-16T12:22:42Z][docker_kurtosis_backend_enclave_functions.go:CreateEnclave] Docker network '33fbcb1d1e974f8abe1d2d97c515b3c5' created successfully with ID '3bfdc2964e00076d5fff48d0b79823606d149ae468813f050718d0583311d2df' 
DEBU[2023-11-16T12:22:42Z][create_logs_collector.go:CreateLogsCollectorForEnclave] Creating logs collector for enclave '33fbcb1d1e974f8abe1d2d97c515b3c5' 
DEBU[2023-11-16T12:22:42Z][docker_manager.go:FetchImage] Fetching image 'alpine:3.17' with image download mode: missing 
DEBU[2023-11-16T12:22:42Z][docker_manager.go:getContainerHostConfig] Binds: [kurtosis-logs-collector-vol--33fbcb1d1e974f8abe1d2d97c515b3c5:/fluent-bit/etc] 
DEBU[2023-11-16T12:22:42Z][docker_manager.go:CreateAndStartContainer] Created container with ID '662d4bfbbb3317ade93dc9d5b4a2eb6a399a05e6d898f2af21caa862ecfe75d9' from image 'alpine:3.17' 
DEBU[2023-11-16T12:22:42Z][fluentbit_configuration_creator.go:createFluentbitConfigFileInVolume] The Fluentbit config file with content '
[SERVICE]
    log_level debug
    http_server On
    http_listen 0.0.0.0
    http_port 9712
    storage.path /fluent-bit/etc/storage/
[INPUT]
    name forward
    listen 0.0.0.0
    port 9713
    storage.type  filesystem
[OUTPUT]
    name forward
    match *
    host 172.17.0.2
    port 9714
' was successfully added into the volume 
DEBU[2023-11-16T12:22:42Z][docker_manager.go:FetchImage] Fetching image 'fluent/fluent-bit:1.9.7' with image download mode: missing 
DEBU[2023-11-16T12:22:42Z][docker_manager.go:getContainerHostConfig] Binds: [kurtosis-logs-collector-vol--33fbcb1d1e974f8abe1d2d97c515b3c5:/fluent-bit/etc] 
DEBU[2023-11-16T12:22:42Z][docker_manager.go:CreateAndStartContainer] Created container with ID '27a29722ac4a6c95822f7b0f6843c90e98635c03435c40f26d934ff085512fe5' from image 'fluent/fluent-bit:1.9.7' 
DEBU[2023-11-16T12:22:43Z][create_logs_collector.go:CreateLogsCollectorForEnclave] Checking for logs collector availability in enclave '33fbcb1d1e974f8abe1d2d97c515b3c5'... 
DEBU[2023-11-16T12:22:43Z][create_logs_collector.go:CreateLogsCollectorForEnclave] ...logs collector is available in enclave '33fbcb1d1e974f8abe1d2d97c515b3c5' 
DEBU[2023-11-16T12:22:43Z][create_logs_collector.go:CreateLogsCollectorForEnclave] Logs collector successfully created with container ID '27a29722ac4a6c95822f7b0f6843c90e98635c03435c40f26d934ff085512fe5' for enclave '33fbcb1d1e974f8abe1d2d97c515b3c5' 
DEBU[2023-11-16T12:22:43Z][api_container_launcher.go:LaunchWithCustomVersion] Launching Kurtosis API container...          
DEBU[2023-11-16T12:22:43Z][docker_kurtosis_backend_api_container_functions.go:CreateAPIContainer] Creating the APIC for enclave '33fbcb1d1e974f8abe1d2d97c515b3c5' 
INFO[2023-11-16T12:22:43Z][docker_manager.go:pullImage] Pulling image 'kurtosistech/core:0.85.23'    
WARN[2023-11-16T12:22:51Z][docker_kurtosis_backend_api_container_functions.go:CreateAPIContainer] Failed to pull the latest version of API container image 'kurtosistech/core:0.85.23'; you may be running an out-of-date version 
DEBU[2023-11-16T12:22:51Z][docker_manager.go:FetchImage] Fetching image 'kurtosistech/core:0.85.23' with image download mode: missing 
DEBU[2023-11-16T12:22:51Z][docker_object_parallelizer.go:RunDockerOperationInParallel] Called RunDockerOperationInParallel on the following Docker object IDs: map[27a29722ac4a6c95822f7b0f6843c90e98635c03435c40f26d934ff085512fe5:true] 
DEBU[2023-11-16T12:22:51Z][docker_object_parallelizer.go:RunDockerOperationInParallel] Called RunDockerOperationInParallel on the following Docker object IDs: map[33fbcb1d1e974f8abe1d2d97c515b3c5:true kurtosis-logs-collector-vol--33fbcb1d1e974f8abe1d2d97c515b3c5:true] 
DEBU[2023-11-16T12:22:51Z][docker_object_parallelizer.go:RunDockerOperationInParallel] Called RunDockerOperationInParallel on the following Docker object IDs: map[3bfdc2964e00076d5fff48d0b79823606d149ae468813f050718d0583311d2df:true] 
leoporoli commented 11 months ago

These are the logs after adding debug logs when "CreateEnclave" fails. We can see the cause is Context canceled. Now we have to find why the context is canceled.

ben@japes [11:34:07] [~/dev/kurtosis/kurtosis/enclave-manager/web] [bgazzard/minor-emui-feedback]
-> % docker logs -f 1bd3d9b6c350
DEBU[2023-11-17T11:34:05Z][backend_creator.go:getLocalDockerKurtosisBackend] Connecting to Docker daemon via unix socket '/var/run/docker.sock' 
DEBU[2023-11-17T11:34:05Z][log_file_manager.go:func1] Scheduling log removal for log retention every '6h0m0s' hours... 
DEBU[2023-11-17T11:34:05Z][log_file_manager.go:func2] Scheduling log file path creation every '1m0s' minutes... 
DEBU[2023-11-17T11:34:05Z][log_file_manager.go:RemoveLogsBeyondRetentionPeriod] Removed logs beyond retention period at the following path: '/var/log/kurtosis//2023//41/' 
INFO[2023-11-17T11:34:05Z][main.go:runMain] Running server...                            
INFO[2023-11-17T11:34:05Z][server.go:RunEnclaveManagerApiServer] Web server running and listening on port 8081 
INFO[2023-11-17T11:34:22Z][cors.go:logf] Handler: Preflight request                   
INFO[2023-11-17T11:34:22Z][cors.go:logf]   Preflight response headers: map[Access-Control-Allow-Headers:[Connect-Protocol-Version, Content-Type, User-Agent] Access-Control-Allow-Methods:[POST] Access-Control-Allow-Origin:[*] Vary:[Origin Access-Control-Request-Method Access-Control-Request-Headers]] 
INFO[2023-11-17T11:34:23Z][cors.go:logf] Handler: Actual request                      
INFO[2023-11-17T11:34:23Z][cors.go:logf]   Actual response added headers: map[Access-Control-Allow-Origin:[*] Vary:[Origin]] 
INFO[2023-11-17T11:34:23Z][cors.go:logf] Handler: Preflight request                   
INFO[2023-11-17T11:34:23Z][cors.go:logf]   Preflight response headers: map[Access-Control-Allow-Headers:[Connect-Protocol-Version, Content-Type, User-Agent] Access-Control-Allow-Methods:[POST] Access-Control-Allow-Origin:[*] Vary:[Origin Access-Control-Request-Method Access-Control-Request-Headers]] 
INFO[2023-11-17T11:34:23Z][cors.go:logf] Handler: Actual request                      
INFO[2023-11-17T11:34:23Z][cors.go:logf]   Actual response added headers: map[Access-Control-Allow-Origin:[*] Vary:[Origin]] 
INFO[2023-11-17T11:34:34Z][cors.go:logf] Handler: Preflight request                   
INFO[2023-11-17T11:34:34Z][cors.go:logf]   Preflight response headers: map[Access-Control-Allow-Headers:[Connect-Protocol-Version, Content-Type, User-Agent] Access-Control-Allow-Methods:[POST] Access-Control-Allow-Origin:[*] Vary:[Origin Access-Control-Request-Method Access-Control-Request-Headers]] 
INFO[2023-11-17T11:34:34Z][cors.go:logf] Handler: Actual request                      
INFO[2023-11-17T11:34:34Z][cors.go:logf]   Actual response added headers: map[Access-Control-Allow-Origin:[*] Vary:[Origin]] 
DEBU[2023-11-17T11:34:34Z][engine_connect_server_service.go:CreateEnclave] args: enclave_name:""  api_container_version_tag:""  api_container_log_level:"info"  mode:PRODUCTION 
DEBU[2023-11-17T11:34:34Z][docker_kurtosis_backend_enclave_functions.go:CreateEnclave] Creating Docker network for enclave '57668a5d641d4ea09f2a74f46deb1e58'... 
DEBU[2023-11-17T11:34:34Z][docker_kurtosis_backend_enclave_functions.go:CreateEnclave] Docker network '57668a5d641d4ea09f2a74f46deb1e58' created successfully with ID 'bc5b74f563768a40e909d6b574f96b4e07d2488d4849395faf689c7df2909bac' 
DEBU[2023-11-17T11:34:34Z][create_logs_collector.go:CreateLogsCollectorForEnclave] Creating logs collector for enclave '57668a5d641d4ea09f2a74f46deb1e58' 
DEBU[2023-11-17T11:34:34Z][docker_manager.go:FetchImage] Fetching image 'alpine:3.17' with image download mode: missing 
DEBU[2023-11-17T11:34:34Z][docker_manager.go:getContainerHostConfig] Binds: [kurtosis-logs-collector-vol--57668a5d641d4ea09f2a74f46deb1e58:/fluent-bit/etc] 
DEBU[2023-11-17T11:34:34Z][docker_manager.go:CreateAndStartContainer] Created container with ID 'dab7533fd223082cddf9f97dd1c5d919cd0eb0bfef3c5e347c3fbf3337c40ed0' from image 'alpine:3.17' 
DEBU[2023-11-17T11:34:34Z][fluentbit_configuration_creator.go:createFluentbitConfigFileInVolume] The Fluentbit config file with content '
[SERVICE]
    log_level debug
    http_server On
    http_listen 0.0.0.0
    http_port 9712
    storage.path /fluent-bit/etc/storage/
[INPUT]
    name forward
    listen 0.0.0.0
    port 9713
    storage.type  filesystem
[OUTPUT]
    name forward
    match *
    host 172.17.0.2
    port 9714
' was successfully added into the volume 
DEBU[2023-11-17T11:34:34Z][docker_manager.go:FetchImage] Fetching image 'fluent/fluent-bit:1.9.7' with image download mode: missing 
DEBU[2023-11-17T11:34:34Z][docker_manager.go:getContainerHostConfig] Binds: [kurtosis-logs-collector-vol--57668a5d641d4ea09f2a74f46deb1e58:/fluent-bit/etc] 
DEBU[2023-11-17T11:34:35Z][docker_manager.go:CreateAndStartContainer] Created container with ID '41438fb3aca7e0548eb859aac718c8681522b05cdf8ec294199336da277faf1a' from image 'fluent/fluent-bit:1.9.7' 
DEBU[2023-11-17T11:34:35Z][create_logs_collector.go:CreateLogsCollectorForEnclave] Checking for logs collector availability in enclave '57668a5d641d4ea09f2a74f46deb1e58'... 
DEBU[2023-11-17T11:34:35Z][create_logs_collector.go:CreateLogsCollectorForEnclave] ...logs collector is available in enclave '57668a5d641d4ea09f2a74f46deb1e58' 
DEBU[2023-11-17T11:34:35Z][create_logs_collector.go:CreateLogsCollectorForEnclave] Logs collector successfully created with container ID '41438fb3aca7e0548eb859aac718c8681522b05cdf8ec294199336da277faf1a' for enclave '57668a5d641d4ea09f2a74f46deb1e58' 
DEBU[2023-11-17T11:34:35Z][api_container_launcher.go:LaunchWithCustomVersion] Launching Kurtosis API container...          
DEBU[2023-11-17T11:34:35Z][docker_kurtosis_backend_api_container_functions.go:CreateAPIContainer] Creating the APIC for enclave '57668a5d641d4ea09f2a74f46deb1e58' 
INFO[2023-11-17T11:34:35Z][docker_manager.go:pullImage] Pulling image 'kurtosistech/core:0.85.30'    
WARN[2023-11-17T11:34:43Z][docker_kurtosis_backend_api_container_functions.go:CreateAPIContainer] Failed to pull the latest version of API container image 'kurtosistech/core:0.85.30'; you may be running an out-of-date version. Error:
Failed to pull Docker image 'kurtosistech/core:0.85.30' from remote image repository
 --- at /home/circleci/project/container-engine-lib/lib/backend_impls/docker/docker_manager/docker_manager.go:1207 (DockerManager.FetchImageIfMissing) ---
Caused by: Tried pulling image 'kurtosistech/core:0.85.30' but failed
 --- at /home/circleci/project/container-engine-lib/lib/backend_impls/docker/docker_manager/docker_manager.go:1386 (DockerManager.pullImage) ---
Caused by: ImagePull for 'kurtosistech/core:0.85.30' on platform '' failed with an unexpected error
 --- at /home/circleci/project/container-engine-lib/lib/backend_impls/docker/docker_manager/docker_manager.go:2018 (pullImage) ---
Caused by: context canceled 
DEBU[2023-11-17T11:34:43Z][docker_manager.go:FetchImage] Fetching image 'kurtosistech/core:0.85.30' with image download mode: missing 
DEBU[2023-11-17T11:34:43Z][docker_manager.go:CreateAndStartContainer] Error occurred fetching image 'kurtosistech/core:0.85.30'. Err:
An error occurred fetching image 'kurtosistech/core:0.85.30'
 --- at /home/circleci/project/container-engine-lib/lib/backend_impls/docker/docker_manager/docker_manager.go:1266 (DockerManager.FetchImage) ---
Caused by: An error occurred checking for local availability of Docker image 'kurtosistech/core:0.85.30'
 --- at /home/circleci/project/container-engine-lib/lib/backend_impls/docker/docker_manager/docker_manager.go:1199 (DockerManager.FetchImageIfMissing) ---
Caused by: Failed to list images.
 --- at /home/circleci/project/container-engine-lib/lib/backend_impls/docker/docker_manager/docker_manager.go:1359 (DockerManager.isImageAvailableLocally) ---
Caused by: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.43/images/json?all=1&filters=%7B%22reference%22%3A%7B%22kurtosistech%2Fcore%3A0.85.30%22%3Atrue%7D%7D": context canceled 
DEBU[2023-11-17T11:34:43Z][docker_kurtosis_backend_api_container_functions.go:CreateAPIContainer] Error occurred starting the API container. Err:
An error occurred fetching image 'kurtosistech/core:0.85.30'
 --- at /home/circleci/project/container-engine-lib/lib/backend_impls/docker/docker_manager/docker_manager.go:530 (DockerManager.CreateAndStartContainer) ---
Caused by: An error occurred fetching image 'kurtosistech/core:0.85.30'
 --- at /home/circleci/project/container-engine-lib/lib/backend_impls/docker/docker_manager/docker_manager.go:1266 (DockerManager.FetchImage) ---
Caused by: An error occurred checking for local availability of Docker image 'kurtosistech/core:0.85.30'
 --- at /home/circleci/project/container-engine-lib/lib/backend_impls/docker/docker_manager/docker_manager.go:1199 (DockerManager.FetchImageIfMissing) ---
Caused by: Failed to list images.
 --- at /home/circleci/project/container-engine-lib/lib/backend_impls/docker/docker_manager/docker_manager.go:1359 (DockerManager.isImageAvailableLocally) ---
Caused by: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.43/images/json?all=1&filters=%7B%22reference%22%3A%7B%22kurtosistech%2Fcore%3A0.85.30%22%3Atrue%7D%7D": context canceled 
DEBU[2023-11-17T11:34:43Z][docker_object_parallelizer.go:RunDockerOperationInParallel] Called RunDockerOperationInParallel on the following Docker object IDs: map[41438fb3aca7e0548eb859aac718c8681522b05cdf8ec294199336da277faf1a:true] 
DEBU[2023-11-17T11:34:43Z][docker_object_parallelizer.go:RunDockerOperationInParallel] Called RunDockerOperationInParallel on the following Docker object IDs: map[57668a5d641d4ea09f2a74f46deb1e58:true kurtosis-logs-collector-vol--57668a5d641d4ea09f2a74f46deb1e58:true] 
DEBU[2023-11-17T11:34:43Z][docker_object_parallelizer.go:RunDockerOperationInParallel] Called RunDockerOperationInParallel on the following Docker object IDs: map[bc5b74f563768a40e909d6b574f96b4e07d2488d4849395faf689c7df2909bac:true] 
leoporoli commented 11 months ago

The issue is happening only in Firefox Ubuntu

leoporoli commented 11 months ago

I've added a Notion document which describes what was tested and some video links with the debug process: https://www.notion.so/kurtosistech/BUG-CreateEnclave-fails-on-Firefox-981b4062c8f54ac6a216b9a18d15a187

Dartoxian commented 11 months ago

Possibly related to https://github.com/connectrpc/connect-es/issues/907