microsoft / service-fabric

Service Fabric is a distributed systems platform for packaging, deploying, and managing stateless and stateful distributed applications and containers at large scale.
https://docs.microsoft.com/en-us/azure/service-fabric/
MIT License

Error creating Docker containers on VMs: Failed to unmarshall layerchain json - invalid character '\\x00' #1146

Open jmkeefer opened 3 years ago

jmkeefer commented 3 years ago

Service Fabric Runtime Versions: 7.2.432.9590, 7.2.433.9590

Environment: Azure

Description: We build Windows Docker images (via Docker Compose) in our Azure DevOps build pipelines (on windows-2019 agents) and publish them to a private Azure Container Registry. Our deploy pipeline then deploys the images to Azure Service Fabric. The Service Fabric VMs are imaged with the WindowsServer datacenter-core-1909-with-containers OS image.

Observed behavior: After we create a brand-new Service Fabric environment, Service Fabric fails with the following error when it attempts to create our application's containers on the VMs during deployment:

DockerRequest returned StatusCode=InternalServerError with ResponseBody={"message":"Failed to unmarshall layerchain json - invalid character '\\x00' looking for beginning of value"}

Expected behavior: The containers should be created successfully. We have deployed this same image to Service Fabric without issue in the past, and we have also pulled the images and run the containers locally without issue. I have found several references to invalid character '\x00' in connection with corrupt Docker logs, but very few regarding layerchain.json, and as far as I can tell none of those reports ever identified or communicated an actual resolution.

Digging further, I found that the C:\ProgramData\Docker\WindowsFilter directory contains quite a few subdirectories, each of which contains a layerchain.json file. Many of these files hold what look like valid layerchain arrays, while others contain strange whitespace. My educated guess is that the files with the strange whitespace are causing the issue (NUL characters, the '\x00' named in the error, can display as blank space in a text viewer), but if so, I have no idea why it is happening, how to fix it, or how to prevent it from recurring on every deployment.
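To test that guess without opening each file by hand, a read-only sketch along these lines could list every layerchain.json under the WindowsFilter directory that contains NUL bytes or does not parse as JSON. It assumes, based on the files that look healthy, that a valid layerchain.json holds a JSON array of layer-path strings; it also accepts JSON null for layers with no parents, which is an assumption rather than something confirmed here. The function names (check_layerchain, scan, report) are hypothetical, and the code only reads files; it does not repair or delete anything. Reading under C:\ProgramData\Docker may require an elevated prompt on the node.

```python
import json
from pathlib import Path
from typing import List, Optional, Tuple


def check_layerchain(data: bytes) -> Optional[str]:
    """Return None if `data` looks like a healthy layerchain.json, else a reason.

    Assumption: a healthy file is a JSON array of strings (layer paths),
    or JSON null for a layer without parents.
    """
    # Check for NUL bytes first: the Docker error reports '\x00' where the
    # JSON parser expected the beginning of a value.
    if b"\x00" in data:
        nul_count = data.count(b"\x00")
        return f"contains {nul_count} NUL byte(s)"
    try:
        chain = json.loads(data.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        return f"not valid JSON: {exc}"
    if chain is None:
        return None  # assumed valid for layers with no parent chain
    if not isinstance(chain, list) or not all(isinstance(p, str) for p in chain):
        return "JSON is not an array of strings"
    return None


def scan(root: Path) -> List[Tuple[Path, str]]:
    """Check every <root>/<layer-id>/layerchain.json and collect the bad ones."""
    problems: List[Tuple[Path, str]] = []
    for path in sorted(root.glob("*/layerchain.json")):
        reason = check_layerchain(path.read_bytes())
        if reason is not None:
            problems.append((path, reason))
    return problems


def report(root: Path) -> int:
    """Print each suspicious file under `root` and return how many were found."""
    problems = scan(root)
    for path, reason in problems:
        print(f"{path}: {reason}")
    print(f"{len(problems)} suspicious layerchain.json file(s) under {root}")
    return len(problems)
```

For example, from an elevated Python prompt on an affected node: report(Path(r"C:\ProgramData\Docker\WindowsFilter")). Using a strict JSON parser means ordinary spaces, tabs, and newlines around a valid array are accepted, so only files that a JSON decoder would actually reject (such as NUL-filled ones) are reported.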

What we have tried so far, with no luck:

OS (Windows/Linux): Windows

If this is a regression, which version did it regress from? This image has worked in Service Fabric in the past, although that Service Fabric instance also started having issues after its VM scale sets were stopped and restarted.


Assignees: /cc @microsoft/service-fabric-triage

jmkeefer commented 3 years ago

UPDATE: I have tried building images based on ubuntu-18.04 and running them in a virtual machine scale set imaged with the ubuntu-18.04 OS, and this works without issue.

However, the layerchain.json error described above still occurs with the Windows images, so this issue is not yet resolved.