vdvarlamov opened this issue 5 years ago
@dkkapur @athinanthny @masnider any idea on this one? Is there something we might be missing to get this to show up for @vdvarlamov?
Hi @vdvarlamov,
You probably don't see Load Information when deploying to the Local Cluster because an agent sends this information to the Cluster Manager periodically (the default interval is 5 minutes).
You can change the interval in ClusterManifest.xml, inside the ReconfigurationAgent section:
<Section Name="ReconfigurationAgent">
<Parameter Name="SendLoadReportInterval" Value="60" />
</Section>
Please see this answer on Stack Overflow for more details.
Here is what I can see for an empty stateless service:
Hope this helps.
I added the "SendLoadReportInterval" parameter, but nothing changed. There is always a warning: "GetResourceUsageAsyncOperation Operation returned SerializationError". Any idea on this one?
Hi @vdvarlamov,
I took a short walk through the Service Fabric source code, and here is what I think is happening.
The error you've mentioned:
"GetResourceUsageAsyncOperation Operation returned SerializationError"
... comes from the ProcessActivationManager::ProcessGetResourceUsage method as the result of the GetResourceUsageAsyncOperation asynchronous operation (here is where it starts and here is where the error message is).
Unwinding the call chain from ProcessActivationManager::ProcessGetResourceUsage leads us to GetResourceUsageAsyncOperation::OnStart, then to ApplicationService::BeginMeasureResourceUsage, and then to MeasureResourceUsageAsyncOperation::OnStart.
Inside MeasureResourceUsageAsyncOperation::OnStart we can see which API call Service Fabric makes to the container host:
if (owner_.IsContainerHost)
{
    auto operation = owner_.ActivationManager.containerActivator_->BeginInvokeContainerApi(
        owner_.ContainerDescriptionObj,
        L"GET",
        L"/containers/{id}/stats?stream=false",
        L"application/json",
        L"",
        HostingConfig::GetConfig().ContainerStatsTimeout,
        [this](AsyncOperationSPtr const & operation)
        {
            this->OnContainerApiStatsCompleted(operation, false);
        },
        thisSPtr);
    this->OnContainerApiStatsCompleted(operation, true);
}
The OnContainerApiStatsCompleted method handles the operation completion:
if (!error.IsSuccess())
{
    // ...
}
else
{
    ContainerApiResponse containerApiResponse;
    error = JsonHelper::Deserialize(containerApiResponse, result);
    if (error.IsSuccess())
    {
        ContainerApiResult const & containerApiResult = containerApiResponse.Result();
        if (containerApiResult.Status() == 200)
        {
            ContainerStatsResponse containerStatsResponse;
            error = JsonHelper::Deserialize(containerStatsResponse, containerApiResult.Body());
            if (error.IsSuccess())
            {
                resourceMeasurement_.MemoryUsage = containerStatsResponse.MemoryStats_.MemoryUsage_;
                resourceMeasurement_.TotalCpuTime = containerStatsResponse.CpuStats_.CpuUsage_.TotalUsage_;
                resourceMeasurement_.TimeRead = containerStatsResponse.Read_;
            }
            else
            {
                WriteWarning(
                    TraceType_ActivationManager,
                    owner_.parentId_,
                    "Application Service with service Id {0} ContainerStatsResponse error {1}",
                    owner_.appServiceId_,
                    error);
            }
        }
        else
        {
            // ...
        }
    }
    else
    {
        // ...
    }
    TryComplete(operation->Parent, error);
    return;
}
In the email thread you mentioned one more error:
"Application Service with service Id 0bfceb73-ab70-4f11-9b3d-023249c3ff40 ContainerStatsResponse error SerializationError"
I think this is the key to what is happening. In the code above, this message is printed only when the API call to the container host succeeded, its response envelope was deserialized, and the status was 200, but the response body could not be deserialized into a ContainerStatsResponse.
I think the problem might be a version incompatibility between the Docker version on your machine and the Service Fabric cluster version.
Can you try installing the latest versions of both?
I updated all components, but the errors persisted!
Hi @vdvarlamov,
I have also tried reconfiguring, reinstalling, and so on, but the issue persists. It looks like a bug in the Service Fabric serialization contracts.
@MicahMcKittrick-MSFT @dkkapur @athinanthny @masnider can you please help with this one?
To put it simply, the main issue is that Load Information isn't displayed because the container host always reports the following errors:
"Application Service with service Id 0bfceb73-ab70-4f11-9b3d-023249c3ff40 ContainerStatsResponse error SerializationError"
"GetResourceUsageAsyncOperation Operation returned SerializationError"
I did a small investigation (see my comment above), but it only confirmed that the problem is in deserializing the container host's response.
Thanks for that. I will start an offline thread to see if I can get someone to look into it.
Just FYI, engineers are engaged in the offline thread.
The SerializationError occurs because Docker changed the DateTime format of the "read" field from ending with 'Z' to ending with a timezone offset.
Old: "read": "2015-01-08T22:57:31.547920715Z" (still shown in the Docker docs: https://docs.docker.com/engine/api/v1.40/#operation/ContainerStats)
New: "read": "2019-10-18T17:44:36.5599007-07:00"
Our TryParse returns false because the last character is not 'Z':
//to support docker format
//2018-02-23T11:22:12.1630849Z
if (str.size() >= 24)
{
    if (str[10] != L'T' || str[str.size() - 1] != L'Z')
    {
        return false;
    }
}
They are working out the best way to correct this.
Hi @MicahMcKittrick-MSFT,
Just wanted to confirm that we have the same issue, which keeps autoscaling of our services from working. The event logs are riddled with "ContainerStatsResponse error SerializationError" events.
Any workaround until the fix is in place would be appreciated.
Edit: Forgot to mention, this is an Azure Service Fabric cluster, not on-premises. It runs on Windows Server 2019 (1809).
Versions: Service Fabric: 6.5.664.9590 Docker: 19.03.2, build c92ab06ed9
The Moby and Docker teams believe that everything is fine: https://github.com/moby/moby/issues/40975
I have a standalone cluster (6.5.664.9590) on Windows Server 2019. In ClusterManifest I enabled ResourceMonitorService:
<Section Name="ResourceMonitorService">
<Parameter Name="InstanceCount" Value="-1" />
<Parameter Name="IsEnabled" Value="True" />
</Section>
A stateless service runs in a Windows Docker container with ExclusiveProcess activation. I need Load Information because AutoScaling policies do not work without it. For example, the Load Information view (this is not my screenshot):
SF config and app package: https://github.com/vdvarlamov/SF-test
I noticed the warning "GetResourceUsageAsyncOperation Operation returned SerializationError" and, for every container, "Application Service with service Id 0bfceb73-ab70-4f11-9b3d-023249c3ff40 ContainerStatsResponse error SerializationError" in Event Viewer under Microsoft-Service Fabric/Admin.