Open aliazamrana opened 6 years ago
@aliazamrana You will need to take a memory dump of your app and see which objects are using the memory. Let's start the investigation from there.
@ashishnegi I can take one, but the process is using a huge amount of RAM and gets stuck partway through the dump. I can get a minidump via Process Explorer, though.
@aliazamrana Let's start with that. Try attaching to the process in Visual Studio or a similar tool, or take dumps and analyze them in such a tool to see which objects are using the most memory.
@ashishnegi This is what I get. My service's only purpose is to handle web requests. Note that this is after I updated my controllers to avoid holding any strong references.
[ResponseCache(Location = ResponseCacheLocation.None, NoStore = true)]
[HttpGet]
public async Task<string> Get()
{
    string endPoint="";
    //var fabricClient = new FabricClient();
    using (var fabricClient = new FabricClient())
    {
        var apps = await fabricClient.QueryManager.GetApplicationListAsync();
        foreach (var app in apps)
        {
            System.Diagnostics.Debug.WriteLine($"Discovered application:'{app.ApplicationName}");
            var services = await fabricClient.QueryManager.GetServiceListAsync(app.ApplicationName);
            foreach (var service in services)
            {
                System.Diagnostics.Debug.WriteLine($"Discovered Service:'{service.ServiceName}");
                var partitions = await fabricClient.QueryManager.GetPartitionListAsync(service.ServiceName);
                if (service.ServiceKind != System.Fabric.Query.ServiceKind.Stateful)
                {
                    continue;
                }
                else if (!service.ServiceTypeName.Contains("TradingController"))
                {
                    continue;
                }
                foreach (var partition in partitions)
                {
                    System.Diagnostics.Debug.WriteLine($"Discovered Service Partition:'{partition.PartitionInformation.Kind} {partition.PartitionInformation.Id}");
                    ServicePartitionKey key = new ServicePartitionKey();
                    switch (partition.PartitionInformation.Kind)
                    {
                        case ServicePartitionKind.Singleton:
                            key = ServicePartitionKey.Singleton;
                            break;
                        case ServicePartitionKind.Int64Range:
                            var longKey = (Int64RangePartitionInformation)partition.PartitionInformation;
                            key = new ServicePartitionKey(longKey.LowKey);
                            break;
                        case ServicePartitionKind.Named:
                            var namedKey = (NamedPartitionInformation)partition.PartitionInformation;
                            key = new ServicePartitionKey(namedKey.Name);
                            break;
                        default:
                            break;
                            //throw new ArgumentOutOfRangeException("partition.PartitionInformation.Kind");
                    }
                    var resolver = new ServicePartitionResolver();
                    var resolved = await resolver.ResolveAsync(service.ServiceName, key, CancellationToken.None);
                    //foreach (var endpoint in resolved.Endpoints)
                    //{
                    //    System.Diagnostics.Debug.WriteLine($"Discovered Service Endpoint:'{endpoint.Address}");
                    //}
                    if (service.ServiceKind == System.Fabric.Query.ServiceKind.Stateful)
                    {
                        endPoint = resolved.Endpoints.FirstOrDefault().Address;
                    }
                }
                partitions = null;
            }
            services = null;
        }
        apps = null;
    }
    int index = endPoint.IndexOf("http");
    string url = endPoint.Substring(index);
    var charsToRemove = new string[] { "}", "\\", "\"" };
    foreach (var c in charsToRemove)
    {
        url = url.Replace(c, string.Empty);
    }
    charsToRemove = null;
    GC.Collect();
    return url;
}
And this is the POST controller, which I guess is also affected somehow:
[ResponseCache(Location = ResponseCacheLocation.None, NoStore = true)]
[Route("Redirect/{*endurl}")]
[HttpPost]
public async Task<JsonResult> Post(string endurl,[FromBody] JToken mT4Result)
{
    var urlController = new URLFinderController();
    string url = await urlController.Get();
    urlController = null;
    url += "/" + endurl;
    JObject jObject = JObject.FromObject(mT4Result);
    var client = new RestClient(url);
    //var request = new RestRequest(endurl, Method.POST);
    var request = new RestRequest(Method.POST); //RestRequest(Method.POST);
    //request.Timeout =
    request.RequestFormat = DataFormat.Json;
    request.AddBody(jObject.ToString(Newtonsoft.Json.Formatting.None));
    var reply = client.ExecuteAsPost(request, "Post");
    //request.AddBody(jObject);
    client = null;
    request = null;
    var content = reply.Content.Clone();
    reply = null;
    GC.Collect();
    return new JsonResult(content);
}
@aliazamrana Regarding "although this is after I tried to update my controllers to overcome any strong references": if I understand the memory snapshot correctly, these objects account for only about 4 MB. Do you mean that memory usage decreased after you took care of the strong references? Is this the correct dump of the process that is using 2 GB of RAM? It is still not clear which objects are using the memory.
You can try profiling with and without the Service Fabric code to rule out other issues. Let the service reach gigabytes of memory before taking the dump.
One optimization: you should not create a new client with using (var fabricClient = new FabricClient()) on every request. Create it once and store it in a static variable.
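A minimal sketch of the static-variable suggestion, assuming the Service Fabric SDK packages that the controller above already uses are referenced. The class name SharedFabric is hypothetical and not part of the original post.

```csharp
using System;
using System.Fabric;
using Microsoft.ServiceFabric.Services.Client;

// Hypothetical holder class (an assumption for illustration, not from the thread).
public static class SharedFabric
{
    // Created once, on first use, and then reused by every request
    // instead of constructing a new FabricClient per call.
    private static readonly Lazy<FabricClient> LazyClient =
        new Lazy<FabricClient>(() => new FabricClient());

    public static FabricClient Client => LazyClient.Value;

    // GetDefault() returns the shared default resolver, so the controller
    // does not need to construct a new ServicePartitionResolver per partition.
    public static ServicePartitionResolver Resolver =>
        ServicePartitionResolver.GetDefault();
}
```

Inside the controller, the using block would then become something like `var apps = await SharedFabric.Client.QueryManager.GetApplicationListAsync();`, and the shared client is never disposed by a request.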
@ashishnegi This is the snapshot after one night of running the updated code. Right after the cluster deployment, once all services had started, this service was using 125-180 MB, which I guess is normal. After just 14 hours the memory consumption was 775 MB, and this snapshot was taken at that point.
[Screenshots: memory snapshot sorted on Size, sorted on Inclusive Size, and sorted on Counts]
And about this:
"One optimization: You should not create using (var fabricClient = new FabricClient()) again. Just store this in some static variable."
I tried that, but the using block seemed more efficient and used slightly less memory when I was debugging on a local cluster, so I went with that approach. I am not sure why you suggest it is not a good approach. If I am not wrong, the using block avoids keeping a strong reference to the object. If a service goes down and the static object is created again, I am not sure the garbage collector will be able to release the old strong reference. Anyway, I will try your suggestion for another day to see what happens.
To me, these numbers are still lower than what I would expect. Can you constrain the app to use only 400 MB of RAM and then increase the load until it reaches that limit? If it recovers memory, the growth is just long-lived objects that the GC has not collected yet. If not, then yes, something is holding on to that memory.
If the memory is not managed, this might be a leak in native code. Can you look at a memory analysis of the native code? Finding out what percentage of the memory is managed versus native, and how that changes over time, will help as well.
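One way to apply such a limit in Service Fabric is a resource governance policy in ApplicationManifest.xml. The fragment below is a sketch only: the manifest name, version, and code package name are hypothetical placeholders, and whether the limit is enforced for a given service depends on the cluster's runtime version and configuration, so it should be checked against the resource governance documentation.

```xml
<!-- ApplicationManifest.xml fragment; "UrlFinderPkg" and "Code" are placeholder names -->
<ServiceManifestImport>
  <ServiceManifestRef ServiceManifestName="UrlFinderPkg" ServiceManifestVersion="1.0.0" />
  <Policies>
    <!-- Asks the runtime to cap this code package at 400 MB -->
    <ResourceGovernancePolicy CodePackageRef="Code" MemoryInMB="400" />
  </Policies>
</ServiceManifestImport>
```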
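A rough way to track the managed versus native split over time is to log the managed heap size next to the process's private bytes. The sketch below uses only standard .NET APIs; the class and method names are hypothetical. The difference is only an approximation of native usage, since it also includes runtime overhead and memory the GC has reserved but is not currently using.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (an assumption for illustration, not from the thread).
public static class MemorySplit
{
    // Returns the managed heap size and the private bytes of the whole process.
    public static (long Managed, long Private) Sample()
    {
        // Do not force a collection; report what is currently allocated.
        long managed = GC.GetTotalMemory(false);
        using (var process = Process.GetCurrentProcess())
        {
            process.Refresh();
            return (managed, process.PrivateMemorySize64);
        }
    }

    // Formats one sample in megabytes for logging.
    public static string Describe()
    {
        var (managed, priv) = Sample();
        const long MB = 1024 * 1024;
        return $"managed heap: {managed / MB} MB, private bytes: {priv / MB} MB, " +
               $"other (approx.): {(priv - managed) / MB} MB";
    }
}
```

Logging `MemorySplit.Describe()` every few minutes shows whether the growth is in the managed heap or outside it.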
OK, I will try to limit the usage, but for now the process seems to stay at about 780 MB.
@ashishnegi This is the 2.91 GB dump file for the service process. It still shows very little memory in use, and I am not sure why the dump is this large.
@aliazamrana Let's try a native (C++) dump analyzer to get a similar snapshot for native objects.
Hi @aliazamrana, out of interest, are you using server or workstation GC?
@MedAnd Using a server
@aliazamrana - having experienced similar issues, you might want to try workstation garbage collection (the default for standalone .NET apps, although ASP.NET Core projects typically default to server GC):
<gcServer enabled="false"/>
@MedAnd I am currently developing the web services on Service Fabric. Is this something you enable on the development side, or is it done through the Azure management portal when configuring the cluster? I ask because I may need to forward this to a particular person.
@aliazamrana - this is done in your project. For example, if your project has an App.config, try setting:
<gcServer enabled="false"/>
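For context, in a .NET Framework App.config this element belongs inside the runtime section. A minimal sketch of where it sits, with everything else in the file omitted:

```xml
<configuration>
  <runtime>
    <!-- false selects workstation GC, true selects server GC -->
    <gcServer enabled="false"/>
  </runtime>
</configuration>
```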
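@MedAnd It is a .NET Core web service, so I don't have any .config files, although it does have JSON configuration. I have tried converting this setting to JSON and placing it in my appsettings.json file to see if that works.
@aliazamrana - for .NET Core this is a runtimeOptions setting rather than an appsettings.json one (for example, in a runtimeconfig.template.json file next to the project file). Maybe try: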
{ "configProperties": { "System.GC.Server": false } }
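To confirm which GC mode the service actually runs with after such a change, the process can report it at startup. The sketch below uses the standard System.Runtime.GCSettings API; the class and method names are hypothetical. The same switch can also be set in the project file through the ServerGarbageCollection MSBuild property.

```csharp
using System;
using System.Runtime;

// Hypothetical helper (an assumption for illustration, not from the thread).
public static class GcModeCheck
{
    // Reports whether server GC is active, plus the current latency mode
    // and the number of logical processors the runtime sees.
    public static string Describe() =>
        $"Server GC: {GCSettings.IsServerGC}, " +
        $"latency mode: {GCSettings.LatencyMode}, " +
        $"processors: {Environment.ProcessorCount}";
}
```

Writing `GcModeCheck.Describe()` to the service log at startup makes it easy to verify on each node that the configuration was picked up.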
I have a 5-node cluster with a .NET API stateful service that is consumed by my dashboard and by multiple guest executables. To get the URL of that service, I have a stateless service running on all 5 nodes. It is also a .NET API, and its sole purpose is to return the URL of the stateful service. After only 3-4 days of running, this stateless URL service is using a huge amount of RAM on the nodes, which prevents the guest executables from running properly. It is using more than 2600 MB on each node.
I am not sure why the service is using this much memory. The API controller I am using in the service is this:
This controller is being hit continuously. I tried adding the no-caching attribute to test whether the issue was related to too many requests being cached, but it is not: memory consumption is still high.
I hope this issue gets answered and is not just left unanswered like many others I have posted.