microsoft / service-fabric-issues

This repo is for the reporting of issues found with Azure Service Fabric.

Question: Is ApplicationInsights recommended or not for service fabric? #157

Closed qrli closed 7 years ago

qrli commented 7 years ago

I have been investigating logging and monitoring solutions for a while. We are not happy with the OMS integration solution. I also have concerns about Application Insights (AI), but it looks like the best solution that is easily approachable for us (it looks on par with or even better than Google Cloud's offering or the ELK stack). It is also the integrated-by-default solution for many other Azure services we use. However, there is almost no documentation on how to integrate Service Fabric with AI, and the Google results turn up a discontinued NuGet package.

So it feels as if you do not recommend using AI with SF. Is that true? And if so, why?

The SF documentation very briefly mentions that Azure Diagnostics can be configured to forward all data to Azure Storage and AI, but it does not say how to forward to AI. I found this page, but it talks about Cloud Services.

I also found a Stack Overflow question on this. It looks like the only workaround is to take some sample code, which essentially creates an event listener that writes to AI. It is doable, but not very manageable for large applications with many services. Most importantly, it feels like a risky bet for us.

ghost commented 7 years ago

The article is still in the process of going live, so it's not yet linked from the navigation bar. Please take a look at this article and let me know if it answers your questions: https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-diagnostics-overview

qrli commented 7 years ago

@toddabel Great article with a lot of good info. The picture is now clearer to me. But I still have some questions:

  1. For Azure Diagnostics to Application Insights, the linked page is still the old page, which talks about the classic Cloud Service. I only found a link to a PowerShell solution on GitHub, which contains a script to configure forwarding to AI from a general VM. I'm no expert on AD, so I'm still in the dark. I understand the AD solution has some limitations, but it still looks interesting because 1) it includes system-level data, and 2) we don't need to change our (many) services.

  2. For the EventFlow solution: if I understand correctly, it is not so different from using the AI SDK directly, and it does not support stacks other than .NET. So it looks to me that we are better off using the Application Insights packages for ASP.NET Core and Node.js directly. There are also NLog/Serilog output plugins for AI, for non-web services. Am I right?

qrli commented 7 years ago

A bit more on in-process case:

ghost commented 7 years ago

I'll see if I can get a better article for your first question above. There is a link from Azure Diagnostics to OMS, which is very good at monitoring infrastructure-level information. For point 2, it's not that different today, except that EventFlow provides an easy way to change configuration and supports multiple outputs, not just AI. You are correct that trying to collect node-level information from within each process is not the right approach. It would often require admin-level access to collect counters, and you should never run your service as an admin. I still recommend Azure Diagnostics and OMS if needed.
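To illustrate the "change configuration, support multiple outputs" point, here is a minimal sketch of a configurable fan-out in Python. It is not EventFlow's API (EventFlow is a .NET library). The names `OUTPUTS`, `build_pipeline`, `ai_sink`, and `file_sink` are hypothetical, and the sinks are in-memory lists standing in for real destinations.

```python
from typing import Callable, Dict, List

Event = Dict[str, object]
Output = Callable[[Event], None]

# Hypothetical in-memory stand-ins for real sinks such as AI or a log file.
ai_sink: List[Event] = []
file_sink: List[Event] = []

# Registry mapping a configuration name to an output function.
OUTPUTS: Dict[str, Output] = {
    "application_insights": ai_sink.append,
    "file": file_sink.append,
}


def build_pipeline(config: Dict[str, List[str]]) -> Output:
    """Return an emit function that forwards each event to every configured output."""
    outputs = [OUTPUTS[name] for name in config["outputs"]]

    def emit(event: Event) -> None:
        for out in outputs:
            out(event)

    return emit


# Changing destinations means editing this config, not the service code.
emit = build_pipeline({"outputs": ["application_insights", "file"]})
emit({"level": "info", "message": "service started"})
```

With this shape, adding or dropping a destination is a configuration change, whereas calling one SDK directly ties each service to that single destination.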

qrli commented 7 years ago

@toddabel Good point.

Regarding OMS, we have two difficulties that drove us to consider alternatives: 1) there is a delay of about 15 minutes before logs become visible; 2) we have a Node.js service, which cannot emit ETW events. The OMS documentation talks about using the OMS agent to gather log files, while Service Fabric uses the Azure Diagnostics agent. I googled and found that Azure Diagnostics can also gather log files, but I cannot figure out how to get them into OMS in the end.

richrundmsft commented 7 years ago

For 1) - if you're using Log Analytics' ability to read the Service Fabric logs from storage, then yes, a delay of about 15 minutes seems right, since we poll the storage account every 10 minutes looking for logs.
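As a rough illustration of how a 10-minute poll can add up to roughly 15 minutes, here is a small sketch of the arithmetic. Only the 10-minute polling interval comes from this thread. The `upload_lag` and `processing` parameters, and the 5-minute value used below, are hypothetical placeholders for whatever time the agent takes to write to storage and the service takes to index the data.

```python
from typing import Tuple


def delay_bounds_minutes(
    poll_interval: float,
    upload_lag: float = 0.0,
    processing: float = 0.0,
) -> Tuple[float, float]:
    """Return (best, worst) delay between a log being written and being visible.

    Best case: the log lands in storage just before a poll.
    Worst case: the log lands just after a poll and waits a full interval.
    """
    best = upload_lag + processing
    worst = upload_lag + poll_interval + processing
    return best, worst


# 10-minute polling (from the thread) plus an assumed 5-minute upload lag.
best, worst = delay_bounds_minutes(poll_interval=10, upload_lag=5)
print(f"best case: {best} min, worst case: {worst} min")
```

The point of the sketch is that the polling interval sets the floor on the worst-case delay, so reading from storage cannot get much faster than that interval, whatever the other stages cost.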

For 2) - OMS can collect VM / VMSS diagnostics that are generated by the resource provider or by the Azure monitoring agent when they are written to storage. Details for both of these are at https://docs.microsoft.com/en-us/azure/log-analytics/log-analytics-azure-storage.

qrli commented 7 years ago

@richrundmsft For 2), do you mean we should use both Azure Diagnostics and the OMS agent (Log Analytics VM Extension) on our Service Fabric VMs? They overlap, so what about the duplicated data? Besides the extra network cost and setup cost, will OMS handle the duplicated data gracefully?

MedAnd commented 7 years ago

This approach might also interest you: Application Insights & Semantic Logging for Service Fabric Microservices

qrli commented 7 years ago

To summarize my conclusions a bit:

1) In the larger picture, OMS meets ops needs much better, while Application Insights targets devs. Although AI has some overlap with OMS, it was never meant to replace OMS; it is more of a convenient solution for devs. So I think the choice of OMS for Service Fabric is somewhat reasonable.

2) However, the current OMS integration solution is not satisfactory.

3) Application Insights looks like a good candidate to solve the above issues, which means using a separate tool for devs. This differs from our DevOps goal, but it is still an acceptable choice, as no single tool does everything best. Alternatively, Azure provides Elasticsearch/Kibana stack templates, which involve more setup work but are more flexible.

qrli commented 7 years ago

I got some info about the future of OMS and AI (under NDA, so I cannot share it here), which promises better integration.

homelchenko commented 7 years ago

"I'll see if I can get a better article for your first question above"

@toddabel Did you happen to find the link? I am also interested in getting AD data into Application Insights, and I have been unable to find a good source that explains how to achieve that.

duongthaiha commented 7 years ago

Hi, I am interested in that as well. Any findings or guidance on this topic, please?

ohadschn commented 7 years ago

This seems to be the most up to date resource: https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-diagnostics-event-analysis-appinsights.