newrelic / nr1-slo-r

NR1 SLO-R allows you to define, calculate and report on service-level objective (SLO) attainment.
https://discuss.newrelic.com/t/track-your-service-level-objectives-with-the-slo-r-nerdpack/90046
Apache License 2.0

"View SLOs" fails to load after updating a particular SLO #119

Open ghost opened 3 years ago

ghost commented 3 years ago

Description

The SLO view fails to load (both grid and table). The Combine SLOs view loads every SLO except one. That SLO (AWD Online) was modified to add more transactions, and the change appears to have caused the issue for multiple (probably all) users in the account.

I believe this SLO failing to load is causing a rendering failure for all SLOs on the page, but I don't think we have access to remove it from the NerdStore.

Steps to Reproduce

Navigate to Table view. Table headers will load, but with no content.

Expected Behaviour

Combined/Grid/List view to load properly and display SLOs.

Relevant Logs / Console output

image

one.newrelic.com-1599744613684.log

ghost commented 3 years ago

I opened request #426123 with support as well, to see if they can copy the definition of the offending SLO and remove it.

forestb commented 3 years ago

Hi - I'm new to SLO/R but am experiencing near-identical issues. See image below:

image

This is after defining new SLOs of any type except "Errors". I'm new to this but have double- and triple-checked that I've followed the correct setup instructions. I'd be happy to provide more info if it would help, and/or open an additional GitHub issue, but it seems to me that this implies SLO/R is simply not functional (aside from "Error" type SLOs). If you could let me know the best way to proceed, I'd appreciate it!

In the meantime, SLO/R appears to be the perfect (albeit currently unusable) solution to what I'm trying to accomplish.

ghost commented 3 years ago

@forestb What does your table view look like? What errors are appearing in your console?

forestb commented 3 years ago

> @forestb What does your table view look like?

image

The table view is empty.

> What errors are appearing in your console?

image

Is this helpful?

forestb commented 3 years ago

@andrewseling, regarding:

> I opened request #426123 with support as well, to see if they can copy the definition of the offending SLO and remove it.

Were you able to get someone to remove the definition? I've tried removing the SLO/R App "altogether" but I suspect the data is persisting in New Relic - probably within Insights somewhere. When I "reinstall" or "re-add" it, everything comes right back - including the errors.

ghost commented 3 years ago

There is a NerdGraph API that should allow access to remove the definition, but we weren't able to get it to work. The case is still open.

Have you checked the webhook setup for your alert-based SLOs? I haven't used them, but the lack of anything on the table page makes me think that when some SLOs fail to load, regardless of cause, the failure prevents the whole component from loading properly.
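To illustrate the suspicion above, here is a minimal sketch of isolating failures per SLO so that one bad definition shows up as an error row instead of blanking the whole table. This is not SLO-R's actual code: `SloDefinition`, `SloResult`, `computeAll`, and the `compute` callback are hypothetical names used only for illustration.

```typescript
// Hypothetical shapes; the real SLO-R document format may differ.
interface SloDefinition {
  id: string;
  name: string;
}

interface SloResult {
  id: string;
  name: string;
  attainment?: number; // set when the calculation succeeded
  error?: string; // set when the calculation threw
}

// Run `compute` for each definition and catch failures individually,
// so a single broken SLO cannot abort the whole list.
function computeAll(
  definitions: SloDefinition[],
  compute: (definition: SloDefinition) => number
): SloResult[] {
  return definitions.map((definition) => {
    try {
      return {
        id: definition.id,
        name: definition.name,
        attainment: compute(definition),
      };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return { id: definition.id, name: definition.name, error: message };
    }
  });
}
```

With something like this, the table could render every row and flag the failing SLO by name, which would also make the offending definition much easier to identify.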

forestb commented 3 years ago

> I haven't used them, but...

This is helpful; if you're not using them, then I should stop hijacking your issue and reach out to support.

> Have you checked the webhook setup for your alert-based SLOs? I haven't used them, but the lack of anything on the table page makes me think that when some SLOs fail to load, regardless of cause, the failure prevents the whole component from loading properly.

I've triple-checked the configuration. What's interesting is that when I press "Send a test notification", my SLO configuration picks up that payload and I am able to properly display an SLO based on it. (The data is meaningless, of course...)

image

So there must be something about the payload from my "actual alert" that breaks the displaying/reporting of an SLO (e.g. my dashboard is blank, with JavaScript errors in my console)...

forestb commented 3 years ago

Ugh. Update: I've figured out the cause of my problem. My policy name had an apostrophe in it, and that appears to completely break the SLO/R plugin. (My policy was named "Forest's test policy".)

I still have the same underlying problem, though: I now have orphaned SLO/R definitions that I can't access. I'll poke around the API a bit and see if I can figure something out. I'll keep an eye out here in case you figure that bit out, too.
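For anyone hitting the same apostrophe problem, here is a minimal sketch of escaping a policy name before embedding it in a single-quoted NRQL string literal. It assumes the breakage comes from an unescaped quote in a generated query, which this thread does not confirm, and that NRQL accepts a backslash-escaped quote inside a string literal. The helper names and the `policy_name` attribute are hypothetical and not taken from the SLO-R source.

```typescript
// Hypothetical helper: make a value safe to place inside '...' in NRQL.
function escapeNrqlString(value: string): string {
  // Escape backslashes first so the ones added for quotes are not doubled.
  return value.replace(/\\/g, "\\\\").replace(/'/g, "\\'");
}

// Hypothetical query fragment; `policy_name` is an assumed attribute name.
function buildPolicyFilter(policyName: string): string {
  return `WHERE policy_name = '${escapeNrqlString(policyName)}'`;
}

// The policy name from this thread ends up with its apostrophe escaped.
console.log(buildPolicyFilter("Forest's test policy"));
```

Without the escaping step, the apostrophe would terminate the string literal early and leave the rest of the name as invalid query syntax.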

ghost commented 3 years ago

Keagan Peet from NR removed the offending SLO definition from our NerdStore and everything is working normally again. I'll add "reproduce bug on my local development copy" to my TODO list so we can understand what it is about that particular SLO definition that causes issues. My guess is length.