Open Pashaminkovsky opened 2 months ago
Spent the afternoon looking into this and collected my observations on the notion document: https://www.notion.so/pixiebrix/Nunjucks-Error-Spike-0283abe59c6a48ad96f5f84715516ec3?showMoveTo=true&saveParent=true
I have not been able to identify root cause, but I propose creating a PR tomorrow that
More involved solution/future work would be to come up with a different solution for rendering nunjucks templates, an approach I have not researched and have little context on atm
@mnholtz that sounds good I appreciate you looking into it. Re logging – is it possible to have visibility into the success rate of the mod run when the error is thrown?
that sounds good I appreciate you looking into it. Re logging – is it possible to have visibility into the success rate of the mod run when the error is thrown?
Discussed over zoom - the answer is that how much of the mod is able to be run/will run on a simple page refresh/button click/etc. is determined on a per-mod basis based on the mod definition. Generally speaking this error is very disruptive to mod execution and would usually result in the user modifying their workflow in some way to overcome the issue
Tentatively putting this back into the backlog until we can get more logging visibility/information from https://github.com/pixiebrix/pixiebrix-extension/pull/9168
It looks like increasing the timeout to 5s did not make any impact on the errors: https://app.datadoghq.com/logs?query=issue.id%3A6f1c3b50-4bb1-11ef-a033-da7ad0900002%20%2A%3A%22%22%20&agg_m=count&agg_m_source=base&agg_t=count&cols=host%2Cservice%2C%40error.causes%5B1%5D&fromUser=true&messageDisplay=inline&refresh_mode=sliding&storage=hot&stream_sort=service%2Casc&viz=stream&from_ts=1726583812591&to_ts=1727188612591&live=true
And the cause is still mixed; doesn't seem to be an issue with the iframe getting injected onto the page (not a single one of these errors was caused by the new IframeInjectionError), but loading.
After further investigation with @grahamlangford, we've identified the following additional hypotheses (both of which are documented on the original notion document):
Following this, we have identified the following next steps for a follow-up PR:
Adding re-injection in https://github.com/pixiebrix/pixiebrix-extension/pull/9208 did not seem to have an effect.
Additional information added to error telemetry did not come through to datadog due to the way datadog selects error properties to report (oversight on my part). Will be fixing this in a follow-up commit.
One avenue for us to pursue is to move the sandbox to the offscreen page to avoid issues related to injecting the iframe during brick execution runtime.
POC: https://github.com/pixiebrix/pixiebrix-extension/pull/9352
Another avenue that requires less of an architecture shift is to pre-emptively inject the sandbox onto the page upon content script initialization rather than waiting until the point where a brick requiring it is executing
I merged in some pre-work in the above PR which will make it easier to extend the offscreen page to inject sandbox execution.
Pausing on this for now until we see how the current release does with render_nunjucks errors
Before you moved to the offscreen document, have you seen any initialization errors in the sidebar as well? During the chrome.sidePanel
migration we attributed those errors to extension reloads but I don't know if they also happened on production.
Before you moved to the offscreen document, have you seen any initialization errors in the sidebar as well? During the
chrome.sidePanel
migration we attributed those errors to extension reloads but I don't know if they also happened on production.
I haven't seen anything in our logging, and we haven't had any issues reported in a long time
Spike details: https://www.notion.so/pixiebrix/Nunjucks-Error-Spike-0283abe59c6a48ad96f5f84715516ec3?showMoveTo=true&saveParent=true
Datadog dashboards:
test edit