newrelic / video-agent-roku

New Relic Agent for Roku
Apache License 2.0
18 stars 18 forks source link

Option to use custom timestamp if provided to nrSendHTTPRequest and nrSendHTTPResponse #58

Closed d-javed closed 1 year ago

d-javed commented 1 year ago

Problem The client application is not able to send a custom timestamp of the captured http events if sent to the New Relic agent at a later stage. Use cases such as holding on to these events and sending them later provides inaccurate timestamps and timeSinceHttpRequest since we can defer the request and send them at the same time as the response, Another use is deferring these events during the app boot process or conditionally holding and dropping specific http events when sampling.

Solution Added the option to check if timestamp property is included within the attr and to simply use this, otherwise, use the nrTimestamp within the agent.

CLAassistant commented 1 year ago

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

asllop commented 1 year ago

@d-javed what prevents the customer from calling the nrSendHttpRequest/Response methods in the moment the events happen?

d-javed commented 1 year ago

@d-javed what prevents the customer from calling the nrSendHttpRequest/Response methods in the moment the events happen?

The client is able to call those methods as soon as the events occur. The issues occurs when the event needs to be queued locally within client for the use cases I mentioned in the description. Such as, deferring events occurring before initialisation of the New Relic agent, conditionally holding on to requests until its pair response has returned and sending that to the agent after if we want to report that specific http error. Sending them together can give inaccurate timeSinceHttpRequest.

This may not be popular but allowing the client to set the TRUE timestamp can lead to flexibility on how client uses these methods for edge case scenarios.

asllop commented 1 year ago

I understand the customer specific problem, but if we allow setting the timestamp for nrSendHttp methods, we should do it for all nrSend methods, I don't see why the HTTP event senders should be different than others. Also, I don't see a technical reason that makes impossible to initialize the NR agent at the very beginning, right before any HTTP request happens (and if there is any, that should be the issue to solve). Also, in my opinion, "conditionally holding on to requests until its pair response has returned" is a bad practice, in general deferring any event is not a good practice, events should be sent when they happen, some events get attributes from the environment, like system/OS states, and these values change over time, and so deferring events can give us a falsified picture of the system, showing a state that doesn't correspond to reality. In summary, this PR looks like something that solves a specific problem derived from some dubious practices, and definitively something that could be solved in a different way. I don't think Roku Agent users would benefit from these changes.

d-javed commented 1 year ago

I understand the customer specific problem, but if we allow setting the timestamp for nrSendHttp methods, we should do it for all nrSend methods, I don't see why the HTTP event senders should be different than others. Also, I don't see a technical reason that makes impossible to initialize the NR agent at the very beginning, right before any HTTP request happens (and if there is any, that should be the issue to solve). Also, in my opinion, "conditionally holding on to requests until its pair response has returned" is a bad practice, in general deferring any event is not a good practice, events should be sent when they happen, some events get attributes from the environment, like system/OS states, and these values change over time, and so deferring events can give us a falsified picture of the system, showing a state that doesn't exist anymore. In summary, this PR looks like something that solves a specific problem derived from some dubious practices, and definitively something that could be solved in a different way. I don't think Roku Agent users would benefit from this changes.

I would suggest going ahead to allow these methods accept custom timestamps if sent. The fact that we have the ability to create customEvents gives the client the benefit to allow additional fields anyway but nrHttp methods have fields calculating the delta between the two request and response so there is a limitation here. It does not sound correct to send a http request event of a response we would not want to report after. This causes unnecessary events be processed therefore I would not jump the gun to easily say its bad practice to defer events being processed. Moreover, such large complex systems require other dependencies to send events to NR regardless of the agents initialisation at the beginning of the app launch. The client is not either forced to add a timestamp and can get a timestamp marked on the agent itself.

asllop commented 1 year ago

Of course I don't know the internals of this specific application and how hard it would be to deal with dependencies, etc. But I don't think this agent should provide features that encourage sending data that is potentially corrupted. This agent is open source and so you can fork it and maintain your own version with the modifications you require. Or you can even replicate the behavior of the nrSendHttp methods and implement your own policies for it, using calls to nrSendCustomEvent. But I'm not accepting this PR and making it available for everyone because I don't think is either of general interest or solves an actual flaw of the agent.

luis-soares-sky commented 1 year ago

@d-javed what prevents the customer from calling the nrSendHttpRequest/Response methods in the moment the events happen?

@asilop the best example we've had happen in our app: having HTTP requests happen on the main thread while the user is in screensaver after pausing playback. While in screensaver, all SceneGraph threads will get suspended, and any attempt at rendezvous will result in a locked thread. This means we cannot report to NR Agent in a timely manner, because it's a SceneGraph component, and doing so will lock the main thread. This prevents us from sending streaming heartbeats, which leads to errors once the user returns and resumes playback. To work around this, we've had to buffer our calls to New Relic in these scenarios in order to keep the main thread running and making HTTP calls over time, and then release those NR calls once the app is resumed. However, that causes requests with a timeSinceHttpRequest of 0.

Another scenario we've been tackling is the blocking that is caused by callFunc. Because we have an architecture that is not render-thread centric (meaning, we use the render-thread mostly for rendering purposes, and take advantage of the main-thread and tasks for other business logic, which results in an app that renders quite smoothly), this means our calls to NR Agent cause rendezvous. And even in cases when they won't (when the calling thread and the NRAgent thread are the same), we are still incurring frame drops due to callFunc which is a synchronous interface with some amount of overhead. This is not ideal for performance, and the only way to improve this is to defer our calls to NR, to later execute them outside the critical path. This is already fine for any standard custom event (where we are not worried about timestamps and provide our own deltas, and custom timestamps if needed), but we are stuck when it comes to HTTP events since the delta is calculated in the agent, with no way to provide an accurate value from the outside.


We will have to move forward with deferring HTTP events in our app regardless, since these rendezvous are impacting our app's performance. But we still want to avoid the negative impact of having all timeSinceHttpRequest values set to zero. So if modifying the agent timestamp is not an acceptable solution, could we then suggest an alternate approach (probably in a separate PR) and allow something like this instead?

' In components/NRAgent.brs, we'd modify the logic to accept a custom delta...

function nrSendHttpResponse(attr as Object) as Void
    (...)
    if m.nrRequestIdentifiers[transId] <> invalid
        deltaMs = attr["timeSinceHttpRequest"]
        if deltaMs = invalid
            deltaMs = nrTimestamp() - m.nrRequestIdentifiers[transId]
            attr["timeSinceHttpRequest"] = deltaMs
        end if
        m.nrRequestIdentifiers.Delete(transId)
        'Generate metrics
        nrSendMetric("roku.http.response.time", deltaMs, {"domain": domain})
    end if

    (...)
end function

' In source/NewRelicAgent.brs, we'd allow passing overrides...

function nrSendHttpResponse(nr as Object, _url as String, msg as Object, overrides = invalid as Object) as Void
    if type(msg) = "roUrlEvent"
        attr = {}
        if overrides <> invalid
            attr.Append(overrides)
        end if

        attr.AddReplace("origUrl", _url)
        attr.AddReplace("httpCode", msg.GetResponseCode())
        attr.AddReplace("httpResult", msg.GetFailureReason())
        attr.AddReplace("transferIdentity", msg.GetSourceIdentity())
        (...)
    end if
end function

' Then, in client, all we need is to provide said delta via overrides...

nrSendHttpResponse(m.nrAgent, payload.url, payload.urlEvent, { "timeSinceHttpRequest": payload.deltaTime })

This would avoid transforming the internal NR timestamp, but would allow for the clients to provide proper deltas in timeSinceHttpRequest, since we do already calculate them for internal diagnostics and business logic anyway.

asllop commented 1 year ago

@luis-soares-sky thank you for your detailed explanation, now I understand better the nature of the problem regarding why you don't send the HTTP events when they happen. Our priority is to make the agent usable in all sort of situations, and this use case you described is certainly something to think about. We are evaluating the situation and will get back to you with a proposal.