The problem

There is, in principle, no robust way to map programmatic HTTP requests or responses (XMLHttpRequest and fetch() calls) to the HTML page elements they modify without performing full-scale taint analysis, which is far too heavyweight (and may be impossible). For example:

1. Client: JavaScript code requests GET /first/url via XMLHttpRequest
2. Server: Sends the /first/url response
3. Client: JavaScript code requests GET /second/url via XMLHttpRequest
4. Server: Sends the /second/url response
5. Client: Waits 5 seconds
6. Client: Updates a <div> element with data from the /second/url response
7. Client: Waits another 5 seconds
8. Client: Updates a different <div> element with data from the /first/url response

A browser extension (or in general, any JavaScript code that the existing site's JavaScript does not have a dependency on) has no way to determine that the update in step 6 came from /second/url, while the update in step 8 came from /first/url.
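
The scenario can be reproduced in a small simulation. The sketch below uses no real DOM or network. `runScenario`, `lastResponseHeuristic`, and the element names `div#a` and `div#b` are hypothetical, chosen only for illustration. It models an outside observer that sees responses and element updates as two independent event streams and guesses that each update came from the most recently completed response.

```javascript
// Event stream as seen by code outside the page's own scripts: "response"
// events and "mutation" events, with nothing that links one to the other.
function runScenario() {
  const events = [];
  // Both responses arrive before any element is touched.
  events.push({ type: "response", url: "/first/url" });
  events.push({ type: "response", url: "/second/url" });
  // After a delay, the page uses the /second/url data.
  events.push({ type: "mutation", element: "div#a", actualSource: "/second/url" });
  // Later still, the page uses the /first/url data.
  events.push({ type: "mutation", element: "div#b", actualSource: "/first/url" });
  return events;
}

// Heuristic: attribute each mutation to the most recently completed response.
// `actualSource` is ground truth known only to the simulation; the heuristic
// never reads it when forming its guess.
function lastResponseHeuristic(events) {
  let lastUrl;
  const guesses = [];
  for (const ev of events) {
    if (ev.type === "response") {
      lastUrl = ev.url;
    } else {
      guesses.push({ element: ev.element, guessed: lastUrl, actual: ev.actualSource });
    }
  }
  return guesses;
}

const guesses = lastResponseHeuristic(runScenario());
for (const g of guesses) {
  console.log(`${g.element}: guessed ${g.guessed}, actual ${g.actual}`);
}
```

In this run the heuristic is right about the first update only by coincidence and wrong about the second, because the observable events carry no information about which response the page's code actually used.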

Existing code attempts to solve this by either:

Scanning JavaScript source code for calls to new XMLHttpRequest and getElementById():
- But (as acknowledged in the README) this makes many strong assumptions that can lead to both false positives and false negatives, e.g., that the JavaScript is inline; that the modification follows a call to getElementById() rather than, e.g., querySelector(); and that the modified element appears as a string literal in that call.
Tracking the most recent DOM mutation through a MutationObserver:
- But this attaches provenance header data to the HTML element identified by originalMutation at the time the server request completes. That is necessarily before any HTML mutation that depends on the response, so the accesses to originalMutation in the linked code in fact refer to some earlier, irrelevant mutation. In particular, the first time the linked code runs, originalMutation will be undefined. Additionally, the code currently tracks only the most recently modified HTML element.
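
The ordering problem in the second approach can be seen in a minimal simulation. This is not the linked code: `onMutation`, `onResponseComplete`, the `prov-1`/`prov-2` values, and the element names are hypothetical stand-ins, and `originalMutation` only mirrors the variable named above.

```javascript
// Minimal model of "tag the most recent mutation when a response completes".
let originalMutation; // most recent mutation seen so far (undefined at start)
const attached = [];  // which provenance ended up attached to which target

// Stand-in for a MutationObserver callback.
function onMutation(target) {
  originalMutation = { target };
}

// Stand-in for the response-complete handler.
function onResponseComplete(url, provenance) {
  // The page has not yet applied this response to the DOM, so
  // originalMutation can only describe an earlier, unrelated change.
  const target = originalMutation ? originalMutation.target : undefined;
  attached.push({ url, provenance, target });
}

// Two requests, each followed by the DOM update that depends on it.
onResponseComplete("/first/url", "prov-1");  // no mutation seen yet
onMutation("div#first");                     // update caused by /first/url
onResponseComplete("/second/url", "prov-2"); // sees div#first, not div#second
onMutation("div#second");                    // update caused by /second/url

console.log(attached);
```

The first response finds no mutation at all, and the second is attached to the element that the first response modified.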

Solutions

Given that there is no ideal way to solve the problem, there are two possible ways forward:

1. Drop the requirement of showing which HTML elements were updated by a particular HTTP request, and instead show, e.g., a small button that, when clicked, lists all HTTP responses resulting from the current page that have provenance data attached.
2. Attempt to map provenance-enriched HTTP responses to modified HTML elements in a more defensible, but still heuristic, way.

For now, to get something going, I'll go with option 1 -- something like this is needed in any case for handling non-JavaScript-initiated requests (e.g., full page loads). #5 (EDIT: originally given below) describes a straightforward approach that could be used to implement option 2 later.
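
As a rough illustration of option 1, the sketch below collects provenance-carrying responses into a list that a button's click handler could render. Everything named here is an assumption made for illustration: the `x-provenance` header name, the `ProvenanceLog` class, and the `wrapFetch` helper are hypothetical and not part of the existing code.

```javascript
// Hypothetical header name; the real provenance header is not specified here.
const PROVENANCE_HEADER = "x-provenance";

// Collects every response that carries provenance data for the current page.
class ProvenanceLog {
  constructor() {
    this.entries = [];
  }

  // `response` needs only a `url` string and a `headers.get(name)` method,
  // which fetch() Response objects and simple test doubles both provide.
  record(response) {
    const provenance = response.headers.get(PROVENANCE_HEADER);
    if (provenance !== null && provenance !== undefined) {
      this.entries.push({ url: response.url, provenance });
    }
    return response;
  }

  // Snapshot of the entries, e.g. for rendering when the button is clicked.
  list() {
    return this.entries.slice();
  }
}

// Wraps a fetch-like function so that each response is recorded on arrival.
function wrapFetch(fetchImpl, log) {
  return (...args) => fetchImpl(...args).then((response) => log.record(response));
}

// Demo with stand-in responses (a Map supplies the .get() method).
const log = new ProvenanceLog();
log.record({ url: "/first/url", headers: new Map([[PROVENANCE_HEADER, "prov-1"]]) });
log.record({ url: "/plain", headers: new Map() }); // no provenance: ignored
console.log(log.list());
```

Where the sketch runs in the page's own JavaScript context, it might be installed with `window.fetch = wrapFetch(window.fetch.bind(window), log)`. XMLHttpRequest would need a separate hook.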

veracitylab / DOM-Instrumentation-to-Display-Provenance-Data

No robust way to map requests to modified page elements #4
