emilyrohrbough opened this issue 1 year ago.
This is likely caused by a breaking change in 7.0.0 (see https://github.com/inikulin/parse5/releases/tag/v7.0.0): the parser now emits multiple text events for long sections (where we drop the underlying buffer). This is a bug fix for a long-standing issue within parse5, where the raw text was not reliable.
Hope this makes sense!
This is the relevant PR: https://github.com/inikulin/parse5/pull/432
@fb55 I wasn't observing multiple text events for long sections. Only the last portion of the text was being sent. Is that expected at all?
Looking at the code in the first post of this issue, the important thing to note is that the `data` event will only fire for tokens that have not been captured before. So if you assign a listener for `text`, the `data` event will not receive any data from text tokens (unless `emitText` is called explicitly). With the listeners removed, I receive the full input as the output.
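To make the passthrough rule concrete, here is a toy model in TypeScript. `ToyRewriter` and `run` are hypothetical names, and this is not parse5's implementation; it only mimics the behaviour described above, where a token's raw HTML is forwarded automatically only if no listener is registered for its event.

```typescript
type Listener = (raw: string) => void;

// Toy model (NOT parse5's code) of the passthrough rule: a token's raw HTML
// reaches the output automatically only when no listener handles its event.
class ToyRewriter {
  private listeners = new Map<string, Listener>();
  output = "";

  on(event: string, listener: Listener): void {
    this.listeners.set(event, listener);
  }

  emitRaw(html: string): void {
    this.output += html;
  }

  // Called once per token by the (imaginary) parser.
  handleToken(event: string, raw: string): void {
    const listener = this.listeners.get(event);
    if (listener) {
      listener(raw); // the listener alone decides what gets emitted
    } else {
      this.emitRaw(raw); // unhandled tokens are forwarded verbatim
    }
  }
}

// Feeds the same token sequence through a ToyRewriter, optionally with a
// `text` listener that does or does not re-emit what it receives.
function run(withTextListener: boolean, forward: boolean): string {
  const rewriter = new ToyRewriter();
  if (withTextListener) {
    rewriter.on("text", (raw) => {
      if (forward) rewriter.emitRaw(raw);
    });
  }
  rewriter.handleToken("startTag", "<script>");
  rewriter.handleToken("text", "const a = {");
  rewriter.handleToken("text", '"b": 1};');
  rewriter.handleToken("endTag", "</script>");
  return rewriter.output;
}

console.log(run(false, false)); // <script>const a = {"b": 1};</script>
console.log(run(true, false)); // <script></script>
console.log(run(true, true)); // <script>const a = {"b": 1};</script>
```

With the real `RewritingStream`, the equivalent fix is to call `rewriter.emitRaw(raw)` (or `rewriter.emitText(token)`) inside the `text` listener, so that captured text is written back to the output.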
Looking at Cypress's source code, this function doesn't handle multiple `text` events for script tags (the relevant logic should be moved to the `endTag` handler): https://github.com/cypress-io/cypress/blob/af0da61b8cf07ef260cf50587ba58b1453e21cdc/packages/rewriter/lib/html-rules.ts#L45-L55
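Moving the script logic to the end tag could look roughly like this minimal sketch. `createScriptBuffering`, `TagToken`, `TextToken` and the `rewriteScript` callback are hypothetical names; the token shapes only assume the `tagName` and `text` fields that parse5's SAX tokens carry, and the sketch does not reproduce Cypress's actual `html-rules.ts` logic.

```typescript
interface TagToken {
  tagName: string;
}
interface TextToken {
  text: string;
}

// Hypothetical helper (not Cypress's or parse5's code): buffers every `text`
// chunk inside <script> and rewrites the whole body once, at the end tag.
function createScriptBuffering(
  emitRaw: (html: string) => void,
  rewriteScript: (js: string) => string,
) {
  let inScript = false;
  let body = "";

  return {
    onStartTag(tag: TagToken, raw: string): void {
      if (tag.tagName === "script") {
        inScript = true;
        body = "";
      }
      emitRaw(raw);
    },
    onText(_token: TextToken, raw: string): void {
      // A single script body may arrive as several `text` events,
      // so collect the pieces instead of rewriting each one separately.
      if (inScript) {
        body += raw;
      } else {
        emitRaw(raw);
      }
    },
    onEndTag(tag: TagToken, raw: string): void {
      if (tag.tagName === "script" && inScript) {
        emitRaw(rewriteScript(body)); // the complete body is available here
        inScript = false;
        body = "";
      }
      emitRaw(raw);
    },
  };
}

// Demo with a script body split across two text events.
let out = "";
const handlers = createScriptBuffering(
  (html) => {
    out += html;
  },
  (js) => js.toUpperCase(), // stand-in for a real JS rewrite
);
handlers.onStartTag({ tagName: "script" }, "<script>");
handlers.onText({ text: "const a = {" }, "const a = {");
handlers.onText({ text: '"b": 1};' }, '"b": 1};');
handlers.onEndTag({ tagName: "script" }, "</script>");
console.log(out); // <script>CONST A = {"B": 1};</script>
```

To wire this into `parse5-html-rewriting-stream`, one would pass `(html) => rewriter.emitRaw(html)` as `emitRaw` and register `rewriter.on("startTag", handlers.onStartTag)`, `rewriter.on("text", handlers.onText)` and `rewriter.on("endTag", handlers.onEndTag)`. The design choice is simply that the script body is only transformed once it is known to be complete, so it no longer matters how many `text` events the parser emits for it.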
@fb55 Sorry for the late reply. THANK YOU for taking a deeper look into this. I will give this a shot and see if it resolves the issue.
We are using `parse5-html-rewriting-stream` to parse HTML and JS body content from intercepted browser requests. This works well, but we recently stumbled across an issue where `script` content was being emitted incorrectly/partially when its contents were JS assigning a JSON object to a constant.
The `startTag` event correctly emits `<script>` as the raw contents, but the `text` event sends only part of the JS object as its raw contents.
I expect the raw contents of the `text` event to match the full contents of the script.