Open Mingun opened 3 months ago
:warning: Please install the to ensure uploads and comments are reliably processed by Codecov.
Attention: Patch coverage is 61.13537%
with 178 lines
in your changes missing coverage. Please review.
Project coverage is 60.13%. Comparing base (
7558577
) to head (eb90e9f
). Report is 94 commits behind head on master.
:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
I finished work on the base part of the entities support. In this PR new Event::GeneralRef
is added together with new BytesRef
struct which represents the any &...;
reference, including entity references and character references. Character references can be resolved by call BytesRef::resolve_char_ref()
, entity references can be resolved by mapping from content of BytesRef
to replacement text. Both usages are shown in the updated custom_entities
example.
I won't have a chance to fully review this for a couple of days. Quick question though, am I correct in thinking that this PR will mean that any time a text block contains one or more entity references, instead of the developer receiving one Event::Text
containing everything between the opening and closing tags, they will receive a series of Event::Text
and Event::GeneralRef
which they will then need to merge back together themselves into the original text?
Yes, you are correct. But that does not mean that he/she will needed to construct the complete text themselves. In the next PR I plan to rename Reader
to the RawReader
and add make it return borrow-only RawEvent
s, and in in the another PR introduce new Reader
which will automatically merge all consequent Text
, CData
and GeneralRef
events. This should be much more convenient for the average user. RawReader
will only be needed for very fine control.
Because renames affects very many places, I want to do that in a separate PR to reduce noise in PR with new Reader
.
Borrow-only reader-only events will be useful also in that sense that I plan to add offset
member to them to track event position in the stream. When you construct event for writing you are obviously does not have position and I think it is better to not have a dummy value for it, in order to you couldn't mistakenly use writer event in the reading context and get the wrong position.
Because new Reader
will have a stack of the RawReader
s (in the same way as demonstrated in custom_entities
example), it will be simple recreate readers when we will need to change encoding, so I think, that #158 is very close to resolving.
@dralley, what do you think about this?
This is a big change in handling general entity references and character references. Open PR early to get feedback.
With this changes we can correctly parse document
as equivalent normalized document
The updated
custom_entities
example shows how it would be possible to implement requirement from the specification about parsed general entities. Serde deserializer did not updated yet, because this is not trivial part and probably that will be done in another PR.Of course, such change probably makes the performance worse, I didn't measure impact yet.
Closes #667