Closed djkoloski closed 8 months ago
The recent updates to the pest
library involve significant improvements in error handling and safety. The changes include eliminating unsafe
code blocks and refining the creation of Position
and Span
objects for better reliability. These modifications enhance the overall safety and maintainability of the codebase, making it more robust and error-resistant.
Files | Change Summary |
---|---|
error.rs , parser_state.rs |
Replaced direct Position::new with new_internal for improved error handling. |
iterators/flat_pairs.rs |
Removed unsafe and safety comments in FlatPairs . |
iterators/pair.rs |
Updated safety comments and calls in Pair for safer Span creation. |
iterators/pairs.rs |
Eliminated unsafe blocks in flatten , peek , and next_back . |
iterators/tokens.rs |
Updated struct initialization and error handling in Tokens . |
position.rs , span.rs |
Refactored Position and Span creation to use new_internal , removing unsafe usage. |
🐇✨
In the realm of code where bugs dare to tread,
A rabbit hopped in, making errors dread.
With a flick and a hop, unsafe tags were shed,
Positions and spans, now safely led.
"To safer pastures!" the rabbit said.
🌟🌿
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?
One option for fixing #993
I feel like this PR is not getting to the point. Would you prefer:
Position
always lands on a UTF-8 codepoint boundary, orPosition
lands on a codepoint boundary because all slicing and indexing operations are checked.The implications of 1:
Position
s must refer to a valid UTF-8 codepoint boundary. Similar invariants propagate into Span
, Pair
, FlatPairs
, etc.unsafe
from the new_unchecked
functions, all uses of the unsafe functions are verified.Position
switch to unchecked
versions, skipping bounds and codepoint boundary checking.The implications of 2:
I would also appreciate clarity on:
checked_pow
vs strict_pow
. Strict APIs panic on invalid input, checked APIs return None
on invalid input.# Panics
section following the standard library pattern. Note that unlike safety docs, panic docs are not required for soundness.Right now, this PR implements option 2 with a permissive internal API (panics eagerly in debug, panics lazily in release) and a checked external API. Note: flat_pairs::new
, pair::new
, pairs::new
, and tokens::new
are all internal APIs (checked by enabling the unreachable_pub
lint).
Thanks @djkoloski , that helps.
Right now, this PR implements option 2 with a permissive internal API (panics eagerly in debug, panics lazily in release) and a checked external API.
Yes, I think that option 2 is fine if those remain internal (from a quick look I wasn't sure if those pub methods are reachable from outside).
Whether Pest wants separate strict/checked APIs. Compare: checked_pow vs strict_pow. Strict APIs panic on invalid input, checked APIs return None on invalid input.
Maybe not at this moment, but good to consider for 3.X. Right now, we could separate them for internal API without breaking changes, but it may seem inconsistent with external API.
Whether Pest documents panics in a # Panics section following the standard library pattern. Note that unlike safety docs, panic docs are not required for soundness.
It doesn't, at least not consistently, but it should.
Anyway, I think we can merge this PR and open an issue for documenting panics.
Fundamentally, pest never does anything unsafe. All of the UTF-8 slicing uses indexing and is therefore checked. There's no need to provide the internal guarantee that all pest positions lie on UTF-8 boundaries when it provides no performance benefit.
Summary by CodeRabbit
unsafe
blocks and comments across various components.Position
andSpan
struct creations for increased code safety and readability.