pest-parser / pest

The Elegant Parser
https://pest.rs
Apache License 2.0
4.67k stars 261 forks source link

Remove unnecessary unsafe functions #998

Closed djkoloski closed 8 months ago

djkoloski commented 8 months ago

Fundamentally, pest never does anything unsafe. All of the UTF-8 slicing uses indexing and is therefore checked. There's no need to provide the internal guarantee that all pest positions lie on UTF-8 boundaries when it provides no performance benefit.

Summary by CodeRabbit

coderabbitai[bot] commented 8 months ago

Walkthrough

The recent updates to the pest library involve significant improvements in error handling and safety. The changes include eliminating unsafe code blocks and refining the creation of Position and Span objects for better reliability. These modifications enhance the overall safety and maintainability of the codebase, making it more robust and error-resistant.

Changes

Files Change Summary
error.rs, parser_state.rs Replaced direct Position::new with new_internal for improved error handling.
iterators/flat_pairs.rs Removed unsafe and safety comments in FlatPairs.
iterators/pair.rs Updated safety comments and calls in Pair for safer Span creation.
iterators/pairs.rs Eliminated unsafe blocks in flatten, peek, and next_back.
iterators/tokens.rs Updated struct initialization and error handling in Tokens.
position.rs, span.rs Refactored Position and Span creation to use new_internal, removing unsafe usage.

🐇✨
In the realm of code where bugs dare to tread,
A rabbit hopped in, making errors dread.
With a flick and a hop, unsafe tags were shed,
Positions and spans, now safely led.
"To safer pastures!" the rabbit said.
🌟🌿

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share - [X](https://twitter.com/intent/tweet?text=I%20just%20used%20%40coderabbitai%20for%20my%20code%20review%2C%20and%20it%27s%20fantastic%21%20It%27s%20free%20for%20OSS%20and%20offers%20a%20free%20trial%20for%20the%20proprietary%20code.%20Check%20it%20out%3A&url=https%3A//coderabbit.ai) - [Mastodon](https://mastodon.social/share?text=I%20just%20used%20%40coderabbitai%20for%20my%20code%20review%2C%20and%20it%27s%20fantastic%21%20It%27s%20free%20for%20OSS%20and%20offers%20a%20free%20trial%20for%20the%20proprietary%20code.%20Check%20it%20out%3A%20https%3A%2F%2Fcoderabbit.ai) - [Reddit](https://www.reddit.com/submit?title=Great%20tool%20for%20code%20review%20-%20CodeRabbit&text=I%20just%20used%20CodeRabbit%20for%20my%20code%20review%2C%20and%20it%27s%20fantastic%21%20It%27s%20free%20for%20OSS%20and%20offers%20a%20free%20trial%20for%20proprietary%20code.%20Check%20it%20out%3A%20https%3A//coderabbit.ai) - [LinkedIn](https://www.linkedin.com/sharing/share-offsite/?url=https%3A%2F%2Fcoderabbit.ai&mini=true&title=Great%20tool%20for%20code%20review%20-%20CodeRabbit&summary=I%20just%20used%20CodeRabbit%20for%20my%20code%20review%2C%20and%20it%27s%20fantastic%21%20It%27s%20free%20for%20OSS%20and%20offers%20a%20free%20trial%20for%20proprietary%20code)

Tips ### Chat There are 3 ways to chat with CodeRabbit: - Review comments: Directly reply to a review comment made by CodeRabbit. Example: - `I pushed a fix in commit .` - `Generate unit-tests for this file.` - `Open a follow-up GitHub issue for this discussion.` - Files and specific lines of code (under the "Files changed" tab): Tag `@coderabbitai` in a new review comment at the desired location with your query. Examples: - `@coderabbitai generate unit tests for this file.` - `@coderabbitai modularize this function.` - PR comments: Tag `@coderabbitai` in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples: - `@coderabbitai generate interesting stats about this repository and render them as a table.` - `@coderabbitai show all the console.log statements in this repository.` - `@coderabbitai read src/utils.ts and generate unit tests.` - `@coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.` Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. ### CodeRabbit Commands (invoked as PR comments) - `@coderabbitai pause` to pause the reviews on a PR. - `@coderabbitai resume` to resume the paused reviews. - `@coderabbitai review` to trigger a review. This is useful when automatic reviews are disabled for the repository. - `@coderabbitai resolve` resolve all the CodeRabbit review comments. - `@coderabbitai help` to get help. Additionally, you can add `@coderabbitai ignore` anywhere in the PR description to prevent this PR from being reviewed. ### CodeRabbit Configration File (`.coderabbit.yaml`) - You can programmatically configure CodeRabbit by adding a `.coderabbit.yaml` file to the root of your repository. - The JSON schema for the configuration file is available [here](https://coderabbit.ai/integrations/coderabbit-overrides.v2.json). - If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: `# yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json` ### CodeRabbit Discord Community Join our [Discord Community](https://discord.com/invite/GsXnASn26c) to get help, request features, and share feedback.
djkoloski commented 8 months ago

One option for fixing #993

tomtau commented 8 months ago

This test fails: https://github.com/pest-parser/pest/actions/runs/8378691318/job/22959618598#step:6:1

djkoloski commented 8 months ago

I feel like this PR is not getting to the point. Would you prefer:

  1. Pest keeps the type invariant that Position always lands on a UTF-8 codepoint boundary, or
  2. Pest stops caring about whether Position lands on a codepoint boundary because all slicing and indexing operations are checked.

The implications of 1:

The implications of 2:

I would also appreciate clarity on:

Right now, this PR implements option 2 with a permissive internal API (panics eagerly in debug, panics lazily in release) and a checked external API. Note: flat_pairs::new, pair::new, pairs::new, and tokens::new are all internal APIs (checked by enabling the unreachable_pub lint).

tomtau commented 8 months ago

Thanks @djkoloski , that helps.

Right now, this PR implements option 2 with a permissive internal API (panics eagerly in debug, panics lazily in release) and a checked external API.

Yes, I think that option 2 is fine if those remain internal (from a quick look I wasn't sure if those pub methods are reachable from outside).

Whether Pest wants separate strict/checked APIs. Compare: checked_pow vs strict_pow. Strict APIs panic on invalid input, checked APIs return None on invalid input.

Maybe not at this moment, but good to consider for 3.X. Right now, we could separate them for internal API without breaking changes, but it may seem inconsistent with external API.

Whether Pest documents panics in a # Panics section following the standard library pattern. Note that unlike safety docs, panic docs are not required for soundness.

It doesn't, at least not consistently, but it should.

Anyway, I think we can merge this PR and open an issue for documenting panics.