Remove unnecessary unsafe functions

djkoloski commented 8 months ago

Fundamentally, pest never does anything unsafe. All of the UTF-8 slicing uses indexing and is therefore checked. There's no need to provide the internal guarantee that all pest positions lie on UTF-8 boundaries when it provides no performance benefit.
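
To illustrate the point about checked slicing, here is a minimal sketch that uses only standard-library `str` operations. The helper names `slice_from` and `try_slice_from` are made up for this example and are not part of pest.

```rust
/// Slices `input` from byte offset `pos` to the end using ordinary indexing.
/// Indexing a `str` is checked: it panics if `pos` is out of bounds or
/// falls in the middle of a UTF-8 code point.
pub fn slice_from(input: &str, pos: usize) -> &str {
    &input[pos..]
}

/// Non-panicking variant: returns `None` when `pos` is out of bounds or
/// not on a UTF-8 code point boundary.
pub fn try_slice_from(input: &str, pos: usize) -> Option<&str> {
    input.get(pos..)
}
```

For the input `"a\u{e9}"` (the `é` occupies bytes 1 and 2), `slice_from(s, 1)` returns `"é"`, while `slice_from(s, 2)` panics instead of producing invalid UTF-8. No `unsafe` is involved in either case.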

Summary by CodeRabbit

Refactor
- Improved error handling mechanisms for better stability.
- Enhanced safety by removing unnecessary unsafe blocks and comments across various components.
- Streamlined Position and Span struct creations for increased code safety and readability.

coderabbitai[bot] commented 8 months ago

Walkthrough

The recent updates to the pest library involve significant improvements in error handling and safety. The changes include eliminating unsafe code blocks and refining the creation of Position and Span objects for better reliability. These modifications enhance the overall safety and maintainability of the codebase, making it more robust and error-resistant.

Changes

| Files | Change Summary |
|---|---|
| `error.rs`, `parser_state.rs` | Replaced direct `Position::new` with `new_internal` for improved error handling. |
| `iterators/flat_pairs.rs` | Removed `unsafe` and safety comments in `FlatPairs`. |
| `iterators/pair.rs` | Updated safety comments and calls in `Pair` for safer `Span` creation. |
| `iterators/pairs.rs` | Eliminated `unsafe` blocks in `flatten`, `peek`, and `next_back`. |
| `iterators/tokens.rs` | Updated struct initialization and error handling in `Tokens`. |
| `position.rs`, `span.rs` | Refactored `Position` and `Span` creation to use `new_internal`, removing `unsafe` usage. |

🐇✨
In the realm of code where bugs dare to tread,
A rabbit hopped in, making errors dread.
With a flick and a hop, unsafe tags were shed,
Positions and spans, now safely led.
"To safer pastures!" the rabbit said.
🌟🌿

djkoloski commented 8 months ago

One option for fixing #993

tomtau commented 8 months ago

This test fails: https://github.com/pest-parser/pest/actions/runs/8378691318/job/22959618598#step:6:1

djkoloski commented 8 months ago

I feel like this PR is not getting to the point. Would you prefer:

1. Pest keeps the type invariant that `Position` always lands on a UTF-8 codepoint boundary, or
2. Pest stops caring about whether `Position` lands on a codepoint boundary because all slicing and indexing operations are checked.

The implications of 1:

- All `Position`s must refer to a valid UTF-8 codepoint boundary. Similar invariants propagate into `Span`, `Pair`, `FlatPairs`, etc.
- Instead of removing `unsafe` from the `new_unchecked` functions, all uses of the unsafe functions are verified.
- Indexing and slicing operations using `Position` switch to unchecked versions, skipping bounds and codepoint boundary checking.

The implications of 2:

- All of the unsafe functions are turned safe.
- No more safety docs required. They should be removed because safety docs are for unsafe code.
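
A rough sketch of how the two options differ in code. Both types below are hypothetical stand-ins written for this comparison; they are not pest's actual `Position`, and only `new_unchecked` and `new_internal` are names taken from this PR.

```rust
/// Option 1 (hypothetical): the boundary invariant is part of the type.
#[derive(Debug, Clone, Copy)]
pub struct InvariantPosition<'i> {
    input: &'i str,
    pos: usize,
}

impl<'i> InvariantPosition<'i> {
    /// # Safety
    ///
    /// `pos` must be a UTF-8 char boundary of `input`
    /// (which implies `pos <= input.len()`).
    pub unsafe fn new_unchecked(input: &'i str, pos: usize) -> Self {
        InvariantPosition { input, pos }
    }

    /// Slices without bounds or boundary checks.
    pub fn rest(&self) -> &'i str {
        // SAFETY: the only constructor requires `pos` to be a char boundary.
        unsafe { self.input.get_unchecked(self.pos..) }
    }
}

/// Option 2 (hypothetical): no invariant, so nothing needs `unsafe`.
#[derive(Debug, Clone, Copy)]
pub struct PlainPosition<'i> {
    input: &'i str,
    pos: usize,
}

impl<'i> PlainPosition<'i> {
    /// Safe to call with any `pos`, so no `# Safety` section is needed.
    pub fn new_internal(input: &'i str, pos: usize) -> Self {
        PlainPosition { input, pos }
    }

    /// Checked slicing: panics if `pos` is out of bounds or not on a
    /// char boundary, but can never produce invalid UTF-8.
    pub fn rest(&self) -> &'i str {
        &self.input[self.pos..]
    }
}
```

Under option 1 a bad offset is undefined behavior unless every call site is audited; under option 2 the worst outcome is a panic at the point of slicing.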

I would also appreciate clarity on:

- Whether Pest wants separate strict/checked APIs. Compare: `checked_pow` vs `strict_pow`. Strict APIs panic on invalid input, checked APIs return `None` on invalid input.
- Whether Pest documents panics in a `# Panics` section following the standard library pattern. Note that unlike safety docs, panic docs are not required for soundness.
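
As a sketch of what the strict/checked split and a `# Panics` section could look like, assuming two hypothetical free functions (`checked_split` and `strict_split` are invented for illustration and are not pest APIs):

```rust
/// Checked flavor: returns `None` on invalid input, in the same spirit as
/// the standard library's `i32::checked_pow`.
pub fn checked_split(input: &str, pos: usize) -> Option<(&str, &str)> {
    if input.is_char_boundary(pos) {
        Some(input.split_at(pos))
    } else {
        None
    }
}

/// Strict flavor: same operation, but invalid input is treated as a bug.
///
/// # Panics
///
/// Panics if `pos` is greater than `input.len()` or does not lie on a
/// UTF-8 code point boundary.
pub fn strict_split(input: &str, pos: usize) -> (&str, &str) {
    checked_split(input, pos).expect("`pos` must lie on a UTF-8 char boundary")
}
```

The `# Panics` section here is documentation for callers only; omitting it would not make the function unsound, which is the contrast with `# Safety` drawn above.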

Right now, this PR implements option 2 with a permissive internal API (panics eagerly in debug, panics lazily in release) and a checked external API. Note: `flat_pairs::new`, `pair::new`, `pairs::new`, and `tokens::new` are all internal APIs (checked by enabling the `unreachable_pub` lint).
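
One way to get the "eager in debug, lazy in release" behavior is a `debug_assert!` in the internal constructor. The following is a minimal sketch under that assumption; the struct is a simplified stand-in and the exact assertion used in the PR may differ.

```rust
/// Simplified stand-in for a position type (not pest's real definition).
#[derive(Debug, Clone, Copy)]
pub struct Position<'i> {
    input: &'i str,
    pos: usize,
}

impl<'i> Position<'i> {
    /// Checked external API: rejects offsets that are out of bounds or
    /// not on a UTF-8 char boundary.
    pub fn new(input: &'i str, pos: usize) -> Option<Self> {
        if input.is_char_boundary(pos) {
            Some(Position { input, pos })
        } else {
            None
        }
    }

    /// Permissive internal API. In debug builds the assertion fires here
    /// (eager panic); in release builds it is compiled out and a bad
    /// offset only surfaces later, when something slices with it.
    pub(crate) fn new_internal(input: &'i str, pos: usize) -> Self {
        debug_assert!(input.is_char_boundary(pos), "pos is not a char boundary");
        Position { input, pos }
    }

    /// Checked slicing: this is where a bad offset panics in release builds.
    pub fn rest(&self) -> &'i str {
        &self.input[self.pos..]
    }
}
```

Either way, an invalid offset ends in a panic rather than undefined behavior; the build profile only changes where the panic is reported.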

tomtau commented 8 months ago

Thanks @djkoloski, that helps.

> Right now, this PR implements option 2 with a permissive internal API (panics eagerly in debug, panics lazily in release) and a checked external API.

Yes, I think that option 2 is fine if those remain internal (from a quick look I wasn't sure if those pub methods are reachable from outside).

> Whether Pest wants separate strict/checked APIs. Compare: checked_pow vs strict_pow. Strict APIs panic on invalid input, checked APIs return None on invalid input.

Maybe not at this moment, but good to consider for 3.X. Right now, we could separate them for internal API without breaking changes, but it may seem inconsistent with external API.

> Whether Pest documents panics in a # Panics section following the standard library pattern. Note that unlike safety docs, panic docs are not required for soundness.

It doesn't, at least not consistently, but it should.

Anyway, I think we can merge this PR and open an issue for documenting panics.
