undef1nd / sfv

Structured HTTP field values parsing and serialization. Implementation of RFC 8941 https://httpwg.org/specs/rfc8941.html
Apache License 2.0
25 stars 6 forks source link

Validate bare items on construction #102

Open zcei opened 1 year ago

zcei commented 1 year ago

Closes #98

Hej folks,

as promised in the related issue, I gave a stab at validating the bare items on construction.

~For the moment I have introduced BareItem::Validated* variants adjacent to their unvalidated enum member counterparts. Using Deref I was able to quite easily pass them to their respective RefBareItem version and thus could avoid a breaking change if wanted. I think eventually the shorthand version of a BareItem member should be the safe variant, as shorter names tend to be used more frequently.~

~To demonstrate that constructing the bare_item::* structs without using (Try)From is not possible, I've added an unused function in the bench setup. When using from within the same module tree, one can access the private struct members, so I had to do it outside of it.~

BareItem variant validations

I looked at all the currently existing enum variants and mapped out whether they have validations that run in the parse or serialize steps.

Variant Serialize Validation Parse Validation
Decimal
Integer
String
ByteSeq
Boolean
Token

(Right now all have some form of parse validations, to verify integrity of the input strings. I decided later to not focus on the parsing step for the moment, so this is just mentioned for completeness.)

On the serialization side, all but Boolean and ByteSeq check for specific conditions on the backing data structure. For example i64 supports a bigger number range than the spec describes for an Integer. Tokens have to start with specific characters, etc.

Approach

For each variant that needs validation while being serialized, their value struct implements a TryFrom which uses the Serializer::serialize_* method to check for correctness on construction. Is is somewhat wasteful and in a follow up PR validation could be moved out to their respective structs and called upon construction and then be omitted during serialization.

The value structs inner value is public within the crate. This means after parsing an input we can create the value structs without going through validation again.

For the two variants that don't need validation and can always be serialized (Boolean and ByteSeq) the From trait is implemented. Backing value structs are created to a) keep consistency in the implementation and b) to allow for validations/parsing to be offloaded into the value structs without a breaking change if that is ever desired.

Non-goals

As parsing usually happens from a complete structured field value and not their parts, I omitted this for the moment. This means right now you can only construct those value structs from their "native representation" and not from their serialized string value (e.g. 42_i64 can be converted into a bare_item::Integer, but "42" can not). If ever needed, parsing can be offloaded into the value structs and bare_item::Integer::parse("42") or so could become reality.

valenting commented 1 year ago

For the moment I have introduced BareItem::Validated* variants adjacent to their unvalidated enum member counterparts.

Do you see us keeping both Validated and unvalidated enum types, or do you intend to remove the unvalidated ones?

and thus could avoid a breaking change if wanted

I think changing the enum in any way might count as a breaking change. In any case, as mentioned earlier, we need to make a breaking change to add the Date type anyway, so we're not constrained by that.

zcei commented 1 year ago

Do you see us keeping both Validated and unvalidated enum types, or do you intend to remove the unvalidated ones?

Initially it was to keep the amount of touched LOC low, then I thought it might be nicer to transition over. Forgot at that moment that you either way wanted to introduce a breaking change.

One could argue that the "struct wrappers" are not needed for all the enum variants, like Integer(i64) can already only accept a valid 64-bit integer. But I would say with Rust's .into() and the proper From traits implemented it's an okay-ish overhead to have all of them in their value type wrappers, so that we keep them consistent. Only outlier here is the Decimal as it's basically already a value type that we receive.


Idea: let's rename the current enum members to something like Raw* / Unvalidated* / Unsafe* members. The migration path is really easy in an existing codebase via search&replace. Then we can have the new and recommended way with the same names as the current enum members, use the breaking change for that, yet upgrade migration isn't a pain and users can gradually transition to validated bare items. WDYT?

valenting commented 1 year ago

let's rename the current enum members to something like Raw / Unvalidated / Unsafe* members.

Would they still be in the same enum as the validated types? Either way, I'd love to look at some code.

The IETF list also has some suggestions about adding #[non_exhaustive] to BareItem If you don't add it now I'll do it later when adding the Date type.

https://lists.w3.org/Archives/Public/ietf-http-wg/2023JanMar/0034.html

zcei commented 1 year ago

Just as a quick update from some work on the weekend:

I decided it's not really worth it to keep both the "unsafe" variants and the validated ones at the same time, as it becomes a bit weird to deal with. The parsing would have yielded "unsafe" variants, even though the parser validated that they're correct.

So the code change will be a bit bigger, as it changes most of the test invocations from .into() to .try_into(), etc. Also some of the tests will need slight changes where errors were expected during the serialization. Now they fail when the value struct gets created.

~I'm right now fighting with the String value struct, as I can't easily use the parse_string function when using TryInto, as you may or may now have it wrapped with quotes and the parser expects "double escaped" values in string (like literal values \" which are "\\\"" as a rust string) whereas it's totally fine to have a quote " ("\"") as a string input which can be serialized.~

Edit: The problem here was to expect that we pass parseable inputs. This was a wrong assumption on my end. We need serializable values and therefore use the Serializer::serialize_* methods now internally to check for that. It's a bit wasteful, but can be optimized later.

Once that's done I'll open up this PR for review, with one commit per BareItem enum variant being moved to the value structs.

P.S.: Didn't know about the #[non_exhaustive], that's cool!

zcei commented 1 year ago

@valenting I updated the PR description to explain the work that has happened. Let me know if there's anything controversial to it. Specifically I'm interested what you think about the whole .try_into() and .into() dance, which is now necessary to construct field values manually.

I'm happy to be pointed to whatever you think is worth changing, I'm sure there's something I missed 🙂

valenting commented 1 year ago

We can consider that for a follow-up change. For now let's keep using Decimal.

On Wed, 15 Feb 2023 at 10:51, Stephan Schneider @.***> wrote:

@.**** commented on this pull request.

In src/bare_item.rs https://github.com/undef1nd/sfv/pull/102#discussion_r1106897515:

  • /// ```
  • fn try_from(item: i64) -> Result<Self, Self::Error> {

  • Ok(BareItem::Integer(item.try_into()?))

  • }

+}

+

+impl TryFrom for BareItem {

  • type Error = &'static str;

  • /// Converts rust_decimal::Decimal into BareItem::Decimal.

  • /// ```

  • /// # use sfv::{BareItem, FromPrimitive};

  • /// # use std::convert::TryInto;

  • /// use rust_decimal::Decimal;

  • /// # fn main() -> Result<(), &'static str> {

  • /// let decimal_number = Decimal::from_f64(48.01).unwrap();

  • /// let bare_item: BareItem = decimal_number.try_into()?;

@valenting https://github.com/valenting what's your stance on having the external API only accepting f64? Do you want to keep the usage of rust_decimal::Decimal? The spec seems to only allow 12 digits on the integer component and 3 digits on the fractional component. f64 should cover that completely? 🤔

— Reply to this email directly, view it on GitHub https://github.com/undef1nd/sfv/pull/102#discussion_r1106897515, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALDEOLEM4LJZORBS5RRIT3WXSRKRANCNFSM6AAAAAAUCRR3FY . You are receiving this because you were mentioned.Message ID: @.***>

zcei commented 1 year ago

Allowed ways to construct a BareItem per variant

// BareItem::Decimal
let value = BareItem::new_decimal_from_f64(13.37)?;
let value: BareItem = 13.37.try_into()?;

let decimal = rust_decimal::Decimal::from_f64(13.37).unwrap();
let value = BareItem::new_decimal(decimal);

let value: BareItem = decimal.try_into()?;

// BareItem::Integer
let value = BareItem::new_integer(42)?;
let value: BareItem = 42.try_into()?;

// BareItem::String
let value = BareItem::new_string("foo")?;
// No `.try_into()`, as it's ambiguous whether `Token` or `String` variant should be used

// BareItem::ByteSeq
let value = BareItem::new_byte_seq("hello".as_bytes())?;
let value: BareItem = "hello".as_bytes().try_into()?;

// BareItem::Boolean
let value = BareItem::new_boolean(true)?;
let value: BareItem = true.try_into()?;

// BareItem::Token
let value = BareItem::new_token("foo")?;
// No `.try_into()`, as it's ambiguous whether `Token` or `String` variant should be used

The recommended way would be to use the BareItem::new_* constructors, but I do believe the TryInto variants are quite useful still when creating an Item right away, like:

Item::new(true.try_into()?);