jam1garner / binread

A Rust crate for helping parse structs from binary data using ✨macro magic✨
MIT License
258 stars 18 forks source link

Usage of rsplitn(1, <>) is probably not doing what you want it to #14

Closed kitlith closed 3 years ago

kitlith commented 4 years ago

From the docs:

An iterator over substrings of this string slice, separated by a pattern, starting from the end of the string, restricted to returning at most n items.

If n substrings are returned, the last substring (the nth substring) will contain the remainder of the string.

Tested on the playground, what this means is that rsplitn(1, <>).nth(0) is just returning the string as-is, with no changes. To get the behavior you were probably intending, you need to use 2.

That said, when generics are involved, I don't think you really want somecrate::A<somecrate::B> to become B>, so...

https://github.com/jam1garner/binread/blob/1232eb7e61130625acc5a2fa894f049da4283ede/src/binread_impls.rs#L82 https://github.com/jam1garner/binread/blob/1232eb7e61130625acc5a2fa894f049da4283ede/src/binread_impls.rs#L127

kitlith commented 4 years ago

This may be (part of) the reason why invalid debug templates are generated for me, as this line: nintendo_patricia_tree::Node<brsar_rs::brsar::block::symbol::TreeData<brsar_rs::brsar::block::info::SoundInfo>> var1292[2345]<bgcolor=0x7E2DD2>; looks like it was generated through write_vec, and these are the only two places that I can see that even call write_vec.

csnover commented 3 years ago

I looked at this briefly since I was in this the area of the code earlier, and I am not really sure there is any reasonable way to make this code work as intended.

The current code is effectively a no-op, as mentioned; the next least naïve approach would be to scan forward to the first < and then walk back from there to the nearest ::, but this then still doesn’t handle:

  1. Generic types;
  2. References;
  3. Array types;
  4. Tuple types;
  5. Compiler changes to type_name’s output (there are many explicit warnings in the documentation about how the output format is not guaranteed).

Since AFAICT this is only for debugging BinRead invocations it seems like the only tractable way to address this ticket is to just remove the code that is trying to massage the output of type_name entirely.

kitlith commented 3 years ago

Maybe we could add an associated const to the BinRead trait then, containing the name of the type for these purposes? I think derive macros have better access to the name of the type they're implementing a trait for... would be a breaking change for any manual impls though.

jam1garner commented 3 years ago

Associated consts can have a default so not a breaking change

kitlith commented 3 years ago

Ah! So they can, but we may not necessarily be able to provide a good default. we could only default to type_name on nightly, though I suppose we could work around that by making the associated const an Option<&'static str>, defaulting to None (or maybe just default to empty string and cut out option entirely, idk), and defaulting to type_name when None. This way, we at least get some output, even if it causes the same issues as before, and we can encourage implementors to add a string that does not use any restricted characters.

kitlith commented 3 years ago

I also just had a cursed idea that may be usable in combination with any of this: remember https://www.reddit.com/r/rust/comments/5penft/parallelizing_enjarify_in_go_and_rust/dcsq64p?utm_source=share&utm_medium=web2x&context=3 ? replace < with , > with , : with (https://util.unicode.org/UnicodeJsps/confusables.jsp?a=%3A), as well as any other relevant special characters and see how 101 handles it :smiling_imp:

jam1garner commented 3 years ago

Unfortunately that mangling scheme isn't possible in 010, although we could spring for something else

kitlith commented 3 years ago

perhaps :: -> __, < to _b_, and > to _d? so we'd have nintendo_patricia_tree__Node_b_brsar_rs__brsar__block__symbol__TreeData_b_brsar_rs__brsar__block__info__SoundInfo_d_d

it occurs to me that i should check the output for a type that takes multiple generic arguments. perhaps , -> _X_ or something.

personally, all of this is less readable, but i guess there's not much we can do until/unless 010 has better support for unicode in templates.

csnover commented 3 years ago

This code was deleted in https://github.com/jam1garner/binrw/commit/d8f62ebe22c8ad341b0a0414623d6774de9f77fc so this issue can be closed.