librasn / rasn

A Safe #[no_std] ASN.1 Codec Framework

Should `rasn-ldap` `LdapString` types actually use `String` instead of `Bytes`? #304

Closed repnop closed 1 month ago

repnop commented 2 months ago

While reading through the LDAP RFC, I noticed it says that the `LDAPString` type uses the ISO/IEC 10646 character set (UTF-8 encoded), which it calls a "superset" of Unicode. However, the Unicode FAQ page on this says:

Q: What are the differences between ISO/IEC 10646 and Unicode?

While the character codes and encoding forms are synchronized between Unicode and ISO/IEC 10646, the Unicode Standard imposes additional constraints on implementations to ensure that they treat characters uniformly across platforms and applications. To this end, it supplies an extensive set of functional character specifications, character data, algorithms and substantial background material that is not part of ISO/IEC 10646.

This, to me, says that it would be fine to use `String` instead of whatever backing type we use for `OCTET STRING`, since the code points are defined to be exactly the same. What Unicode adds on top is algorithms and normalization methods for working with text, which is orthogonal to the encoding itself. So as far as the Rust `String` type is concerned, the two representations are equivalent. This would make working with the LDAP types much easier, since you would not need to check that the contents are valid UTF-8 every time you want to get a `&str` or `String` out of them.
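To illustrate the ergonomic difference, here is a rough sketch using only the standard library. `BytesBacked` and `StringBacked` are hypothetical stand-ins, not the actual `rasn-ldap` types, and `Vec<u8>` stands in for whatever byte container the crate really uses:

```rust
use std::str::Utf8Error;
use std::string::FromUtf8Error;

/// Hypothetical byte-backed wrapper (stand-in for an OCTET STRING-backed type).
pub struct BytesBacked(pub Vec<u8>);

impl BytesBacked {
    /// Every read has to re-validate the contents as UTF-8 and can fail.
    pub fn as_str(&self) -> Result<&str, Utf8Error> {
        std::str::from_utf8(&self.0)
    }
}

/// Hypothetical String-backed wrapper: UTF-8 is validated once, at construction.
pub struct StringBacked(String);

impl StringBacked {
    /// Takes the raw octets (e.g. as decoded off the wire) and rejects invalid UTF-8.
    pub fn from_octets(octets: Vec<u8>) -> Result<Self, FromUtf8Error> {
        String::from_utf8(octets).map(StringBacked)
    }

    /// Infallible, because the invariant was established in `from_octets`.
    pub fn as_str(&self) -> &str {
        &self.0
    }
}
```

With the second shape, invalid input is rejected once when the value is built, and every later `as_str()` call is infallible. With the first, each caller has to handle a `Utf8Error`.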

XAMPPRocky commented 1 month ago

Thank you for your issue! Yes, I think that probably makes the most sense given the information you've provided. If someone wants to create a PR for this, I'm more than happy to review it.