scinfu / SwiftSoup

SwiftSoup: Pure Swift HTML Parser, with best of DOM, CSS, and jquery (Supports Linux, iOS, Mac, tvOS, watchOS)
https://scinfu.github.io/SwiftSoup/
MIT License
4.54k stars 347 forks source link

String.Encoding.displayName() does not generate IANA-compatible charset names #279

Closed jverkoey closed 2 months ago

jverkoey commented 3 months ago

There's a few issues with the String.Encoding.displayName implementation.

Implementation of displayName is found here:

https://github.com/scinfu/SwiftSoup/blob/e2d11208519549c2e5798d70190472045633f22f/Sources/String.swift#L189-L216

Invalid charset names

Incorrect name for .utf16

Note the following unexpected behavior:

(lldb) > po String.Encoding.utf16.displayName()
unicode
(lldb) > po String.Encoding.unicode.displayName()
unicode

This appears to be due to the fact that .utf16 and .unicode both use the same underlying storage:

(lldb) po String.Encoding.utf16
▿ Unicode (UTF-16)
  - rawValue : 10
(lldb) po String.Encoding.unicode
▿ Unicode (UTF-16)
  - rawValue : 10

https://www.iana.org/assignments/character-sets/character-sets.xhtml doesn't seem to define unicode as a valid charset, so a fix here might be to always hard-code unicode to be UTF-16.

aehlke commented 3 months ago

Your changes sound good to me. I think you should submit a PR if you want the changes to be made.

jverkoey commented 3 months ago

Are you open to a solution that includes throwing exceptions? If so I'll upstream the solution I built into Slipstream:

https://github.com/jverkoey/slipstream/blob/35e025fc3bc038d85fc0306c3b55a6f64131b0ef/Sources/Slipstream/W3C/Elements/DocumentMetadata/Charset.swift#L43

jverkoey commented 3 months ago

Gently following up on the question above — happy to send a PR but want to make sure it's aligning with the spirit of the repo. If an exception isn't preferred then I'll send a PR to map to the closest approximate charsets.

aehlke commented 3 months ago

I'm only a recently minted maintainer here but I'd say avoid changing the contract in a minor release update and have an empty string or default value or closest approximate

jverkoey commented 3 months ago

Ah, if you are aiming for a minor release, do you want to consider changes in string values as a breaking change? I don't know if any customers of SwiftSoup are currently relying on the existing behavior.

aehlke commented 3 months ago

Good point, I don't have a recommendation in mind for this because I'm also not using this part of the tool directly. Don't know it's worth the pains of a major release, or if anyone would be disrupted by it

One idea that comes to mind is to add a new API instead and deprecate the current one! Apple employs this well when they don't have algo versioning built into their APIs

jverkoey commented 3 months ago

That raises a good point. The API name is displayName, so it could be argued that this API was never really intended to be used as an IANA-compatible charset name. In that case it would probably be least disruptive to introduce a new API whose purpose was to generate an IANA-compatible name.

What do you think?

aehlke commented 3 months ago

Sounds best to me, with zero disruption

aehlke commented 2 months ago

released