Closed jverkoey closed 2 months ago
Your changes sound good to me. I think you should submit a PR if you want the changes to be made.
Are you open to a solution that includes throwing exceptions? If so I'll upstream the solution I built into Slipstream:
Gently following up on the question above — happy to send a PR but want to make sure it's aligning with the spirit of the repo. If an exception isn't preferred then I'll send a PR to map to the closest approximate charsets.
I'm only a recently minted maintainer here, but I'd say avoid changing the contract in a minor release update and instead return an empty string, a default value, or the closest approximate.
Ah, if you are aiming for a minor release, do you want to consider changes in string values as a breaking change? I don't know if any customers of SwiftSoup are currently relying on the existing behavior.
Good point. I don't have a recommendation in mind for this because I'm also not using this part of the tool directly. I don't know if it's worth the pain of a major release, or if anyone would be disrupted by it.
One idea that comes to mind is to add a new API instead and deprecate the current one! Apple employs this well when they don't have algo versioning built into their APIs
That raises a good point. The API name is displayName, so it could be argued that this API was never really intended to be used as an IANA-compatible charset name. In that case it would probably be least disruptive to introduce a new API whose purpose was to generate an IANA-compatible name.
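A rough sketch of what such an additive API could look like, leaving displayName untouched. The property name ianaCharsetNameSketch is hypothetical, returning nil for unmapped encodings is an assumption (the thread also discussed throwing or approximating), and only a handful of encodings are covered:

```swift
import Foundation

extension String.Encoding {
    /// Hypothetical new API (not part of SwiftSoup): an IANA-compatible
    /// charset name, or nil when no mapping is known.
    var ianaCharsetNameSketch: String? {
        switch self {
        // nonLossyASCII -> US-ASCII follows the suggestion in the issue.
        case .ascii, .nonLossyASCII: return "US-ASCII"
        case .utf8: return "UTF-8"
        // .unicode compares equal to .utf16, so it lands here as well.
        case .utf16: return "UTF-16"
        case .shiftJIS: return "Shift_JIS"
        // Every other encoding is left unmapped in this sketch.
        default: return nil
        }
    }
}

print(String.Encoding.shiftJIS.ianaCharsetNameSketch ?? "no mapping")
print(String.Encoding.macOSRoman.ianaCharsetNameSketch ?? "no mapping")
```

Because this is a new property, existing callers of displayName would keep seeing the strings they see today.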
What do you think?
Sounds best to me, with zero disruption
released
There are a few issues with the String.Encoding.displayName implementation.
Implementation of displayName is found here:
https://github.com/scinfu/SwiftSoup/blob/e2d11208519549c2e5798d70190472045633f22f/Sources/String.swift#L189-L216
Invalid charset names
nonLossyASCII should probably be US-ASCII
shiftJIS should probably be Shift_JIS
macSymbol should maybe be macintosh
isoLatin1 should probably be ISO-10646-Unicode-Latin1
iso2022jp should probably be csISO2022JP
macOSRoman doesn't seem to have a corresponding mapping.

Incorrect name for .utf16
Note the following unexpected behavior:
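A minimal sketch of the kind of behavior in question. displayNameSketch is a hypothetical stand-in for the real displayName switch, and the case order (with .unicode listed before .utf16) is an assumption:

```swift
import Foundation

// Hypothetical stand-in for displayName, reduced to the two relevant cases.
func displayNameSketch(_ encoding: String.Encoding) -> String {
    switch encoding {
    case .unicode: return "unicode"
    case .utf16: return "UTF-16"   // never reached: .utf16 already matched .unicode
    default: return "unknown"
    }
}

// Prints "unicode", even though .utf16 was passed in.
print(displayNameSketch(.utf16))
```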
This appears to be due to the fact that .utf16 and .unicode both use the same underlying storage:
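This can be checked directly against Foundation. A short sketch comparing the two constants:

```swift
import Foundation

let utf16 = String.Encoding.utf16
let unicode = String.Encoding.unicode

// Both constants wrap the same raw value, so they compare equal
// and a switch cannot tell them apart.
print(utf16.rawValue == unicode.rawValue)   // true
print(utf16 == unicode)                     // true
```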
https://www.iana.org/assignments/character-sets/character-sets.xhtml doesn't seem to define unicode as a valid charset, so a fix here might be to always hard-code unicode to be UTF-16.