Closed benkiel closed 6 years ago
wow that was fast! ๐ sorry guys, I just came back from holidays and am still going through the backlog of emails so didn't get the change to review this.
So basically now we are happy to have a required dependency on lxml (a native extension module)? The times are a-changin' ๐
@anthrotype Yes, we're fine with the dependency โ did you want to contribute the lxml plist bit to this?
I don't see lxml specified in the setup.py "install_requires" though... you only added it to the requirements.txt file, which is only used internally here for creating the test environment. If one installs ufoLib from pypi in a fresh environement where lxml is not already installed, this will now raise ImportError, if it truly is required. We need to fix this ASAP. I can do that.
@anthrotype ah, crud, sorry โ that's my bad for missing it. Yes, if you can fix, that would be great, much appreciated
can you remind me again why in the end we decided to have lxml as hard dependency as opposed to use the ElementTree API (supported by both lxml and built-in ElementTree library) and try to use lxml when it's available else fall back to the built-in ElementTree library if not?
I have mixed feelings because, on the one had, I'm happy when we add good third-party dependencies (usually better than our artisanal "reinvented" wheel); on the other hand, this shouldn't be done too lightly, especially when this is compiled extension, and one which, unlike the others (compreffor, pyclipper, etc.) we don't control directly.
@anthrotype We could do that, not opposed. We wanted to make lxml the thing for speed, but understand wanting a graceful fallback.
but maybe you or @typesupply decided to require lxml because you're depending on some lxml-specific features that are not available with the built-in elementtree library? we need to test this it not the case
or if it is, then fine, we can add lxml as required dependency.
the problem is, we need to decide now, otherwise everybody who does pip install fontmake or fontParts or anything will get ImportError
It's been a really long time since I wrote this... but, I didn't think I was relying on any LXML specific features. Falling back to ET when LXML isn't available is fine with me.
we are using pretty_print=True
option in etree.to_string
function, which only works for lxml, not for elementtree library, whose writer can't automatically indent..
I guess we need to use something like this as fallback:
https://github.com/fonttools/fonttools/blob/8d7774a3e81df1afc6dedb1272e43c7c3aea100e/Lib/fontTools/designspaceLib/__init__.py#L91-L105
Ah, yeah. I forgot about that. It's so weird that ET doesn't have this.
and also, the encoding="unicode"
argument to etree.tostring
is a lxml thing.
Actually it's a python3 thing
https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.tostring
i'm going to encode as a UTF-8 bytes string and decode it as Unicode string before returning it from writeGlyphToString
two more problems
1) ElementTree's tostring always seems to sort attributes alphabetically, even if we use OrderedDict; lxml uses the order we give it. (in the diff below, the minus lines are lxml's, plus are ElementTree's)
- <glyph name="a" format="2">
+ <glyph format="2" name="a">
2) When it's a simple element (without children), ElementTree always adds an extra space before the ending />
, lxml doesn't:
- <unicode hex="0062"/>
+ <unicode hex="0062" />
yeah, ElementTree's _serialize_xml
functoin calls sorted
on the attributes items:
https://github.com/python/cpython/blob/831c29721dcb1b768c6315a4b8a4059c4c97ee8b/Lib/xml/etree/ElementTree.py#L928
:cry:
Perhaps that's enough to say "Alright, sorry, lxml is a requirement"?
it was worth a try. I really wouldn't like to implement our own elementtree xml serializer.. Let's require lxml then.
@anthrotype Agree. If there is pushback on the lxml requirement, we can revisit. Thanks for all the work, really appreciated!
just pushed v2.2.1.
Oh, when tagging a new release, I like to "edit" the github release notes so that they will be displayed as formatted markdown, instead of as a monospaced code block that is used by default when pushing a git tag. I basically copy and paste the same text of the git tag. Probably there are tools to automate this, but I'm lazy.
@anthrotype got it. and, thanks again; know we just did that one quick โ it had been sitting for a while with no comments on the other thread so before the code got too stale it was merged. Thanks again for your running down ETree and everything โ it is really appreciated!
Thoughts about a major version increment for these changes? It sounds like the mandatory use of lxml writes may have the potential for large source text diffs c/w the last release for some users (if I understood Cosimo's comment above correctly) and the elimination of default source validation on UFO reads through the main i/o reading class is a significant change in expected behavior. This has the potential to lead to breakage in projects that depend on the previous default behaviors.
@chrissimpkins Fair point. There shouldn't (fingers crossed) be too many diff changes, we didn't change the test cases and made lxml conform to the tests, though I suppose there will still be some. Also open to changing read to validate by default if that seems better.
I also think it would be better to bump major version for this. I'll add the plistlib module from ufoLib2 and then go to 3.0, if you guys are also OK with it.
@anthrotype Agree. What do you think about the default being to validate on read
?
maybe that could be safer, as long as it's easy to opt-out. @typemytype seems to prefer validate=False here https://github.com/typesupply/defcon/pull/191/files
in the defcon PR I linked, the defaults are rather the opposite of what @benkiel was suggesting:
https://github.com/typesupply/defcon/pull/191/files#diff-c6a63107dc3c355ab30047131285dc06R452
read is False, write is True.. Not sure what's best.
My 2ยข is that validation should be on by default in all cases. That preserves backwards compatibility, for what that is worth. I also didn't engineer defcon against malicious input so I prefer that turning validation off be something that one has to knowingly do.
makes sense. Ok, let's flip the default to be validate=True in ufoLib, and probably defcon too. @benkiel would you work on that? I'll work on integrating the new plist module.
I was just following ufoLib in defcon, will change it to read and write to be validated by default in defcon.
@anthrotype done in #155. This now means that validation is on by default for both read and write (it was on for write before)
Add the option to turn on/off validation. Switch to lxml.