ximion / appstream-dep11

Python module to handle Debian DEP-11 metadata
https://github.com/ximion/appstream-generator
GNU Lesser General Public License v3.0
3 stars 3 forks source link

metainfo parser: Cope with XML namespaces #13

Closed iainlane closed 8 years ago

iainlane commented 8 years ago

We currently fail to parse gnome-terminal because we're parsing the metainfo file as having no 'id' tag, but it does.

ElementTree includes the namespace when asking for tag names, if one is supplied. Look at just the end of the returned tags and attributes, where the actual name is, to deal with this.

This fixes the extraction of gnome-terminal as seen in Debian & Ubuntu

ximion commented 8 years ago

Since when do we have an https://www.freedesktop.org/standards/appdata/1.0 namespace? That never existed, if so, it should probably be something like https://www.freedesktop.org/standards/appstream-metainfo/1.0 (or just appstream). Really ugly... I am tempted to make the ancient-metadata hint a warning, so people migrate to something sane and actually specified and documented faster. I am not super-happy with the .endswith, because in theory a tag "license" and "metadata_license" might exist, which would result in problems. Is gnome-terminal the only thing behaving that way? If so, it should be fixed and we shouldn't accept that behavior in the generator. If there is more than just GT doing that, we can apply that workaround... (Can ElementTree be told to see a namespace as default, so it doesn't give the full path for that one?)

iainlane commented 8 years ago
ubuntu@juju-prod-ue-appstream-machine-4:~$ ./forget.py metainfo-no-id
dep11-generator forget appstream-workdir gnome-terminal
dep11-generator forget appstream-workdir empire
dep11-generator forget appstream-workdir qterminal
dep11-generator forget appstream-workdir aisleriot
dep11-generator forget appstream-workdir gucharmap
dep11-generator forget appstream-workdir evince-common
dep11-generator forget appstream-workdir gcu-bin
dep11-generator forget appstream-workdir gchempaint
dep11-generator forget appstream-workdir gcrystal

I didn't actually check all of these but at least some of them look the same. People must have been copying files with this namespace set.

What if I split on } and == compare the last part instead of using endswith?

iainlane commented 8 years ago

What if I split on } and == compare the last part instead of using endswith?

After some searching it seems that most people prefer to just strip the default xmlns, so I've done that instead. Let me know what you think. :)

ximion commented 8 years ago

This looks better, although In wonder if we should call a function to properly split namespace and tag-name to the parser...

iainlane commented 8 years ago

Thanks! I looked and I didn't find a function to do that. I think that I could have set the default ns to this one, but that is weird if you say it is not spec-compliant.

We might want to make the validator fail on this?

ximion commented 8 years ago

I wonder what is the best option here... Making the generator fail is probably too stong, emitting a warning might be an option - we now compile a regex for every metainfo file, because some decided to add a namespace (which might cost us a few msec in processing speed). I think actually defining a namespace in the spec and then making this an error (with some transition phase) might be the better thing to do. I will look into this... Maybe claiming https://www.freedesktop.org/standards/appstream/1.0 might actually make sense here...