ximion / appstream-generator

A fast AppStream metadata generator
GNU Lesser General Public License v3.0
44 stars 29 forks source link

Generated Components.yml contains duplicate language in Keywords - invalid YAML #92

Closed robert-ancell closed 3 years ago

robert-ancell commented 3 years ago

The components file in Ubuntu (/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_hirsute_main_dep11_Components-amd64.yml.gz) fails YAML parsing due to containing duplicate languages in the Keywords mapping.

Specifically the first entry fails:

---
File: DEP-11
Version: '0.12'
Origin: ubuntu-hirsute-main
MediaBaseUrl: https://appstream.ubuntu.com/media/hirsute
Time: 20210422T135021
---
Type: desktop-application
ID: software-properties-gtk.desktop
Package: software-properties-gtk
...
Keywords:
  de_AT:
  - Treiber
  - Paketquellen
  - Paketquelle
  - PPA
  ca_ES:
  - controladors
  - dipòsits
  - dipòsit
  - ppa
  - orígens
  - programari
...
  ca_AD:
  - controladors
  - dipòsits
  - dipòsit
  - ppa
  - orígens
  - programari
  ca_ES:
  - controladors
  - dipòsits
  - dipòsit
  - ppa
  - fonts
  - programari

The YAML specification states:

The content of a mapping node is an unordered set of key: value node pairs, with the restriction that each of the keys is unique

robert-ancell commented 3 years ago

I'm having trouble setting up appstream-generator to reproduce this case, but I think the issue may be in src/asgen/backends/ubuntu/ubupkg.d:extractLangpacks() where /var/lib/locales/supported.d/ca is parsed and the @valencia modifier may be confusing things.

ca_AD.UTF-8 UTF-8
ca_ES.UTF-8 UTF-8
ca_ES.UTF-8@valencia UTF-8
ca_IT.UTF-8 UTF-8
ca_FR.UTF-8 UTF-8
robert-ancell commented 3 years ago

@iainlane seems to be the primary author of this file?

ximion commented 3 years ago

Very odd... The @valencia modifier should be stripped together with the .UTF-8 and end up in an associative array, which can also have a key only once. But it looks like that doesn't happen, which means that it gets stripped at YAML serialization, see https://github.com/ximion/appstream/blob/master/src/as-yaml.c#L652 In hindsight, the value sanity check should likely have been done somewhere earlier in the chain to ensure that we end up with just one unique key in a GHashTable before that one is serialized as-is, but I guess this has been done for efficiency in the past. I do wonder though whether the result would be less surprising if libappstream also changed its behavior here, and it may not actually loose noticeable performance...

The easiest fix though is to not feed "bad" values in in the first place, which should be fixed in asgen (which may not necessarily be an issue limited to the Ubuntu backend).

iainlane commented 3 years ago

@ximion can you transfer this issue to appstream please?

This is fixed by something like https://github.com/iainlane/appstream/commit/a0246f128dc05396632f710152de926c3c379a87

Not proposed yet, because

  1. I want to write a testcase, and I don't know how to do that because the header for this as-utils-private.h has hidden visibility and the testsuite links the shared library, so the function can't be linked in. Would like some advice on how to do this best.
  2. The function is used in a dodgy way. Note that it modifies the string in place, and is called as the callback from g_hash_table_foreach. That is, we are directly modifying the keys in a hash table, breaking it. We can either change the API and return a copy (then probably use g_string_replace instead of this stuff), or consume the hash table, making as_component_emit_yaml invalidate the AsComponent which is passed to it. That would be a bit weird I think - the first option is probably the way to go. (Or we could modify that line I just linked to to g_strdup, but if all callers are going to strdup then we might as well make the function do it itself IMO).
iainlane commented 3 years ago

This should be fixed with snap revision 153 (ish) onwards, which is built using the new version of appstream, 0.14.4. Unfortunately, as release pockets are immutable, we have no way of fixing the broken YAML which is out there for hirsute. Fixing that issue would be #29, but it's not fully there yet for this purpose.

Thanks for reporting.