rpm-software-management / createrepo_c

C implementation of the createrepo.
http://rpm-software-management.github.io/createrepo_c
GNU General Public License v2.0

createrepo_c silently parses bad metadata if the repository contains the same package (same pkgid, same NEVRA) multiple times #306

Open dralley opened 2 years ago

dralley commented 2 years ago

The parsing API used by createrepo_c (at least as shown in its examples) can produce incorrect results when the metadata being parsed is itself incorrect. In this case, however, the parsed metadata is incorrect in a different way than the source metadata.

If pkgcb inserts each package object into a dictionary / hashmap keyed by pkgid, the second occurrence of a duplicated pkgid replaces the first.

After that happens, newpkgcb is used when parsing the other metadata files (filelists and other): it looks up the package in the dictionary / hashmap by pkgid and adds file and changelog metadata to it. Because both duplicate entries in those files carry the same pkgid, both resolve to the single surviving package. The end result is that the dictionary / hashmap contains only one of the two packages listed in the metadata, and that package contains an extra copy of every file and changelog entry.
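A minimal pure-Python simulation of the callback pattern described above may make the failure easier to see. It does not call createrepo_c at all: the `Package` class, the `PRIMARY` / `FILELISTS` data, and the simplified `newpkgcb(pkgid)` signature are hypothetical stand-ins (the real callbacks take different arguments). It only models a pkgid-keyed dictionary being filled from primary and then enriched from filelists.

```python
from dataclasses import dataclass, field

@dataclass
class Package:
    # Hypothetical stand-in for a parsed package object.
    pkgid: str
    name: str
    files: list = field(default_factory=list)

# Simulated metadata: the same package (same pkgid, same NEVRA) listed twice.
PRIMARY = [("abc123", "foo"), ("abc123", "foo")]
FILELISTS = [("abc123", ["/usr/bin/foo"]), ("abc123", ["/usr/bin/foo"])]

packages = {}

def pkgcb(pkg):
    # Keyed by pkgid: the second duplicate silently replaces the first.
    packages[pkg.pkgid] = pkg

def newpkgcb(pkgid):
    # Both filelists entries resolve to the same surviving object.
    return packages.get(pkgid)

for pkgid, name in PRIMARY:
    pkgcb(Package(pkgid, name))

for pkgid, files in FILELISTS:
    pkg = newpkgcb(pkgid)
    if pkg is not None:
        pkg.files.extend(files)

print(len(packages))              # 1
print(packages["abc123"].files)   # ['/usr/bin/foo', '/usr/bin/foo']
```

Two packages go in, one comes out, and that one carries the file list twice; the same thing would happen to changelogs when the other file is parsed with the same callback.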

As described here, this would likely also be true even with the parse_main_metadata_together() API.

This issue is compounded because createrepo_c doesn't prevent such incorrect repositories from being created: https://github.com/rpm-software-management/createrepo_c/issues/307
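Since nothing currently stops such a repository from being written, a caller could guard against it before generating metadata. The following is a hedged sketch of such a guard, not part of createrepo_c: `find_duplicate_pkgids` and `check_unique` are hypothetical helpers that take the list of pkgids (the package checksums) that would go into primary and report any that repeat.

```python
from collections import Counter

def find_duplicate_pkgids(pkgids):
    """Return {pkgid: count} for every pkgid that occurs more than once."""
    counts = Counter(pkgids)
    return {pkgid: n for pkgid, n in counts.items() if n > 1}

def check_unique(pkgids):
    """Raise ValueError if any pkgid repeats; otherwise return None."""
    dups = find_duplicate_pkgids(pkgids)
    if dups:
        details = ", ".join(f"{p} (x{n})" for p, n in sorted(dups.items()))
        raise ValueError(f"duplicate pkgids: {details}")

print(find_duplicate_pkgids(["abc123", "def456", "abc123"]))  # {'abc123': 2}
check_unique(["abc123", "def456"])  # passes silently
```

Failing loudly at creation time keeps the bad metadata from ever reaching a parser that would silently merge the duplicates.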

dralley commented 2 years ago

The PackageIterator API lets you avoid this, and the older locate_and_load_xml() API lets you configure what the behavior should be. The other parsing APIs are susceptible.
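For code stuck on the callback-based APIs, one possible workaround is to make the duplicate handling explicit in the callbacks themselves. This is a hypothetical sketch in the same pure-Python simulation style, not how PackageIterator or locate_and_load_xml() work internally: `DuplicateAwareCollector`, its `policy` values, `begin_secondary_file`, and `load` are invented names. It assumes duplicates appear in the same relative order in primary and in each secondary file, and it ensures only the kept occurrence receives secondary data.

```python
from dataclasses import dataclass, field

@dataclass
class Package:
    pkgid: str
    name: str
    files: list = field(default_factory=list)

class DuplicateAwareCollector:
    """Hypothetical pkgcb/newpkgcb pair with a configurable duplicate policy.

    "error" raises on a repeated pkgid; "keep_first" / "keep_last" keep one
    occurrence, and only that occurrence receives filelists/other data.
    """

    POLICIES = ("error", "keep_first", "keep_last")

    def __init__(self, policy="error"):
        if policy not in self.POLICIES:
            raise ValueError(f"unknown policy: {policy!r}")
        self.policy = policy
        self.packages = {}
        self._primary_seen = {}    # pkgid -> occurrences seen in primary
        self._kept_index = {}      # pkgid -> index of the occurrence kept
        self._secondary_seen = {}  # pkgid -> occurrences seen in current file

    def pkgcb(self, pkg):
        n = self._primary_seen.get(pkg.pkgid, 0)
        self._primary_seen[pkg.pkgid] = n + 1
        if n == 0 or self.policy == "keep_last":
            self.packages[pkg.pkgid] = pkg
            self._kept_index[pkg.pkgid] = n
        elif self.policy == "error":
            raise ValueError(f"duplicate pkgid in primary: {pkg.pkgid}")
        # "keep_first": later occurrences are ignored.

    def begin_secondary_file(self):
        # Reset per-file counters before each secondary file (filelists, other).
        self._secondary_seen = {}

    def newpkgcb(self, pkgid):
        n = self._secondary_seen.get(pkgid, 0)
        self._secondary_seen[pkgid] = n + 1
        # Entries belonging to a discarded duplicate (or an unknown pkgid)
        # are skipped by returning None.
        if self._kept_index.get(pkgid) != n:
            return None
        return self.packages[pkgid]

def load(primary, filelists, policy):
    c = DuplicateAwareCollector(policy)
    for pkgid, name in primary:
        c.pkgcb(Package(pkgid, name))
    c.begin_secondary_file()
    for pkgid, files in filelists:
        pkg = c.newpkgcb(pkgid)
        if pkg is not None:
            pkg.files.extend(files)
    return c.packages

PRIMARY = [("abc123", "foo"), ("abc123", "foo")]
FILELISTS = [("abc123", ["/usr/bin/foo"]), ("abc123", ["/usr/bin/foo"])]
print(load(PRIMARY, FILELISTS, "keep_first")["abc123"].files)  # ['/usr/bin/foo']
```

Counting occurrences per file, rather than only looking up by pkgid, is what prevents the surviving package from absorbing the discarded duplicate's files and changelogs.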