rpm-software-management / librepo

A library providing C and Python (libcURL like) API for downloading packages and linux repository metadata in rpm-md format
http://rpm-software-management.github.io/librepo/
GNU Lesser General Public License v2.1
75 stars 91 forks source link

Escape the package path part of the URL when downloading (RhBug:1817130) #188

Closed lukash closed 4 years ago

lukash commented 4 years ago

The package name especially can contain characters that need to be URL-encoded. However, the relative URL is a path fragment containing slashes, which have to be left unescaped.

https://bugzilla.redhat.com/show_bug.cgi?id=1817130

Tests: https://github.com/rpm-software-management/ci-dnf-stack/pull/819

lukash commented 4 years ago

The plus sign is actually not reserved in the path part of a URL... therefore I'm not sure this should actually be merged. See more discussion in the bug.

lukash commented 4 years ago

Ok let's put it in anyway, to make DNF more compatible.

lukash commented 4 years ago

There seems to be failing python unit test: test_download_packages_without_handle, I think it expects unescaped path.

I always forget to run these, sorry. Turns out you can put the complete URL as the relative_url attribute of PackageTarget, at least through the API. I've got a feeling it's actually also possible to put it into the <location> tag in the repository metadata?

Doesn't make me feel great about this. Because the full URL will need to come encoded already (there's no way to encode the full URL, it needs to be encoded when it's composed from its parts). But, the local filename is extracted from the relative_url (right before I encode it in my patch). Which means if there are characters that had to be encoded in the URL, the local filename will be url-encoded as well, which can't work.

This would just result in a messy and inconsistent interface... what to do? Right now the interface is relatively clean if we say "we never encode the filename". But if we want to encode and make the interface clean we'd actually have to encode somewhere earlier (on librepo interface the URLs would have to already be encoded I think), and url-decode when extracting the filename from the URL... ugh.

(The python tests have to be run manually or as a part of building the rpm, we should probably consider running them as a part of make test)

They are part of make test under the ENABLE_PYTHON_TESTS=y switch... yeah I think the switch should be on by default? Though I do usually forget to run make test altogether :grin:

Also I was wondering whether it would be worth it to escape the base_url as well, because when used I think it prepends the relative_url and we would end up with half not escaped and half escaped which seems unexpected, even though I understand its primarily done for the package name part. What do you think?

Not sure what you mean here... Where would we escape the base_url? Maybe you mean something else, but you can't escape a whole URL after it's been assembled - you can't know which characters are meant to be escaped and which are the real ones... In case of base_url seems to me it has to be escaped already in the repo config or whereever else it's stored?

lukash commented 4 years ago

@kontura sorry this got a bit big :confused: To sum it up here:

kontura commented 4 years ago

LGTM, tests passed.