theupdateframework / specification

The Update Framework specification
https://theupdateframework.github.io/specification/
Other
368 stars 54 forks source link

Clarify delegation paths pattern matching wrt path separator #173

Closed jku closed 3 years ago

jku commented 3 years ago

This is about the pattern matching done on "paths" attribute of targets delegation. The language in the spec seems to contradict the reference implementation but neither is very clear.

I would like to see a mention of path separator "/" in the text: is it special cased WRT pattern matching or not.

Specification:

PATHPATTERN can include shell-style wildcards and supports the Unix filename pattern matching convention. Its format may either indicate a path to a single file, or to multiple paths with the use of shell-style wildcards.

I've been reading this spec for a year now and this has so far seemed clear to me: Unix filename pattern matching convention clearly means a glob where the path separator character never matches the wildcards. this is further emphasized by mention of shell-style wildcards: it's clear that the paths separator never matches in the shell.

Based on the above it seems clear that *.tgz should not match targets/foo.tgz -- this is how shell pattern matching works.

Examples in specification:

For example, the path pattern "targets/*.tgz" would match file paths "targets/foo.tgz" and "targets/bar.tgz", but not "targets/foo.txt". Likewise, path pattern "foo-version-?.tgz" matches "foo-version-2.tgz" and "foo-version-a.tgz", but not "foo-version-alpha.tgz"

The examples do not cover the interesting case: does *.tgz match targets/foo.tgz?

The reference implementation:

This is a simplified version, see https://github.com/theupdateframework/tuf/blob/develop/tuf/client/updater.py#L2868 for real code

if fnmatch.fnmatch("targets/foo.tgz", "*.tgz"):
    # path pattern matches

fnmatch documentation explains that the filename separator ('/' on Unix) is not special to this module. This means that in the reference implementation *.tgz does match targets/foo.tgz -- this is not what I expected from reading the spec.

mnm678 commented 3 years ago

My understanding of the spec's intention is that *.tgz should not match targets/foo.tgz. I don't think a wildcard delegation to multiple namespaces would make sense, especially as different namespaces would be controlled by different people/organizations.

Either way, this is a great example that we should add to the spec to make the intended behavior more clear.

joshuagl commented 3 years ago

Thanks for the detailed issue @jku!

My interpretation of the spec matches both of yours, that this is shell pattern matching and therefore that a PATHPATTERN of *.tgz should not match targets/foo.tgz.

I think this is a reference implementation bug (and that the reference implementation should probably be using glob, not fnmatch).

The Tough developers seem to have made the same interpretation of the spec and are using glob to implement PATHPATTERN matching.