strongswan / swidGenerator

Application which generates SWID-Tags from Linux package managers like dpkg, rpm or pacman.
MIT License
16 stars 11 forks source link

swid_generator swid --hierarchic produces root Directory without @name #50

Closed adelton closed 5 years ago

adelton commented 6 years ago

Running

swid_generator swid --package bash --full --pretty --hierarchic | head

produces

<?xml version="1.0" encoding="utf-8"?>
<SoftwareIdentity name="bash" tagId="Fedora_30-x86_64-bash-4.4.23-4.fc30.x86_64" version="4.4.23-4.fc30.x86_64" versionScheme="alphanumeric" xml:lang="en-US" xmlns="http://standards.iso.org/iso/19770/-2/2015/schema.xsd" xmlns:SHA256="http://www.w3.org/2001/04/xmlenc#sha256" xmlns:n8060="http://csrc.nist.gov/ns/swid/2015-extensions/1.0">
  <Entity name="strongSwan Project" regid="strongswan.org" role="tagCreator"/>
  <Meta product="Fedora 30 x86_64"/>
  <Payload n8060:envVarPrefix="$" n8060:envVarSuffix="" n8060:pathSeparator="/">
    <Directory root="">
      <Directory name="etc">
        <Directory name="skel">
          <File SHA256:hash="2584c4ba8b0d2a52d94023f420b7e356a1b1a3f2291ad5eba06683d58c48570d" n8060:mutable="true" name=".bash_logout" size="18"/>
          <File SHA256:hash="28bc81aadfd6e6639675760dc11dddd4ed1fcbd08f423224a93c09802552b87e" n8060:mutable="true" name=".bash_profile" size="141"/>

The top level Directory element is <Directory root=""> without @name attribute.

However, http://standards.iso.org/iso/19770/-2/2015-current/schema.xsd defines the attribute as mandatory:

<xs:attribute name="name" type="xs:string" use="required"/>

The net effect is that swidval /dev/stdin reports

ERROR GEN-1-1: cvc-complex-type.4: Attribute 'name' must appear on element 'Directory'.

on the output and xmllint --schema http://standards.iso.org/iso/19770/-2/2015-current/schema.xsd - says

-:6: element Directory: Schemas validity error : Element '{http://standards.iso.org/iso/19770/-2/2015/schema.xsd}Directory': The attribute 'name' is required but missing.
- fails to validate
tobiasbrunner commented 6 years ago

Thanks for the report. What would you suggest as fix? An empty name attribute? Removing the element and just start with its children (all with root="", I guess)?

adelton commented 6 years ago

Having empty @name would match the schema formally but likely not its spirit.

Leaving out the top-level Directory would probably be the best way. The @root is not required, so you don't have to include it at all if its value would be empty.

In general though, both the --hierarchic and non-hierarchic outputs use the Directory elements as containers when they often should not. If the goal is to match exactly what rpm -ql outputs, when the parent directory is not owned by the package, it likely should not be listed. That's why output of https://github.com/adelton/rpm2swidtag on the same package will be

  <Payload n8060:pathSeparator="/" n8060:envVarPrefix="$" n8060:envVarSuffix="">
    <File size="18" name=".bash_logout" location="/etc/skel" sha256:hash="2584c4ba8b0d2a52d94023f420b7e356a1b1a3f2291ad5eba06683d58c48570d" n8060:mutable="true" key="true"/>
    <File size="141" name=".bash_profile" location="/etc/skel" sha256:hash="28bc81aadfd6e6639675760dc11dddd4ed1fcbd08f423224a93c09802552b87e" n8060:mutable="true" key="true"/>
    <File size="312" name=".bashrc" location="/etc/skel" sha256:hash="30a80bfce3d108d6878cf13dfb1f3a1ea15b141dbdc5bc5803f4ab40a2a39f9c" n8060:mutable="true" key="true"/>

and Directory will only be included in the SWID tag when the directory is owned by the package:

    <Directory name=".build-id" location="/usr/lib">
      <Directory name="48">
        <File size="24" name="04c654c67d1555a98bb75a576cc00a88504bce" key="true"/>
      </Directory>
    </Directory>
    <Directory name="bash" location="/usr/share/doc">
      <File size="99966" name="FAQ" sha256:hash="24a49bf217b350e9c27dc8bcc1317bbad2c2a0d147e0bca72e09dc295b007124"/>
      <File size="7072" name="INTRO" sha256:hash="88d4f438af6af108d5970aed3a88208a295b6dafc2d55b549169bb0a372ec1fb"/>
      <File size="1693" name="RBASH" sha256:hash="be24c8b82aef28c89b11d1276a0cae1792f29802b847cd5f8b16481b7785d7b2"/>
      <File size="1084" name="README" sha256:hash="255cce4868c3fd2887c050fc4e56edc3a246e3dfedd8a58c16e57eaeaf43bee5"/>
      <File size="342306" name="bash.html" sha256:hash="685b744dd3aca1280935972160c523a1dbb55a6b3872f7d554b5cc7e1cb13f22"/>
      <File size="774491" name="bashref.html" sha256:hash="c6e6a560dab921011c6dd746f7403605c9bd0e0b32dab4cc8177a7aa67dee86f"/>
    </Directory>
adelton commented 6 years ago

It just occurred to me, the fact that swid_generator wraps files into Directory elements might be the reason why it fails to list any Files for .src.rpms. In source rpms, the files are not within any directories and that might be the reason why swid_generator does not output them.

tobiasbrunner commented 6 years ago

Having empty @name would match the schema formally but likely not its spirit.

Maybe name="/" then? Because if --hierarchic is omitted root="/" is used for e.g. the Directory tag of /etc. But maybe that would result in a double slash at the beginning of the path in strongTNC (/ + the path separator, which is also /). So I guess omitting the root Directory tag and the attribute makes more sense.

Leaving out the top-level Directory would probably be the best way. The @root is not required, so you don't have to include it at all if its value would be empty.

OK.

In general though, both the --hierarchic and non-hierarchic outputs use the Directory elements as containers when they often should not.

I guess the idea was to avoid redundant location attributes for the File tags (or maybe it was just how strongTNC processed the tags, I don't know).

If the goal is to match exactly what rpm -ql outputs, when the parent directory is not owned by the package, it likely should not be listed.

I don't think that was the goal. It just collects all the files installed by a package for use with strongTNC in a relatively compact way. But if the standard actually defines the Directory/File tags to mean responsibility/possession that should maybe be changed or made an option.

In source rpms, the files are not within any directories and that might be the reason why swid_generator does not output them.

What do you mean? Where are they stored then? And I think the tool was mainly intended to work with binary packages anyway (executables are the main concern).

adelton commented 6 years ago

I guess the idea was to avoid redundant location attributes for the File tags (or maybe it was just how strongTNC processed the tags, I don't know).

Understood.

I don't think that was the goal. It just collects all the files installed by a package for use with strongTNC in a relatively compact way. But if the standard actually defines the Directory/File tags to mean responsibility/possession that should maybe be changed or made an option.

The XML schema comment says "Provides the ability to apply a directory structure to the files defined in a Payload or Evidence element" about the base="FilesystemItem" and then "A Directory element allows one or more directories to be defined in the file structure" about the name="Directory" type="Directory". So I guess it can be read both ways.

In source rpms, the files are not within any directories and that might be the reason why swid_generator does not output them.

What do you mean? Where are they stored then?

The are just file names, without any directory / path in from of them.

tobiasbrunner commented 6 years ago

What's strange is that the example in the ISO standard (page 29) shows a Directory tag with a root attribute but no name attribute (as an explicit alternative to a version in which the top-level tag has both). Maybe that's just wrong (I only have the original version of the standard), or the schema is too restrictive (but I guess if it was optional that would allow File tags without name attribute, which doesn't make sense).

If we remove the outmost Directory tag the question is how we would handle files in the root directory (not that that's very common, and the current code doesn't handle that correctly either I think). I guess we could add a root attribute to that File tag (while that might seem a bit strange, I don't think it violates the standard or schema as both tags are derived from FileSystemItem), or we set the location attribute to "/" (although, the tool was explicitly refactored once to not use that attribute anymore).

Another question is whether the hierarchic option really makes that much sense. It seems pretty redundant to me and might not even save any space depending on the structure of the package (e.g. for the bash package on Ubuntu 16.04 the hierarchic version requires 51 characters more than the flat version, and even 321 characters more with --pretty).

Anyway, I've pushed a commit to the 50-hierarchic-root branch that removes the enclosing tag and adds a root="/" attribute to the first-level Directory tags if the path is absolute (same for files in the root dir). The order of the child tags is different with the patch, though. It's now strictly alphabetical, the previous code added files before subdirs (I don't think that really matters).

I also noticed that the handling of relative paths is a bit strange for the flat file generator (again, something that's probably not very common, but interestingly the unit tests use relative paths). It adds a root attribute with the relative path to the directory in which the file is located, that seems wrong to me as that's never the absolute root path of that directory (a location attribute would probably better fit that). But since that really doesn't seem very common I didn't change it.

In source rpms, the files are not within any directories and that might be the reason why swid_generator does not output them.

What do you mean? Where are they stored then?

The are just file names, without any directory / path in from of them.

I see what you mean. I guess that's because they are unpacked relative to the current directory (at least if they work the same way DEB source packages work). As mentioned above, relative paths are not handled well by the code. And when filtering files in packages there is actually an explicit check whether the path starts with a slash:

https://github.com/strongswan/swidGenerator/blob/0d0941d261925cc01638e88df748c0f2c4395131/swid_generator/environments/common.py#L58-L59

As I already mentioned, for the current use cases source packages are not really relevant. So the question is what you would expect the tool to do with a source package?

adelton commented 6 years ago

Well, the question is whether swidGenerator plans to be a generic SWID tag generating tool, or more-or-less tied to the strongTNC use case. If the second, then maybe failing when the package is a source package might be a valid behaviour, for example.

tobiasbrunner commented 6 years ago

Is there a use case for SWID tags for source packages? And what paths would you expect in the tag? Just the relative paths? (According to the standard that would mean relative to the SWID tag, but would that actually be the case?)