moov-io / signedxml

pure go library for processing signed XML documents
MIT License
52 stars 44 forks source link

Namespace Propagation #39

Open frazeradam opened 1 year ago

frazeradam commented 1 year ago

I've been working on getting some code moved off of PHP and into Go, and in testing found different behavior from a known working PHP system. It appears that namespaces aren't getting propagated when taking the subset that is getting signed, as in the example in section 3.7 here https://www.w3.org/TR/2001/REC-xml-c14n-20010315

In our case the (extremely abbreviated) XML is:

<E:Envelope xmlns:E="http://schemas.xmlsoap.org/soap/envelope/">
    <E:Body id="Body">

where in the dump I've extracted from inside the PHP library I'm using the body gets converted to

<E:Body xmlns:E="http://schemas.xmlsoap.org/soap/envelope/" id="Body">

and then generates the correct DigestValue.

I believe the canonicalization algorithm requires that when taking a subset, it needs to look up the chain for any namespace declarations, and move them to the highest level tag(s) that first reference that namespace (or something like that, the c14n specs are painful).

frazeradam commented 1 year ago

I took a crack at this and made some progress, but I've got a few question on how the internals are organized that I'm blocked on.

Inside signatureData.getReferencedXML I have been able to propagate the namespace up, but it appears that making the changes there the code to normalize the order of the attributes must have already run so simply appending the propagated namespaces gets me close, but not quite there. Intentionally putting the namespace first does it, but if there are multiple namespaces that need ordering it wouldn't work, so I do need to make sure that things go through the attribute ordering code.

If you do have an idea on that last step, I'm happy to submit my changes as a PR. From what I can tell renderAttributes seems to be involved in all of this but doesn't seem to fit correctly with what I'm doing.

Mostly working code in signedxml.go for feedback as well if I'm causing any unintentional side effects that would break other systems:

func (s *signatureData) getReferencedXML(reference *etree.Element, inputDoc *etree.Document) (outputDoc *etree.Document, err error) {
    uri := reference.SelectAttrValue("URI", "")
    uri = strings.Replace(uri, "#", "", 1)
    // populate doc with the referenced xml from the Reference URI
    if uri == "" {
        outputDoc = inputDoc
    } else {
        refIDAttribute := "ID"
        if s.refIDAttribute != "" {
            refIDAttribute = s.refIDAttribute
        }
        path := fmt.Sprintf(".//[@%s='%s']", refIDAttribute, uri)
        e := inputDoc.FindElement(path)
        if e == nil {
            // SAML v1.1 Assertions use AssertionID
            path := fmt.Sprintf(".//[@AssertionID='%s']", uri)
            e = inputDoc.FindElement(path)
        }
        if e != nil {
            // Attempt namespace propagation
            if err := s.propagateNamespace(e, e, inputDoc); err != nil {
                return nil, err
            }
            // Turn the node into a full document we can work on
            outputDoc = etree.NewDocument()
            outputDoc.SetRoot(e.Copy())
        }
    }

    if outputDoc == nil {
        return nil, errors.New("signedxml: unable to find refereced xml")
    }

    return outputDoc, nil
}

func (s *signatureData) propagateNamespace(e *etree.Element, base *etree.Element, inputDoc *etree.Document) error {
    // We begin by trying to resolve any references for the top level element, then we can recuse trying to resolve things
    attr, err := findNamespace(e, base, inputDoc)
    if err != nil {
        return err
    }
    if attr != nil {
        // Apply it to our current element
        e.Attr = append(e.Attr, *attr)
        // We need to sort such that namespaces come first and I have no idea how to make that happen here
    }
    for _, elem := range e.ChildElements() {
        if err := s.propagateNamespace(elem, base, inputDoc); err != nil {
            return err
        }
    }
    return nil
}

// findNamespace looks to find the namespace for e, returning it if and only if it's inside inputDoc but outside base.
// If there is a reference to the namespace and it can't be found an error will be returned.
func findNamespace(e *etree.Element, base *etree.Element, inputDoc *etree.Document) (*etree.Attr, error) {
    xlmns := "xmlns:" + e.Space
    // If there isn't a namespace, we can skip this whole function
    if e.Space == "" {
        return nil, nil
    }
    // If the namespace's reference is in the element itself, we have nothing to do
    if e.SelectAttr(xlmns) != nil {
        return nil, nil
    }
    // Start searching up the chain for a match
    current := e
    for {
        current = current.Parent()
        if current == nil {
            return nil, fmt.Errorf("couldn't resolve namespace reference of %s:%s", e.Space, e.Tag)
        }
        if attr := current.SelectAttr(xlmns); attr != nil {
            // We found it, last thing to do is check and see if it's inside or outside our input doc
            if containsElement(base, current) {
                return nil, nil
            }
            return attr, nil
        }
    }
}

// containsElement returns true if e is base or is base has e as any of its child nodes recursively
func containsElement(base *etree.Element, e *etree.Element) bool {
    if base == e {
        return true
    }
    for _, child := range base.ChildElements() {
        if containsElement(child, e) {
            return true
        }
    }
    return false
}
frazeradam commented 1 year ago

@adamdecaf any advice on how to move forward here. I'm happy to do most of the work, I just need a pointer in the right direction.

adamdecaf commented 1 year ago

Could we sort attributes again after adding the namespace attribute? If you open a PR with your changes I'll take a look and help out.

frazeradam commented 1 year ago

Yes, I'm pretty sure that's the way to go, but where I got stuck. I'll get a PR opened tomorrow (the code is on a computer I don't have easy access to today).

frazeradam commented 1 year ago

@adamdecaf #41 has been entered with my code that mostly solves the issue (minus sorting of attributes). I was unable to find any place in the code that did that at a XML tree level and didn't just output strings that I could reuse. Feel free to make a change or point me in the right direction for that and I can make the change. Thanks for the help with this!

frazeradam commented 1 year ago

@adamdecaf have you had a chance to look at this?

adamdecaf commented 1 year ago

@adamdecaf have you had a chance to look at this?

A bit. I'm trying to recruit some help inside Moov to look at this more.

frazeradam commented 10 months ago

@adamdecaf assuming you're back at work, any chance I could get an update on this. We're not far from needing to do something, and I'm hoping to not need to keep around a PHP code base just for XML digital signatures.

adamdecaf commented 10 months ago

I'm sure we can figure something out.

adamdecaf commented 9 months ago

@frazeradam Do you have a more complete test for this? Or are your original snippets enough to verify this works?

Nevermind, We've got a commented out test of the 3.7 example.

adamdecaf commented 9 months ago

I've looked at this again this morning and am afraid I don't understand the xml specs well enough to help out. Have you found a solution with PHP in the mix?

I do see a lot of work on https://github.com/moov-io/signedxml/compare/master...daugminas:signedxml:master but that deviates from the project as it is now.

adamdecaf commented 9 months ago

I do see some success with daugminas/master, so it might be worthwhile to adopt changes from that.

Running the 3.7 example gets closer...

Failures:

  * /Users/adam/code/src/github.com/moov-io/signedxml/examples/canonicalization_test.go 
  Line 202:
  Expected: '<e1 xmlns="http://www.ietf.org" xmlns:w3c="http://www.w3.org"><e3 xmlns="" id="E3" xml:space="preserve"></e3></e1>'
  Actual:   '<doc xmlns="http://www.ietf.org">
     <e1>
        <e2 xmlns="">
           <e3 id="E3"></e3>
        </e2>
     </e1>
  </doc>'
  (Should be equal)
  Diff:     '<e1doc xmlns="http://www.ietf.org">
   xmlns:w3c="http://www.w3.org"  <e1>
        <e32 xmlns="">
           <e3 id="E3" xml:space="preserv></e"3>
        </e32>
     </e1>
  </doc>'
frazeradam commented 9 months ago

I'm currently at AWS re:Invent, but will take a look at that next week when I get back and have access to all of my test code. Thanks for digging into it more.

frazeradam commented 9 months ago

@adamdecaf sadly despite playing around with it for a bit, I've failed to even make it run. It seems like it uses a C library (what I was trying to avoid - and according to the readme for this repository, also something this package is trying to avoid). Specifically the CGo "fun" originates in this package https://github.com/lestrrat-go/libxml2/blob/master/clib/clib.go which is used in the c14n.go file that #44 proposes to add.

adamdecaf commented 8 months ago

What errors are you having trying to get CGO working? I've got a fair amount of experience cross compiling it and running inside docker images. We use CGO at Moov where needed, but try to avoid it.

frazeradam commented 8 months ago

CGo is a non-starter for us for this project (at least as part of the main executable - we may end up needing to break out into a second executable to handle the signed XML). We actually picked this package to work with since the README says:

Other packages that provide similar functionality rely on C libraries, which makes them difficult to run across platforms without significant configuration. signedxml is written in pure go, and can be easily used on any platform.

adamdecaf commented 8 months ago

Understood. I agree CGO is best to avoid and would try to keep it as isolated as possible. The project we're using this in already has CGO. Feel free to reach out if you need anymore help either in private or on Issues.

frazeradam commented 3 months ago

I figured I should let you know where we ended up with this. We found that github.com/ebitengine/purego can be used to interface with libxml2 without needing CGO and this will cover our use case.

adamdecaf commented 2 months ago

That sounds like a lot better solution. Do you have an example that you could share? We could adopt that within signedxml.

Edit: I've seen some people have success with https://github.com/nbio/xml as well