savonrb / savon

Heavy metal SOAP client
https://www.savonrb.com
MIT License

Savon breaks if XML contains the character `&#x1;` inside a particular XML field #994

Closed luispcosta closed 7 months ago

luispcosta commented 1 year ago

Bug report

Current behavior:

If an endpoint returns XML containing the character reference `&#x1` (as in the example response below), Savon fails with the following error:

Savon::InvalidResponseError: Unable to parse response body:

Steps to reproduce current behavior:

Since this only happens when calling specific endpoints, I can't really write down replication steps. Instead, I changed the source code of the method call_with_logging inside savon/operation.rb so that it returns a hard-coded response:

    def call_with_logging(request)
      @logger.log(request) do
        headers = {
          "cache-control"=>"private, max-age=0",
          "content-length"=>"527",
          "content-type"=>"text/xml; charset=utf-8",
          "server"=>"Microsoft-IIS/8.5",
          "x-aspnet-version"=>"2.0.50727",
          "x-powered-by"=>"ASP.NET",
          "date"=>"Mon, 08 May 2023 15:15:14 GMT"
        }
        body =  <<-TXT
          <?xml version=\"1.0\" encoding=\"utf-8\"?>
          <soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\"
            xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"
            xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">
            <soap:Body>
              <GetAppointmentsResponse xmlns=\"http://some/xml_ns\">
                <AppointmentInfo>
                  <Notes>Im wrong: &#x1</Notes>
                </AppointmentInfo>
              </GetAppointmentsResponse>
            </soap:Body>
          </soap:Envelope>
        TXT
        HTTPI::Response.new(200, headers, body.strip.chomp)
      end
    end

And then I called our code that calls Savon.client(options).call(:endpoint, ...).

Is this enough to reproduce the behavior? Let me know if I should give more repro info.

Expected behavior:

If possible, we should be able to tell Savon to ignore specific characters in the XML document.
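To make the request concrete, here is a minimal sketch of what "ignoring" such characters could mean: removing numeric character references and raw characters that fall outside the XML 1.0 character ranges before the body is parsed. The module `XmlScrub` and its methods are hypothetical names for illustration, not part of Savon or Nori, and the optional trailing semicolon in the pattern is an assumption made only because the sample response contains `&#x1` without one.

```ruby
# Hypothetical helper for illustration; not part of Savon or Nori.
module XmlScrub
  # Numeric character references, hex (&#x1;) or decimal (&#1;).
  # Assumption: the semicolon is optional, to cover the "&#x1" in the sample.
  CHAR_REF = /&#(?:x([0-9a-fA-F]+)|([0-9]+));?/

  # True if the code point is allowed by the XML 1.0 Char production.
  def self.valid_codepoint?(cp)
    cp == 0x9 || cp == 0xA || cp == 0xD ||
      (0x20..0xD7FF).cover?(cp) ||
      (0xE000..0xFFFD).cover?(cp) ||
      (0x10000..0x10FFFF).cover?(cp)
  end

  # Returns a copy of xml without invalid character references
  # and without raw characters outside the allowed ranges.
  # Assumes xml is a validly encoded (e.g. UTF-8) string.
  def self.strip_invalid(xml)
    without_refs = xml.gsub(CHAR_REF) do |ref|
      cp = $1 ? $1.to_i(16) : $2.to_i(10)
      valid_codepoint?(cp) ? ref : ""
    end
    without_refs.each_char.select { |ch| valid_codepoint?(ch.ord) }.join
  end
end

puts XmlScrub.strip_invalid("<Notes>Im wrong: &#x1</Notes>")
```

Because the semicolon is treated as optional, a reference written without one and followed directly by a hex digit would be read as a longer number, so this is a rough approximation rather than a strict parser.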

System information:

Additional Info:

I took a bit of a dive through the code and it seems that when the code reaches this step:

def hash
  @hash ||= nori.parse(xml)
end

The variable @hash simply contains:

{:notes=>"Im wrong: \n                \n              \n            \n          "}

That is the result for the XML example above; @hash should instead contain the whole parsed document.

pcai commented 7 months ago

this might be fixed with the recent release of nori 2.7, can you confirm? specifically, I think &#x1; is treated as an invalid character which would previously throw. https://github.com/savonrb/nori/pull/72 changes the default behavior to scrub it instead.

pcai commented 7 months ago

please reopen and provide additional detail if this still reproduces when using savon with nori 2.7
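Before retesting, it may help to confirm which nori version is actually loaded alongside savon. The following is a small sketch using RubyGems' version comparison; the helper name `nori_scrubs_by_default?` is hypothetical, and the `2.7.0` threshold is an assumption taken from the comment above rather than from nori's changelog.

```ruby
require "rubygems"

# Assumption: scrubbing became the default in nori 2.7, per the comment above.
NORI_SCRUB_DEFAULT_SINCE = Gem::Version.new("2.7.0")

# Hypothetical helper: true if the given version string is at or above
# the assumed threshold.
def nori_scrubs_by_default?(version_string)
  Gem::Version.new(version_string) >= NORI_SCRUB_DEFAULT_SINCE
end

# Report the nori version loaded in this process, if any.
spec = Gem.loaded_specs["nori"]
if spec
  puts "nori #{spec.version}, scrubs by default: #{nori_scrubs_by_default?(spec.version.to_s)}"
else
  puts "nori is not loaded in this process"
end
```

Run from within the application (after savon has been required), this reports the version that Bundler or RubyGems resolved, which may differ from the latest release.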