scala / scala-xml

The standard Scala XML library
Apache License 2.0
297 stars 92 forks source link

More secure parsing #17

Closed adriaanm closed 3 years ago

adriaanm commented 10 years ago

@jroper says to add the following to XMLLoader.parser:

See http://blog.csnc.ch/2012/08/secure-xml-parser-configuration/

try { 
  f.setFeature("http://xml.org/sax/features/external-general-entities", false);
  f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
} catch {
  case e: ParserConfigurationException => // warn that the SAXParserFactory supplied by the JDK doesn't support this feature, and that the application may therefore be vulnerable to external entity attacks, encourage to define your own parser instead
  case e: SAXNotRecognizedExcetpion => // as above
  case e: SaxNotSupportedException => // as above
}
milessabin commented 9 years ago

This IETF BCP is worth a read: https://www.ietf.org/rfc/rfc3470.txt

retronym commented 9 years ago

Thanks, @milessabin!

Quoting the juicy parts:

XML mechanisms that follow external references (Section 4.14) may also expose an implementation to various threats by causing the implementation to access external resources automatically. It is important to disallow arbitrary access to such external references within XML data from untrusted sources. Many XML grammars define constructs using URIs for external references; in such cases, the same precautions must be taken.

.

Acknowledgements The authors would like to thank the following people who have provided significant contributions to the development of this document:

Mark Baker, Tim Berners-Lee, Tim Bray, James Clark, ... , Miles Sabin ...

A blast from the past :)

jroper commented 9 years ago

Something not mentioned in the RFC is the memory issues that doctypes introduce, including the billion laughs vulnerability (recursive entity references leading to exponential memory usage) and the quadratic blowup vulnerability (many references to a single large entity leading to quadratic memory usage). Both of these vulnerabilities allow an attacker, with a small payload (as small as 100B for billion laughs, as small as 200KB for quadratic blowup) to cause a JVM to OOME. While the JDKs XML parser does have protection against billion laughs (recursion limits, but off by default), it doesn't have any protection against quadratic blowup. So the only safe way to handle untrusted XML on the JVM is to disallow doctypes altogether. It would be nice if the JDK XML parsers allowed you to simple ignore doctypes, but unfortunately, they don't, either it accepts the doctype, or fails if the doctype is present.

Disallowing doctypes is likely to cause some problems for users - many systems sending XML will automatically add a doctype, and most users will be frustrated that Scala doesn't accept this. So, we need to consider that if we decide to introduce this as a default. On Play, we do have users every so often report this issue, but we find that after explaining to them that if doctypes were allowed, the attacker could take down their webapp with a single request, they're happy to change the sending system to not include a doctype.

milessabin commented 9 years ago

Indeed ... I think Billion Laughs came a little later.

dnvriend commented 9 years ago

Last year I got a warning from a friendly hacker that warned me about this issue in JBoss Fuse 6.0.0 that we used then. We also use spray and accept XML. Other components also use the SecureXML. The xxeFilter is also used by other routes. The code I created then:

object SecureXml {
  def loadString(xml: String): NodeSeq = {
      val spf = SAXParserFactory.newInstance()
      spf.setFeature("http://xml.org/sax/features/external-general-entities", false)
      spf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true)
      val saxParser = spf.newSAXParser()
      XML.withSAXParser(saxParser).loadString(xml)
  }
}

The 'xxeFilter' is:

trait XxeFilter {
  def log: LoggingAdapter

  implicit def executionContext: ExecutionContext

  def xxeFilter(xxe: String): Future[String] = {
    log.info("Validating xml: [{}]", xxe)
    Future(xmlMustStartWith(xmlMustNotContain(xxe)))
  }

  def throwError(xxe: String, message: String): String = {
    log.error("{}:\n [{}]", message, xxe)
    throw new Error(message)
  }

  private def xmlMustNotContain(xxe: String): String = xxe match {
    case _ if xxe.toUpperCase.contains("<!DOCTYPE") => throwError(xxe, "XML contains DOCTYPE declaration")
    case _ if xxe.toUpperCase.contains("<!ENTITY") => throwError(xxe, "XML contains ENTITY declaration")
    case _ if xxe.toUpperCase.contains("<!ELEMENT") => throwError(xxe, "XML contains ELEMENT declaration")
    case _ if xxe.toUpperCase.contains("SYSTEM") => throwError(xxe, "XML contains SYSTEM declaration")
    case _ => xxe
  }

  private def checkForXXE(xxe: String): String = {
    SecureXml.loadString(xxe).toString()
    xxe
  }

  private def xmlDataMustStartWith(xxe: String): String = xxe match {
    case _ if xxe.startsWith("//removed - non open source//") && xxe.contains("""xmlns="//removed non open-source"""") => checkForXXE(xxe)
    // snip
    case _ => throwError(xxe, "received XML does not start with xml declaration or //snip // element or namespace is not correct.")
  }

  private def xmlMustStartWith(xxe: String): String = xxe match {
    case _ if xxe.startsWith("<?xml") => xmlDataMustStartWith(xxe.substring(xxe.indexOf("?>") + 2, xxe.size).trim)
    case _ => xmlDataMustStartWith(xxe)
  }
}

BasicRoute:

trait BasicRoute extends Directives {
  implicit def executionContext: ExecutionContext
  implicit def timeout: Timeout
  implicit def system: ActorSystem

  def badRequestXml = "//snip//"

  def internalServiceErrorXml(description: String, t: Option[Throwable]) = "//snip//"

  def validationError(code: String, description: String) = "//snip//"

  def completeWithBadRequest = complete(BadRequest, badRequestXml.toString())

  def completeWithError(throwable: Throwable) = complete(InternalServerError, internalServiceErrorXml(s"An error occurred: ${throwable.getMessage}", Option(throwable)))

  def completeWithError = complete(InternalServerError, internalServiceErrorXml(s"An error occurred", None))

  def completeWithValidationError(code: String, description: String) = complete(BadRequest, validationError(code, description))
}

use in route:

trait ARoute extends BasicRoute with XxeFilter {

  lazy val ARoute: Route =
    path("//snip//") {
      post {
        entity(as[String]) { xxe =>
          onComplete(xxeFilter(xxe)) {
            case Success(xml) =>
                camel(xml).map { _ =>
                  complete(StatusCodes.NoContent)
                }.recover { case t: Throwable =>
                  completeWithError(t)
                }.getOrElse {
                  complete(StatusCodes.InternalServerError)
                }
            case Failure(ex) => completeWithBadRequest
          }
        }
      }
    }

  def camel(xml: String): Try[Unit]
}
adriaanm commented 9 years ago

Thanks, @dnvriend! Maybe we could include this so that others can just call 'xml.beSafe'?

dnvriend commented 9 years ago

Or just be safe... implicitly :)

jawshooah commented 9 years ago

Any plans to implement this? I'd love it if scala-xml had XXE prevention bundled in.

biswanaths commented 9 years ago

Let me have a look at this.

EdgeCaseBerg commented 9 years ago

curious, has any work been done on this?

biswanaths commented 8 years ago

https://github.com/scala/scala-xml/blob/master/src/main/scala/scala/xml/factory/XMLLoader.scala#L28

Changing the parser features as suggest while creating the parser should be simple enough.

What should be the route to upgrade ? We can change the code but not the data. Anybody who upgrades to latest version there is a possibility that some data path might fail.

Please let me know, how we should proceed with this.

EdgeCaseBerg commented 8 years ago

If Option A: Then people who upgrade their code will see the warnings right away and know they should put some time into implementing the necessary changes.

If Option B: Then people who upgrade their code will be happily unaware of issues with any of their other software that might be at risk if using non-updated versions.

Option A seems to increase awareness of these type of issues in general which I'd count as a +1 to that approach, but I can see why many would like to have safety by default and would advocate for that change instead.

v6ak commented 8 years ago

I am against option B for some other major reason: It might increase fear of security updates (as it might break things), which is bad for security. What is worse, everything will compile after the change, unit tests will likely also pass, but the app still might break. (I know, integration tests should theoretically catch it, but still… Even in the case that integration tests catch the issue, it does not encourage that updates are painless.)

Maybe the only advantage of Option B is forcing old code to secure mode. Maybe this goal might be achieved with some tradeoff. For example, we would have three objects:

We might call this tradeoff as Option C.

Finally, there is an Option D. It is like Option C, but with scala.xml.Xml object completely removed. As a result, all programmers would be forced to rewrite the code using the legacy scala.xml.Xml. Or maybe all the methods of scala.xml.Xml object would be marked as synthetic and start throwing an exception.

Scenarios of upgrade:

  1. Programmer upgrades to a new version of the library. A: Programmer is warned and will hopefully take the action. B: Code will compile without warning, but might silently break. C: Programmer is warned and will hopefully take the action. D: Programmer is forced to take the action.
  2. Programmer upgrades another library that evicts the old version. The code is, however, compiled against the old version, maybe because it is part of a 3rd party library. A: Programmer is not warned. B: Code will compile without warning, but might silently break. C: Programmer is not warned, the code will not break (but might be insecure). D: The code will suddenly start throwing weird errors at runtime. (Grrrr!)
  3. Someone wants to force the secure behavior for all the application (with taking the risks of potential incompatibility). I suppose that there is the new XML library version in the project. A: No way. B: Automatically done. C: Can be done by setting a global system property. D: Forced manual action.

The Option A covers only one of these three scenarios.

The Option B covers all these three scenarios. The cost is, however, potential silent BC breaks and discouraging users from security updates.

Option C covers scenarios 1 and 3 and performs definitely better than Option A. While Option C seems to fail at the scenario 2, it is not as bad as it might sound. If the programmer also uses the XML library directly, the scenario 3 might be triggered.

The Option D might look like the safest option, because it forces taking some action, but it is not so simple. At the scenario 3, the programmer can't simply evict the old version of scala-xml. All the relevant libraries are required to be upgraded before, which can actually delay the fix. In such case, the Option D performs significantly worse than others.

In short term: I am in favour of Option C. The Option A looks the second best for me. Options B and D look wrong for me in the short term.

In long term: After the old object or methods being deprecated for some period of time, I'd use the Option D and remove the deprecated scala.xml.Xml object.

biswanaths commented 8 years ago

Anybody else has any opinion please let us know. I think we should try to get a satisfactory solution to this.

adrianbn commented 7 years ago

Just curious since this issue seems old and still open: what is the recommended way to disable insecure features with scala-xml? Maybe override XMLLoader.parser with an implementation that includes the following?

spf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
spf.setFeature("http://xml.org/sax/features/external-parameter-entities", false)
spf.setFeature("http://xml.org/sax/features/external-general-entities", false)
spf.setXIncludeAware(false)
spf.setExpandEntityReferences(false)

Thank you!

tyohDeveloper commented 5 years ago

The name of the "secure" option should be something like "no XSS, mostly secure*." We don't want to imply security that we can't implement. Hackers will holes in every implementation. Even if there isn't a hole in the library, some naïve developers will use unsafe practices. We shouldn't imply more than what can be delivered.

Good programming practices always put parsing in its own thread. Futures make that somewhat easy. Unsafe usages or unintended Xss should throw run time exceptions that will kill the process. Only that thread will die, the main thread won't be affected. It can help control DSS attacks as well.

A number of strategies need to be employed to control resource usage, naturally. If possible, well known public key, elliptical (or other) methods that are somewhat difficult to crack.

AES 256 or a version that doesn't have well known bugs. The usual, but not always available methods. All of which take more runtime resources. So some resource control options should be available.

Most of this would require a lot of work to implement. Perhaps even a complete rewrite. Hooks to allow the consumer to add those features might be possible with considerably less work. A library of monads could be made available incrementally. That would help with implementation time. The community could add those over time. It would require extensive testing to ensure they don't actually weaken security. The NSA has injected weaknesses in the Linux (and others) distributions a number of times. The known ones have been caught and addressed. Monads would make them far less obvious and harder to spot.

Most of the committers in the scala know to each other. The use base is small enough to be ignored. As a paranoid system architect, I have high standards. They are probably unreasonable.

LessSecure, XML, MoreSecure?

v6ak commented 5 years ago

First of all, this issue is not related to any implementation issue. It is related to design issue – when you pass some untrusted data to it, it can do something insecure that you don't expect. Yes, in theory, programmer should have read documentation of scala-xml and specification of XML itself, but when you look for XXE and XEE vulnerabilities, they might convince you that it is not always the case. You also can ask some programmers about awareness of such vulnerabilities, I guess it will be under 50 %. Technically, this issue is not a bug in scala-xml, it is a feature. But it is not a good feature…

So, there is a good reason to have a SecureXml object, that parses securely. If any insecure option is accidentally enabled in SecureXml, it can be considered as a bug and disabling such a feature can be considered as a sort of backward-compatible change.

On the other hand, having NoXeeXml, NoXxeXml and NoXxeNoXeeXml would be rather confusing. (The vulnerabilities are XXE and XEE, not XSS…) Programmers probably don't want to decide if XEE+XXE protections are enough. Naming things is difficult, but SecureXml looks like a good tradeoff between conciseness and descriptiveness…

If this confuses some programmer to feel like everything they do with that is secure, then I am sorry, this is incompetence we cannot fix so easily (and MostSecureXml will not help). Such programmers might think that HTTPS makes their webapp secure (who cares about CSRF, XSS, header injection, SQL injection, …) or have many other misconception. I see this is an issue, but I don't think we can/should try to solve it there. I am sorry.

On DSS (you probably mean DoS): Using a separate thread does not help much. If you have a thread pool of a limited size, the attacker might exhaust this pool this way. If you don't limit the size, attacker might easily force you to allocete most of the CPU time and much memory for XML parsing. You would need to have some time limit to reasonably reduce the impact of the the attack, but this is not so easy to implement. Actually, you don't have a good option at the moment:

tyohDeveloper commented 5 years ago

I wasn’t being clear. Posting from a phone is difficult. All of the datatyping of simple markup languages in general are cross site (or inner site) issues. I dump them all in the XSS bucket for shorthand. I’ll use XS forward.

The DoS (DDS was my phone helping me) attacks are more easily dealt with by delaying threads and processes. The attack can be identified, threads and process killed, and mitigation routines executed. Putting parsing in the main thread makes intercepting the problem problematic. It’s easier to just kill threads and processes. At that point, you don’t care about parsers. It’s a standard architectural pattern.

The main points are (1) the default path should catch and prevent any XS. Adding “secure” to the title will over promise the parser. It isn’t (nor can be) secure. Developers should be not misled on that point. (2) Having a class “more secure than not” in its title is, as you say, clumsy.

The design is intended to be open to XSS style attacks or it isn’t. The current design is not. The suggested design is. Having that design by default is a good thing. Just like not allowing malformed XML, which is already in place.

The implementation can not be safe. It will have bugs. The developer will always bollocks things up. Even with a relatively safe implementation. We can skip both of the issues for this discussion.

The answer to question (1) Yes, the default should skip over the most obvious problems. (2) I have no idea how to name them so they aren’t over promising and not clumsy.

David

On Sun, Jun 9, 2019 at 4:16 AM Vít Šesták notifications@github.com wrote:

First of all, this issue is not related to any implementation issue. It is related to design issue – when you pass some untrusted data to it, it can do something insecure that you don't expect. Yes, in theory, programmer should have read documentation of scala-xml and specification of XML itself, but when you look for XXE and XEE vulnerabilities, they might convince you that it is not always the case. You also can ask some programmers about awareness of such vulnerabilities, I guess it will be under 50 %. Technically, this issue is not a bug in scala-xml, it is a feature. But it is not a good feature…

So, there is a good reason to have a SecureXml object, that parses securely. If any insecure option is accidentally enabled in SecureXml, it can be considered as a bug and disabling such a feature can be considered as a sort of backward-compatible change.

On the other hand, having NoXeeXml, NoXxeXml and NoXxeNoXeeXml would be rather confusing. (The vulnerabilities are XXE and XEE, not XSS…) Programmers probably don't want to decide if XEE+XXE protections are enough. Naming things is difficult, but SecureXml looks like a good tradeoff between conciseness and descriptiveness…

If this confuses some programmer to feel like everything they do with that is secure, then I am sorry, this is incompetence we cannot fix so easily (and MostSecureXml will not help). Such programmers might think that HTTPS makes their webapp secure (who cares about CSRF, XSS, header injection, SQL injection, …) or have many other misconception. I see this is an issue, but I don't think we can/should try to solve it there. I am sorry.

On DSS (you probably mean DoS): Using a separate thread does not help much. If you have a thread pool of a limited size, the attacker might exhaust this pool this way. If you don't limit the size, attacker might easily force you to allocete most of the CPU time and much memory for XML parsing. You would need to have some time limit to reasonably reduce the impact of the the attack, but this is not so easy to implement. Actually, you don't have a good option at the moment:

  • You could add a time limit, if the API supported it. This is by no way easy to implement in scala-xml, as it wraps XML inplementation from JRE.
  • You can try to kill the thread. But killing a thread it generally troublesome, as it can leave some locks locked and some structures inconsistent. Maybe the API was removed or “implemented” by throw new NotImplementedException() or so in some JREs.
  • You can interrupt the Thread – if the implementation supported it. Again, this would have to be supported by JRE.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/scala/scala-xml/issues/17?email_source=notifications&email_token=AMBRI5WQ6DDRT7V53AFHGQDPZTC6PA5CNFSM4AJ3KOI2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODXIGTTY#issuecomment-500197839, or mute the thread https://github.com/notifications/unsubscribe-auth/AMBRI5X6WIZWNQIMPIS5OSLPZTC6PANCNFSM4AJ3KOIQ .

v6ak commented 5 years ago

Insecure => You don't want to pass untrusted input there.

Secure => It is intended to process untrusted input. If it is not, it is a bug.

I think this is clear and there are some precedents like TLS (or SSL), SSH, HTTPS, FTPS and SFTP. All of them give you just some promise that designers and implementors is intended to be secure in some way. If this is the same case. It neither guarantees security nor tries to be secure in other ways than transport (e.g., preventing malware distribution from a malicious server).

On separate threads: Still, if you have a thread dump, it can provide you the same information, as it usually contains stacktraces. It also does not help in any means against data disclosure through XXE.

Note that those bugs are not much cross-site bugs:

SethTisue commented 5 years ago

"safer"?

tyohDeveloper commented 5 years ago

Yes, safer.

On Mon, Jun 10, 2019 at 2:19 AM Seth Tisue notifications@github.com wrote:

"safer"?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/scala/scala-xml/issues/17?email_source=notifications&email_token=AMBRI5VB6CBBJMJ6ONU5QVLPZX6ABA5CNFSM4AJ3KOI2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODXJD47A#issuecomment-500317820, or mute the thread https://github.com/notifications/unsubscribe-auth/AMBRI5T6OO4VXQ4P5HHUVC3PZX6ABANCNFSM4AJ3KOIQ .

SethTisue commented 3 years ago

fixed by #177