ua-parser / uap-scala

Scala port of ua-parser
Do What The F*ck You Want To Public License
98 stars 49 forks source link

Experienced a memory leak #31

Open dgohlke opened 6 years ago

dgohlke commented 6 years ago

I'm using uap-scala in a mobile backend that services 4.5 million requests a day. While recently debugging a memory leak, I discovered snakeyaml consuming an excessive amount of heap memory. org.yaml.snakeyaml.error.Mark was consuming 115 MB with nearly 3 million instances. Using https://github.com/jrudolph/sbt-dependency-graph, I found uap-scala was the only dependency in our build that was dependent on snakeyaml.

We have a playframework application setup with a logging filter similar to this (simplified):

import akka.util.ByteString
import javax.inject.Inject
import org.uaparser.scala.{Client,Parser}
import play.api.Logger
import play.api.libs.streams.Accumulator
import play.api.mvc._
import scala.concurrent.{ExecutionContext, Future}

class LoggingFilter @Inject() (implicit ec: ExecutionContext) extends EssentialFilter {

  def apply(nextFilter: EssentialAction): EssentialAction = new EssentialAction {
    def apply(request: RequestHeader): Accumulator[ByteString, Result] = {
      nextFilter(request).map { result =>
        logData(request, result)
        result
      }
    }
  }

  def logData(request: RequestHeader, result: Result): Future[Unit] = {
    Future {
      val client: Client = Parser.default.parse(request.headers.get("User-Agent").get)
      Logger.info(s"User Agent: ${client.userAgent}, Device: ${client.device}")
    }
  }
}

The above problem exists with the above code. Here's what we did to fix it:

class LoggingFilter @Inject() (implicit ec: ExecutionContext) extends EssentialFilter {
  val uaParser: Parser = Parser.default

  def apply(nextFilter: EssentialAction): EssentialAction = new EssentialAction {
    def apply(request: RequestHeader): Accumulator[ByteString, Result] = {
      nextFilter(request).map { result =>
        logData(request, result)
        result
      }
    }
  }

  def logData(request: RequestHeader, result: Result): Future[Unit] = {
    Future {
      val client: Client = uaParser.parse(request.headers.get("User-Agent").get)
      Logger.info(s"User Agent: ${client.userAgent}, Device: ${client.device}")
    }
  }
}

We moved the Parser.default instantiation up into the root of the class. Now we no longer have the memory leak.

I'm wondering if either the documentation can be updated to suggest following a pattern of instantiating then parsing, or if this can be handled from a code perspective. In fact, I believe this change might address the problem:

 object Parser {
+  val yaml = new Yaml(new SafeConstructor)
   def fromInputStream(source: InputStream): Try[Parser] = Try {
-    val yaml = new Yaml(new SafeConstructor)
     val javaConfig = yaml.load(source).asInstanceOf[JMap[String, JList[JMap[String, String]]]]
     val config = javaConfig.asScala.toMap.mapValues(_.asScala.toList.map(_.asScala.toMap.filterNot {
       case (_ , value) => value eq null

Thoughts? I tried to submit this via PR, but i lack the permission in your repo.

travisbrown commented 6 years ago

Yes, an update to the documentation would be much appreciated! Are you able to open a PR from a personal fork?

travisbrown commented 6 years ago

(Oh, and about the position of the Yaml instantiation: my understanding was that Yaml is not thread-safe and that it's therefore not appropriate for a val in an object. This may have changed since the last time I looked at the SnakeYAML code / docs in detail, but I doubt it.)

asomov commented 5 years ago

Yaml is not thread-safe means that you cannot call methods of the same instance from different threads. I do not see how it might be connected to val