openvenues / libpostal

A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
MIT License

Got core dump (SIGSEGV) calling `soft_tfidf_similarity_with_phrases_and_acronyms` #367

Open · yyl opened this issue 6 years ago

yyl commented 6 years ago

Hi!

The same code runs fine on macOS but fails on Linux, so I think it might be a platform-specific issue.


My country is the US.


Here's how I'm using libpostal

I use it through jpostal:

```scala
def testDedupeOpt(): Unit = {
  val dedupeOpt = JPostalUtils.dedupeOpt(datadir)
  A.assertTrue(dedupeOpt.nonEmpty)

  A.assertEquals(
    DuplicateStatus.LIBPOSTAL_POSSIBLE_DUPLICATE_NEEDS_REVIEW,
    dedupeOpt.get.isStreetDupe("45th St", "W 45")
  )

  val tokens1 = Array[String]("test")
  val scores1 = Array[Double](1.0)
  val tokens2 = Array[String]("tast")
  val scores2 = Array[Double](1.0)
  A.assertEquals(
    0.85,
    dedupeOpt.get.isNameDupeFuzzy(tokens1, scores1, tokens2, scores2),
    0.001
  )
}
```

JPostalUtils looks like this:

```scala
object JPostalUtils {
  private def loadLib(libNameOpt: Option[String], libPathOpt: Option[String]): Option[Unit] = {
    // Load jni by searching java.library.path (preferred)
    libNameOpt
      .flatMap(libName => TryO(classOf[UnsatisfiedLinkError])(System.loadLibrary(libName)))
      // Load jni with exact path
      .orElse(libPathOpt.flatMap(libPath => TryO(classOf[UnsatisfiedLinkError])(System.load(libPath))))
  }
  def dedupeOpt(
    datadir: String,
    libNameOpt: Option[String] = Some("jpostal_dedupe"),
    libPathFallbackOpt: Option[String] = Some("/usr/lib/libjpostal_dedupe.so")
  ): Option[Dedupe] = {
    loadLib(libNameOpt, libPathFallbackOpt).flatMap(_ => {
      TryO.catchAll(Dedupe.getInstanceDataDir(datadir))
    })
  }
}
```

Here's what I did

I ran the code above in a unit test.


Here's what I got

The unit test runs fine on macOS. However, on Linux (CentOS) the JVM crashes with a SIGSEGV in native code:

```
..............#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007ff6f2a70dc1, pid=2646, tid=0x00007ff6f3bfe700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_161-b12) (build 1.8.0_161-b12)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.161-b12 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libjpostal_dedupe.so.0.0.0+0x79dc1]  soft_tfidf_similarity_with_phrases_and_acronyms+0x781
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
```

Here's what I was expecting

The unit test should also pass on Linux.


Notes

The jpostal we use is our own customized fork: https://github.com/foursquare/jpostal. Since the same code runs fine on macOS, and the crash log points at soft_tfidf_similarity_with_phrases_and_acronyms, it looks like that function has an issue when running on Linux.

iantabolt commented 6 years ago

I'm thinking we should try to repro this in a small C project to eliminate any possibility that our JNI bindings are to blame.

yyl commented 6 years ago

@iantabolt I will still try to repro it as mentioned if necessary, but I just realized the log says "The crash happened outside the Java Virtual Machine in native code." Would that rule out the possibility of the issue being inside JNI?

batterseapower commented 6 years ago

You need a mutex in your client code because of #34. Could that be the problem?
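A minimal sketch of what such a client-side mutex could look like in Scala, assuming the crash comes from concurrent calls into libpostal (for example, several tests in the same JVM running in parallel). NameDeduper, SerializedDeduper, LibpostalLock and ConcurrencyProbe are hypothetical names for illustration, not part of jpostal or the foursquare fork; the probe stands in for the real Dedupe so the lock can be checked without loading the native library. Because libpostal's state is process-wide, the same lock would have to guard every native call in the JVM (parser, expand and dedupe), not only this one method.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical interface mirroring one Dedupe method used in the test above;
// the real jpostal Dedupe class may expose a different signature.
trait NameDeduper {
  def isNameDupeFuzzy(tokens1: Array[String], scores1: Array[Double],
                      tokens2: Array[String], scores2: Array[Double]): Double
}

// A single process-wide monitor. libpostal keeps global state, so every
// native call (dedupe, parser, expand) should take this same lock.
object LibpostalLock

// Wrapper that lets at most one thread at a time into the underlying deduper.
final class SerializedDeduper(underlying: NameDeduper) extends NameDeduper {
  override def isNameDupeFuzzy(tokens1: Array[String], scores1: Array[Double],
                               tokens2: Array[String], scores2: Array[Double]): Double =
    LibpostalLock.synchronized {
      underlying.isNameDupeFuzzy(tokens1, scores1, tokens2, scores2)
    }
}

// Fake deduper (no native code) that records the peak number of calls
// running at the same time, so the lock's effect can be checked in isolation.
final class ConcurrencyProbe extends NameDeduper {
  private val inFlight = new AtomicInteger(0)
  val peakInFlight = new AtomicInteger(0)

  override def isNameDupeFuzzy(tokens1: Array[String], scores1: Array[Double],
                               tokens2: Array[String], scores2: Array[Double]): Double = {
    val now = inFlight.incrementAndGet()
    // Raise the recorded peak if this call pushed concurrency higher.
    var peak = peakInFlight.get()
    while (now > peak && !peakInFlight.compareAndSet(peak, now)) peak = peakInFlight.get()
    Thread.sleep(1) // widen the window in which overlapping calls could be observed
    inFlight.decrementAndGet()
    0.0
  }
}

object SerializedDedupeDemo {
  // Fires 100 calls from 8 threads through the wrapper and returns the peak
  // number of calls that were inside the probe at once.
  def peakConcurrentCalls(): Int = {
    val probe = new ConcurrencyProbe
    val deduper = new SerializedDeduper(probe)
    val pool = Executors.newFixedThreadPool(8)
    for (_ <- 1 to 100) {
      pool.submit(new Runnable {
        override def run(): Unit = {
          deduper.isNameDupeFuzzy(Array("test"), Array(1.0), Array("tast"), Array(1.0))
        }
      })
    }
    pool.shutdown()
    pool.awaitTermination(1, TimeUnit.MINUTES)
    probe.peakInFlight.get()
  }

  def main(args: Array[String]): Unit =
    println(s"peak concurrent calls: ${peakConcurrentCalls()}")
}
```

With the wrapper in place the peak stays at 1, since every call has to acquire LibpostalLock before entering the probe. Wiring this into the real test would need a small adapter from the fork's Dedupe to the trait; that adapter is omitted because its exact signatures are not shown in this thread. If the SIGSEGV still reproduces with all libpostal calls serialized, concurrency is probably not the cause.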