linkedin / URL-Detector

A Java library to detect and normalize URLs in text

Long run of periods causes detect() to throw NegativeArraySizeException "Backtracked max amount of characters. Endless loop detected." #15

Open davidthemarsh opened 7 years ago

davidthemarsh commented 7 years ago

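  import com.linkedin.urls.detection.UrlDetector;
  import com.linkedin.urls.detection.UrlDetectorOptions;

  String text = ".............:::::::::::;;;;;;;;;;;;;;;::...............................................:::::::::::::::::::::::::::::....................";
  UrlDetector detector = new UrlDetector(text, UrlDetectorOptions.Default);
  detector.detect();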

Running this throws:

Exception in thread "main" java.lang.NegativeArraySizeException: Backtracked max amount of characters. Endless loop detected. Bad Text: ':...............................................:::::::::::::::::::::::::::::....................'
    at com.linkedin.urls.detection.InputTextReader.checkBacktrackLoop(InputTextReader.java:144)
    at com.linkedin.urls.detection.InputTextReader.seek(InputTextReader.java:120)
    at com.linkedin.urls.detection.UrlDetector.readUserPass(UrlDetector.java:511)
    at com.linkedin.urls.detection.UrlDetector.readScheme(UrlDetector.java:458)
    at com.linkedin.urls.detection.UrlDetector.processColon(UrlDetector.java:293)
    at com.linkedin.urls.detection.UrlDetector.readDefault(UrlDetector.java:253)
    at com.linkedin.urls.detection.UrlDetector.detect(UrlDetector.java:142)
    at Main.main(Main.java:82)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
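
The trace shows where the exception comes from: a colon sends `processColon` into `readScheme` and then `readUserPass`, whose `seek` call ends in `checkBacktrackLoop`. The toy model below is a hypothetical illustration, not URL-Detector's code. It assumes that every colon triggers a forward scan over the following run of `.` and `:` characters and then a seek back to just after the colon, and that the reader caps the total number of backtracked characters at `MULTIPLIER * text.length()` (the value 10 is an assumption). Under those assumptions the backtracking for a run of n colons adds up to n(n-1)/2 characters, which eventually outgrows any budget that is linear in the input length.

```java
// Toy model of a linear backtrack budget. NOT URL-Detector's implementation:
// the scan rule and MULTIPLIER below are assumptions for illustration only.
class BacktrackBudgetModel {

    // Hypothetical budget: total backtracked chars may not exceed MULTIPLIER * length.
    static final int MULTIPLIER = 10;

    // Hypothetical rule: the forward scan keeps consuming periods and colons.
    static boolean continuesRun(char c) {
        return c == '.' || c == ':';
    }

    // For every colon, scan forward while the run continues, then seek back to
    // just after the colon. Returns the total backtracked characters, or throws
    // once the hypothetical budget is exceeded.
    static long simulate(String text) {
        long budget = (long) MULTIPLIER * text.length();
        long backtracked = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) != ':') {
                continue;
            }
            int end = i + 1;
            while (end < text.length() && continuesRun(text.charAt(end))) {
                end++;
            }
            // Seeking back from end to i + 1 backtracks (end - (i + 1)) characters.
            backtracked += end - (i + 1);
            if (backtracked > budget) {
                throw new IllegalStateException(
                        "budget of " + budget + " exceeded after " + backtracked + " characters");
            }
        }
        return backtracked;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 20, 40}) {
            String run = ":".repeat(n);
            try {
                System.out.println(n + " colons: backtracked " + simulate(run));
            } catch (IllegalStateException e) {
                System.out.println(n + " colons: " + e.getMessage());
            }
        }
    }
}
```

Under these assumptions, runs of 10 and 20 colons stay within budget (45 of 100 and 190 of 200 characters), while a run of 40 exceeds its budget of 400 at the twelfth colon. The model throws IllegalStateException to keep it visibly separate from the library, which reports the condition as NegativeArraySizeException.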

kodjobaah commented 7 years ago

I am also having the same issue. Here is my test code:

  import com.linkedin.urls.detection.{UrlDetector, UrlDetectorOptions}
  import scala.collection.JavaConversions._

  val value =
    """
      |
      |:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
      |
      |
    """.stripMargin
  val parser: UrlDetector = new UrlDetector(value, UrlDetectorOptions.JAVASCRIPT)
  val detectedUrls = parser.detect()

  // foreach rather than map: the loop only runs for its side effect (printing)
  detectedUrls.foreach(x => println(x.getOriginalUrl))

Below is the stack trace:

Exception in thread "main" java.lang.NegativeArraySizeException: Backtracked max amount of characters. Endless loop detected. Bad Text: '::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
'
    at com.linkedin.urls.detection.InputTextReader.checkBacktrackLoop(InputTextReader.java:144)
    at com.linkedin.urls.detection.InputTextReader.seek(InputTextReader.java:120)
    at com.linkedin.urls.detection.UrlDetector.readUserPass(UrlDetector.java:511)
    at com.linkedin.urls.detection.UrlDetector.readScheme(UrlDetector.java:458)
    at com.linkedin.urls.detection.UrlDetector.processColon(UrlDetector.java:293)
    at com.linkedin.urls.detection.UrlDetector.readDefault(UrlDetector.java:253)
    at com.linkedin.urls.detection.UrlDetector.detect(UrlDetector.java:142)
    at MeMe.sugar$.delayedEndpoint$MeMe$sugar$1(scala.scala:19)
    at MeMe.sugar$delayedInit$body.apply(scala.scala:8)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
    at scala.App$class.main(App.scala:76)
    at MeMe.sugar$.main(scala.scala:8)
    at MeMe.sugar.main(scala.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
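
Both inputs reported above consist only of punctuation (periods, colons, semicolons) and whitespace. Assuming that any URL worth detecting contains at least one letter or digit, one possible stopgap until the detector itself is fixed is to skip `detect()` for text with no letters or digits. The helper below is hypothetical (`UrlPreFilter` and `mayContainUrl` are not part of URL-Detector) and only sidesteps this particular shape of input.

```java
// Hypothetical pre-check. UrlPreFilter is NOT part of URL-Detector.
final class UrlPreFilter {

    private UrlPreFilter() {}

    // Assumption: text with no letter or digit contains no URL worth detecting,
    // so callers may skip UrlDetector.detect() when this returns false.
    static boolean mayContainUrl(String text) {
        return text.codePoints().anyMatch(Character::isLetterOrDigit);
    }

    public static void main(String[] args) {
        String[] samples = {
            ".............:::::::::::;;;;;;;;;;;;;;;::.....",
            "\n::::::::::::::::::::::::\n",
            "see http://example.com/ for details",
        };
        for (String sample : samples) {
            System.out.println(mayContainUrl(sample));
        }
    }
}
```

For the three samples, main prints false, false, true. A caller would wrap the existing call, e.g. `if (UrlPreFilter.mayContainUrl(text)) { ... detector.detect() ... }`. This does not help when the same text also contains letters or digits, for example a URL next to a long run of colons.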

Dasiu commented 4 years ago

Beware: here is yet another case that leads to an endless loop:

  String ipv6 = "http://[fe30:4:3:0:192.3.2.1]/";
  new UrlDetector(ipv6, UrlDetectorOptions.ALLOW_SINGLE_LEVEL_DOMAIN).detect();
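
The bracketed host in this example is not a well-formed IPv6 literal. Under RFC 4291 (section 2.2), the mixed form with an embedded IPv4 address needs six 16-bit hex groups before the dotted quad (x:x:x:x:x:x:d.d.d.d), unless `::` stands in for the missing zero groups. `fe30:4:3:0:192.3.2.1` has only four groups and no `::`, so it covers 96 of the 128 bits. The standalone sketch below (a hypothetical `Ipv6Literal.isValid`, not URL-Detector code) shows that check; whether the detector's loop is actually tied to this malformation is not established in the thread.

```java
// Standalone sketch of the RFC 4291 (section 2.2) textual IPv6 forms.
// Hypothetical helper, NOT part of URL-Detector. Zone IDs ("%eth0") are not handled.
final class Ipv6Literal {

    private Ipv6Literal() {}

    static boolean isValid(String addr) {
        int doubleColon = addr.indexOf("::");
        if (doubleColon != addr.lastIndexOf("::")) {
            return false; // "::" may appear at most once
        }
        boolean compressed = doubleColon >= 0;
        String[] sides = compressed
                ? new String[] {addr.substring(0, doubleColon), addr.substring(doubleColon + 2)}
                : new String[] {addr};
        int groups = 0;
        for (int s = 0; s < sides.length; s++) {
            if (sides[s].isEmpty()) {
                continue; // nothing before or after "::"
            }
            String[] parts = sides[s].split(":", -1);
            for (int p = 0; p < parts.length; p++) {
                boolean lastOfAddress = s == sides.length - 1 && p == parts.length - 1;
                if (lastOfAddress && parts[p].contains(".")) {
                    if (!isIpv4(parts[p])) {
                        return false;
                    }
                    groups += 2; // a dotted quad fills two 16-bit groups
                } else if (isHexGroup(parts[p])) {
                    groups += 1;
                } else {
                    return false;
                }
            }
        }
        // Without "::" all eight groups must be present; "::" implies at least one zero group.
        return compressed ? groups <= 7 : groups == 8;
    }

    private static boolean isHexGroup(String part) {
        if (part.isEmpty() || part.length() > 4) {
            return false;
        }
        for (char c : part.toCharArray()) {
            boolean hex = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
            if (!hex) {
                return false;
            }
        }
        return true;
    }

    private static boolean isIpv4(String part) {
        String[] octets = part.split("\\.", -1);
        if (octets.length != 4) {
            return false;
        }
        for (String octet : octets) {
            if (octet.isEmpty() || octet.length() > 3) {
                return false;
            }
            for (char c : octet.toCharArray()) {
                if (c < '0' || c > '9') {
                    return false;
                }
            }
            if (Integer.parseInt(octet) > 255) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid("fe30:4:3:0:192.3.2.1"));     // four groups + IPv4, no "::"
        System.out.println(isValid("fe30:4:3:0:0:0:192.3.2.1")); // six groups + IPv4
        System.out.println(isValid("fe30:4:3::192.3.2.1"));      // "::" fills the gap
    }
}
```

main prints false, true, true: the host from the report is rejected, while the same address padded to six groups, or compressed with `::`, is accepted.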