spdx / Spdx-Java-Library

Java library which implements the Java object model for SPDX and provides useful helper functions
Apache License 2.0

StringIndexOutOfBoundsException from org.spdx.utility.compare.TemplateRegexMatcher.findTemplateWithinText() #242

Open pmonks opened 3 weeks ago

pmonks commented 3 weeks ago

When calling org.spdx.utility.compare.LicenseCompareHelper.matchingStandardLicenseIdsWithinText(), I'm getting a StringIndexOutOfBoundsException thrown from within org.spdx.utility.compare.TemplateRegexMatcher.findTemplateWithinText() because the start index is larger than the end index. I don't yet know which argument values trigger the issue, but I'll try to track that down next. This used to work with SPDX License List v3.23, so it may be related to something that changed in SPDX License List v3.24.0.

The full exception stack trace follows. The formatting may look unfamiliar because Clojure has its own way of printing exception stack traces:

{:clojure.main/message
 "Execution error (StringIndexOutOfBoundsException) at jdk.internal.util.Preconditions$1/apply (Preconditions.java:55).\nRange [18134, 17912) out of bounds for length 29880\n",
 :clojure.main/triage
 {:clojure.error/class java.lang.StringIndexOutOfBoundsException,
  :clojure.error/line 55,
  :clojure.error/cause
  "Range [18134, 17912) out of bounds for length 29880",
  :clojure.error/symbol jdk.internal.util.Preconditions$1/apply,
  :clojure.error/source "Preconditions.java",
  :clojure.error/phase :execution},
 :clojure.main/trace
 {:via
  [{:type java.util.concurrent.ExecutionException,
    :message "java.util.concurrent.ExecutionException: java.lang.StringIndexOutOfBoundsException: Range [18134, 17912) out of bounds for length 29880",
    :at [java.util.concurrent.FutureTask report "FutureTask.java" 122]}
   {:type java.util.concurrent.ExecutionException,
    :message "java.lang.StringIndexOutOfBoundsException: Range [18134, 17912) out of bounds for length 29880",
    :at [java.util.concurrent.FutureTask report "FutureTask.java" 122]}
   {:type java.lang.StringIndexOutOfBoundsException,
    :message "Range [18134, 17912) out of bounds for length 29880",
    :at [jdk.internal.util.Preconditions$1 apply "Preconditions.java" 55]}],
  :trace
  [[jdk.internal.util.Preconditions$1 apply "Preconditions.java" 55]
   [jdk.internal.util.Preconditions$1 apply "Preconditions.java" 52]
   [jdk.internal.util.Preconditions$4 apply "Preconditions.java" 213]
   [jdk.internal.util.Preconditions$4 apply "Preconditions.java" 210]
   [jdk.internal.util.Preconditions outOfBounds "Preconditions.java" 98]
   [jdk.internal.util.Preconditions outOfBoundsCheckFromToIndex "Preconditions.java" 112]
   [jdk.internal.util.Preconditions checkFromToIndex "Preconditions.java" 349]
   [java.lang.String checkBoundsBeginEnd "String.java" 4865]
   [java.lang.String substring "String.java" 2834]
   [org.spdx.utility.compare.TemplateRegexMatcher findTemplateWithinText "TemplateRegexMatcher.java" 322]
   [org.spdx.utility.compare.TemplateRegexMatcher isTemplateMatchWithinText "TemplateRegexMatcher.java" 278]
   [org.spdx.utility.compare.LicenseCompareHelper isStandardLicenseWithinText "LicenseCompareHelper.java" 890]
   [org.spdx.utility.compare.LicenseCompareHelper matchingStandardLicenseIdsWithinText "LicenseCompareHelper.java" 959]
   [spdx.matching$licenses_within_text invokeStatic "matching.clj" 84]
   [spdx.matching$licenses_within_text invoke "matching.clj" 71]
   [lice_comb.impl.matching$eval23925$fn__23926$fn__23927 invoke "matching.clj" 136]
   [embroidery.api$pmap_STAR_$fn$reify__22819 call "vthreads.clj" 36]
   [java.util.concurrent.FutureTask run "FutureTask.java" 317]
   [java.lang.VirtualThread run "VirtualThread.java" 309]],
  :cause "Range [18134, 17912) out of bounds for length 29880"}}

Execution error (StringIndexOutOfBoundsException) at jdk.internal.util.Preconditions$1/apply (Preconditions.java:55).
Range [18134, 17912) out of bounds for length 29880

This was reproduced with Java v21.0.3+9, Spdx-Java-Library v1.1.11, and SPDX License List v3.24.0.

goneall commented 3 weeks ago

I took a quick look at the code, and the out-of-bounds exception is likely caused by a substring range where the end is before the start.
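
To make the failure mode concrete, here is a minimal sketch that does not use the library at all. It only shows that `String.substring(start, end)` throws `StringIndexOutOfBoundsException` when `end` is before `start`, using the indices and text length from the reported message. The class and method names are hypothetical and exist only for this illustration.

```java
class SubstringRangeDemo {
    /** Returns true if text.substring(start, end) rejects the range. */
    static boolean substringThrows(String text, int start, int end) {
        try {
            text.substring(start, end);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Placeholder text with the same length as in the report (29880 characters).
        String text = "x".repeat(29880);

        // Well-ordered range: start <= end, both within the text.
        System.out.println(substringThrows(text, 17912, 18134)); // prints "false"

        // Inverted range from the report: start 18134, end 17912.
        System.out.println(substringThrows(text, 18134, 17912)); // prints "true"
    }
}
```

Both indices are within the 29880-character text, so the exception comes purely from their order, which is consistent with the analysis above.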

I can see a corner case where this can happen: the pattern that searches for the end of the license also matches text before the start of the license.

We could add a check for this and look for a following match if this occurs.

The code to be updated is in the findTemplateWithinText method.
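
The corner case and the proposed check can be sketched roughly as follows. This is not the actual `TemplateRegexMatcher` code: the class, the method names, and the simple `BEGIN`/`END` patterns are assumptions made for illustration. The naive version takes the first end-pattern match anywhere in the text. The guarded version only accepts an end match that begins at or after the end of the start match, using `Matcher.find(int)`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GuardedTemplateSearch {
    /** Naive (hypothetical) version: uses the first end match, even if it precedes the start. */
    static String naiveTextBetween(String text, Pattern startPattern, Pattern endPattern) {
        Matcher startMatcher = startPattern.matcher(text);
        Matcher endMatcher = endPattern.matcher(text);
        if (!startMatcher.find() || !endMatcher.find()) {
            return null;
        }
        // Throws StringIndexOutOfBoundsException if the end match lies before the start match.
        return text.substring(startMatcher.end(), endMatcher.start());
    }

    /** Guarded (hypothetical) version: skips end matches that occur before the start match. */
    static String textBetween(String text, Pattern startPattern, Pattern endPattern) {
        Matcher startMatcher = startPattern.matcher(text);
        if (!startMatcher.find()) {
            return null;
        }
        int bodyStart = startMatcher.end();
        Matcher endMatcher = endPattern.matcher(text);
        // find(int) resets the matcher and searches from bodyStart onward,
        // so any earlier end-pattern match is ignored.
        if (!endMatcher.find(bodyStart)) {
            return null;
        }
        return text.substring(bodyStart, endMatcher.start());
    }

    public static void main(String[] args) {
        // "END" appears both before "BEGIN" and after the body.
        String text = "END of header. BEGIN license body END trailer";
        Pattern start = Pattern.compile("BEGIN");
        Pattern end = Pattern.compile("END");

        System.out.println("[" + textBetween(text, start, end) + "]"); // prints "[ license body ]"

        try {
            naiveTextBetween(text, start, end);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("naive search failed on an inverted range");
        }
    }
}
```

Whether the real fix should retry from the end of the start match, as here, or iterate over later matches in some other way depends on how the library builds its start and end patterns, which this sketch does not model.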