ngrunwald / ring-middleware-format

Ring middleware for parsing parameters and emitting responses in JSON or other formats
163 stars 49 forks source link

Drop icu4j dependency #74

Closed jimpil closed 5 years ago

jimpil commented 5 years ago

Hi there, icu4j may be a great library (i personally have no idea), but the cost to benefit ratio for this particular project is too high. Apart from being a rather large dependency (discussed in a separate issue), it is also somewhat vulnerable. In fact, the latest version (63.1) has a vulnerability score of 7.5 (high)! Version 58.2 has a score of 4.3 (medium), and so it is, in theory, usable in production, but still, what is the benefit of it really? Non-deterministic charset-detection? Is this one class worth depending on a 10MB vulnerable jar? I would think no, since applications either will provide a specific charset, or they will default to UTF-8. Therefore, i think icu4j is a complete overkill for this project, not to mention potential performance issues with it. Nobody would complain if ring-middleware-format simply defaulted to UTF-8 (in the absence of a specified charset). Additionally, since the class is imported in an :import clause on the ns declaration, one cannot exclude icu4j from his build. It may be worth exploring a situation where callers would provide the icu4j jar explicitly (if they want to). instaparse & duratom are two libraries I can think of the top of my head that do this. In short, any requires or imports of external libs that the caller can provide, are done manually underneath the ns declaration, within a try-catch expression, and if the lib is not on the classpath (caller's choice), they are stubbed out. Personally, i would just prefer the simplest approach of just dropping the icu4j dep. I think that would make everybody happy - I'm just throwing that other approach on the table, because it offers a migration path for those who do want icu4j in their stack.

https://www.cvedetails.com/cve/CVE-2018-18928/

Any thoughts? Kind regards,

Deraen commented 5 years ago

This has been previously investigated: https://github.com/ngrunwald/ring-middleware-format/issues/50

Removing charset detection would be breaking change and I do no have interest to develop Ring-middleware-format further. As mentioned in readme, I now recommend using Muuntaja which doesn't use Icu4j.

jimpil commented 5 years ago

Ok, that's fair enough...

Would you be open to a small PR (and a subsequent 0.7.4 release) which will require consumers to bring in the icu4j dependency explicitly (if they so desire)? I can even take care of the README, adding a little paragraph dedicated to icu4j, and how to bring it in post-0.7.3. That wouldn't break anyone depending on the charset-detection feature, right?

Alternatively, I guess I can always fork the project, and make a release of my own, but I'd like to avoid that if possible. Thanks again for your time...

Kind regards

Deraen commented 5 years ago

Dropping the dependency by default would change the way how wrap-format-params works when middleware :charset option or request charset are not set, i.e. what charset guessing is used. But I think we can add :guess-charset? option which defaults to true (to match current behavior), and the load the icu during middleware initialization when this option is true. Then users can exclude icu dependency and set the option to false.

Deraen commented 5 years ago

I should have a PR up soon, I just need to find a way to create Lein profile without icu dependency for testing.

jimpil commented 5 years ago

Hi again,

What you describe is probably a more rounded solution, but I was thinking of something much less involving like below (scratch-code):

(try 
  (import com.ibm.icu.text.CharsetDetector)
  (defn ^:no-doc guess-charset 
      ...) ;; the impl found in format_params.clj
(catch ClassNotFoundException _ 
  (def ^:no-doc guess-charset 
    (constantly nil))))

At this point the default behaviour hasn't changed, provided the consumer has brought in the icu4j compile dependency himself. If he hasn't, then the guess-charset fn is still there and it still returns something (nil) that its single consumer (get-or-guess-charset) can work with (eventually defaulting to utf-8).

In any case, thanks for taking the time to look at this.

danielsz commented 5 years ago

I appreciate the efforts to avoid breaking changes while dropping the icu4j dependency, so I hate to be the one to report the bad news that the change has broken my application in production. After upgrading r-m-f to 0.74, I see in the logs:

2019-04-29_12:44:16.83617 Caused by: java.lang.ClassNotFoundException: com.ibm.icu.text.CharsetDetector
2019-04-29_12:44:16.83624       at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:583)
2019-04-29_12:44:16.83630       at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
2019-04-29_12:44:16.83636       at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
2019-04-29_12:44:16.83641       at java.base/java.lang.Class.forName0(Native Method)
2019-04-29_12:44:16.83646       at java.base/java.lang.Class.forName(Class.java:398)
2019-04-29_12:44:16.83651       at clojure.lang.RT.classForName(RT.java:2207)
2019-04-29_12:44:16.83656       at clojure.lang.RT.classForNameNonLoading(RT.java:2220)
2019-04-29_12:44:16.83663       at ring.middleware.format_params$loading__6706__auto____34822.invoke(format_params.clj:1)
2019-04-29_12:44:16.83668       at ring.middleware.format_params__init.load(Unknown Source)
2019-04-29_12:44:16.83673       at ring.middleware.format_params__init.<clinit>(Unknown Source)
2019-04-29_12:44:16.83680       ... 98 more

In development things went fine. Maybe it has something to do with how namespaces are being loaded in an uberjar. Not really sure. I'll take your advice and migrate to Muuntaja anyway.

Deraen commented 5 years ago

@danielsz That's strange. ring.middleware.format_params namespace definitely shouldn't use the class.

If you are using AOT, could you try running lein clean or similar and building new uberjar? First thing I'd check in such cases would be to ensure old class files didn't get included in the jar.

danielsz commented 5 years ago

Yours is the most likely explanation. I normally do myvn clean deploy but it's possible that I forgot to clean this time. Apologies. I was so startled to see my executable fail that my thinking was shrouded in fog. I appreciate your efforts. In the meantime, I did take your advice and migrated to muuntaja which really is a drop-in replacement after I figured out that wrap-format is not sufficient alone, you also need wrap-params. Thanks!