pmonks / lice-comb

A Clojure library for software license detection.
Apache License 2.0

Support HTML in file and URL matching #48

Closed. pmonks closed this issue 3 months ago.

pmonks commented 4 months ago

When URL matching retrieves the content of a URL in order to perform full text license matching, it would be ideal to also support text/html responses, i.e. by converting the HTML to plain text and then performing license text matching on the result.
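A minimal sketch of the content-type check this implies. The names `html-content-type?` and `text-for-matching` are hypothetical, not part of lice-comb. The HTML-to-text converter is passed in as a function (for example, a jsoup-based one) so that the dispatch logic stands alone; that design is also an assumption.

```clojure
(ns example.url-matching
  (:require [clojure.string :as str]))

;; Hypothetical helper (not lice-comb API): true when a Content-Type header
;; value denotes HTML, ignoring parameters such as "; charset=UTF-8".
(defn html-content-type?
  [^String content-type]
  (boolean
    (when content-type
      (= "text/html"
         (-> content-type
             (str/split #";" 2)
             first
             str/trim
             str/lower-case)))))

;; Hypothetical helper: returns the text that full text license matching
;; should see. HTML bodies go through the supplied converter fn; any other
;; content type is passed through unchanged.
(defn text-for-matching
  [html->text content-type ^String body]
  (if (html-content-type? content-type)
    (html->text body)
    body))
```

Only the media type before any `;` is compared, so a header such as `text/html; charset=UTF-8` is still treated as HTML.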

pmonks commented 3 months ago

This should be trivial to do, given that jsoup is already a dependency. For example:

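```clojure
(defn html-to-text
  "Converts an HTML string to plain text using jsoup. Returns nil if s is nil."
  [^String s]
  (when s
    ;; Jsoup/parse builds a DOM; Document.text() returns its normalised, combined text
    (.text (org.jsoup.Jsoup/parse s))))
```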
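The issue title also mentions file matching. A comparable minimal sketch for files could decide from the file name alone. `html-file-name?`, `file-text-for-matching` and the extension set below are hypothetical, and the converter is again passed in as a function:

```clojure
(ns example.file-matching
  (:require [clojure.string :as str]))

;; Assumed set of extensions that indicate an HTML file.
(def ^:private html-extensions #{"html" "htm" "xhtml"})

;; Hypothetical helper (not lice-comb API): true when the file name's
;; extension (case-insensitive) is one of html-extensions.
(defn html-file-name?
  [^String file-name]
  (boolean
    (when-let [dot (some-> file-name (str/last-index-of "."))]
      (contains? html-extensions
                 (str/lower-case (subs file-name (inc dot)))))))

;; Hypothetical helper: converts HTML file content to text before matching,
;; and passes all other content through unchanged.
(defn file-text-for-matching
  [html->text file-name ^String content]
  (if (html-file-name? file-name)
    (html->text content)
    content))
```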