kachayev / nasus

Zero-configuration command-line async HTTP files server in Clojure. Like Python's SimpleHTTPServer but scalable.
MIT License

Wrong Content-Type for HTML5 documents - text/sgml #1

Closed: vgamula closed this issue 5 years ago

vgamula commented 5 years ago

In HTML5 documents the doctype does not refer to any DTD; it is usually specified like this:

<!DOCTYPE html>

It seems that the library used for detecting MIME types (jmimemagic) does not handle such declarations properly: they are detected as text/sgml. See https://github.com/arimus/jmimemagic/issues/27.

Example:

> cat test.html
<!DOCTYPE html>
<html>
<head>
    <title>test</title>
</head>
<body>
test
</body>
</html>

> clj -Sdeps '{:deps {net.sf.jmimemagic/jmimemagic {:mvn/version "0.1.5"}}}' -r
Clojure 1.9.0
user=> (require '[clojure.java.io :as io])
nil
user=> (import '(net.sf.jmimemagic Magic))
net.sf.jmimemagic.Magic
user=> (def f (io/file (str (System/getProperty "user.dir") "/test.html")))
#'user/f
user=> (.getMimeType (Magic/getMagicMatch f true))
log4j:WARN No appenders could be found for logger (net.sf.jmimemagic.Magic).
log4j:WARN Please initialize the log4j system properly.
"text/sgml"

So this is not an issue with Nasus itself. The server just sends the wrong Content-Type, and browsers cannot render the page properly, because they expect HTML pages to be served with Content-Type: text/html.

kachayev commented 5 years ago

Okay, so there are 2 approaches here:

I'm okay with either of those. Do you want to provide a fix?

vgamula commented 5 years ago

@kachayev, sure, I'm going to replace it with org.apache.tika (tika-core only), as it seems to be the best tool for detecting MIME types. There is a Clojure wrapper for it, but I don't think we really need it.
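
For illustration, calling tika-core directly through Java interop might look roughly like the sketch below. This is not the actual patch: the helper name detect-mime is hypothetical, and the tika-core version in the comment is an assumption rather than something taken from this thread.

```clojure
;; Sketch only: assumes org.apache.tika/tika-core is on the classpath, e.g.
;;   clj -Sdeps '{:deps {org.apache.tika/tika-core {:mvn/version "1.22"}}}' -r
;; (the version number is an assumption, not taken from this thread).
(require '[clojure.java.io :as io])
(import '(org.apache.tika Tika))

;; The Tika facade can be reused, so create it once.
(def ^Tika tika (Tika.))

(defn detect-mime
  "Hypothetical helper: returns the MIME type that Tika detects for file f,
  using the file's leading bytes together with its name."
  [^java.io.File f]
  (.detect tika f))

(comment
  ;; Same test.html as in the jmimemagic example above.
  (detect-mime (io/file (System/getProperty "user.dir") "test.html")))
```

Since this is a single interop call on the Tika facade, a separate wrapper library would add little here.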

kachayev commented 5 years ago

Thanks! I also had to update the logging deps; it looks like log4j was being pulled in transitively before. I will do a new release shortly.