nim-lang / Nim

Nim is a statically typed compiled systems programming language. It combines successful concepts from mature languages like Python, Ada and Modula. Its design focuses on efficiency, expressiveness, and elegance (in that order of priority).
https://nim-lang.org
Other
16.65k stars 1.47k forks source link

User agent parsing is not valid, it splits on ",". #11757

Open treeform opened 5 years ago

treeform commented 5 years ago

req.headers["user-agent"] returns:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML

But I expected it to return:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36

It splits on "," which it should not.

req.headers.table["user-agent"] returns:

@["Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML", "like Gecko) Chrome/75.0.3770.100 Safari/537.36"]

I found this error while updating my user agent parsing library: https://github.com/treeform/useragents

Work around:

req.headers.table["user-agent"].join(", ")

Example code:

# "User-Agent" parsing is invalid it splits on ,

import asynchttpserver, asyncdispatch, tables, strutils

var server = newAsyncHttpServer()
proc cb(req: Request) {.async, gcsafe.} =

  echo req.headers
  echo req.headers["user-agent"]
  echo req.headers.table["user-agent"].join(", ")

const port = 8733
const address = "127.0.0.1"
waitFor server.serve(Port(port), cb, address = address)
ghost commented 4 years ago

Well, seems like parseHeader from httpcore calls parseList which does split the line by , (because that seems to be the right way for most headers), maybe we can introduce a special case by checking for user-agent in asynchttpserver request parsing code and add a special noSplit bool argument (of course it would be false by default) to parseHeader in httpcore?

ghost commented 4 years ago

Or otherwise we can just do join ourselves after parsing header line by parseHeader (although IMO that's a bit more hacky)

supakeen commented 4 years ago

The above PR provides the mechanics for turning the splitting off in asynchttpserver later :)

treeform commented 3 years ago

I have written a HTTP library to solve this problem: https://github.com/treeform/puppy

artemklevtsov commented 2 years ago

date header also spitted:

❯ inim
👑 INim 0.6.1
Nim Compiler Version 1.6.4 [Linux: amd64] at /home/unikum/.nimble/bin/nim
nim> import httpcore
nim> echo parseHeader("date: Mon, 04 Apr 2022 10:08:54 GMT")
(key: "date", value: @["Mon", "04 Apr 2022 10:08:54 GMT"])
treeform commented 2 years ago

I prefer if the default behavior was not to split, and let user split on "," in any header they desire themselves.

Less magic is simpler.