hypermail-project / hypermail

Hypermail is a free (GPL) program to convert email from Unix mbox format to html.
http://www.hypermail-project.org/
GNU General Public License v2.0

string.c:parseurl() doesn't understand unicode spaces #65

Closed: jkbzh closed this issue 3 years ago.

jkbzh commented 4 years ago

parseurl() uses isblank() and sscanf() to extract a URL that is followed by a space. However, if that space is encoded in UTF-8 as more than one byte (for example, a non-breaking space, 0xC2 0xA0), sscanf() does not recognize it as whitespace and appends the space's bytes to the URL.

One solution would be to replace the sscanf() call with a regular expression, taking advantage of the fact that hypermail already uses PCRE elsewhere.

jkbzh commented 3 years ago

Fixed in aada619f0849eb7f4673974676d5dbe1058bc0b3