vlang / v

Simple, fast, safe, compiled language for developing maintainable software. Compiles itself in <1s with zero library dependencies. Supports automatic C => V translation. https://vlang.io
MIT License
35.7k stars 2.16k forks source link

vweb: Upper case letters of paths get mapped lower cased. #17960

Closed francescortiz closed 1 year ago

francescortiz commented 1 year ago

Describe the bug

HTTP 404 not found when mapping uppercase paths using vweb method attributes. If you access to the lower case counterpart it works.

Expected Behavior

case should be respected for paths.

Current Behavior

For upper case paths, the lower case equivalent gets mapped.

Reproduction Steps

  1. Create a web server.
  2. Add a path with upper case letters.
  3. Try to access the path with lower case letters.

Possible Solution

No response

Additional Information/Context

No response

V version

V 0.3.3 d1f57ea

Environment details (OS name and version, etc.)

Manjaro Linux

jacksonmowry commented 1 year ago

I would like to work on this. According to the path parsing code in vweb/parse.v paths are converted to lower case for case-insensitive comparisons. So when getting the path from a request this should also be converted to lower case. Is this the desired behavior @medvednikov ? If so we would need to convert the path from the request to lower case at some point before it is handled.

jacksonmowry commented 1 year ago
module main

import vweb

struct App {
    vweb.Context
}

fn main() {
    mut app := &App{}
    vweb.run(app, 8080)
}

['/Index']
pub fn (mut app App) index() vweb.Result {
    return app.text('hi\n')
}
jackson@jtop:~/vcode/uc_server_test$ v run .
[Vweb] Running app on http://localhost:8080/

Request works with lowercase...

jackson@jtop:~$ curl localhost:8080/index
hi

Request fails with uppercase...

jackson@jtop:~$ curl localhost:8080/Index
404 Not Found

Minimum fix, but I am unsure that this is the correct approach

    // URL Parse

    // url := urllib.parse(req.url) or { // Chaing req.url => req.url.to_lower() fixes the case sensitivity issue
    url := urllib.parse(req.url.to_lower()) or {
        eprintln('error parsing path: ${err}')
        return
    }
Casper64 commented 1 year ago

I agree, either convert everything to lowercase and display an error/warning when a route is given a path with a capital letter (should probably also check for invalid characters). Or allow capitalized routes. Now there are two ways to do something.

Casper64 commented 1 year ago

The thing is that search engines treat urls very differently if you have a capital letter in them. So the route "/Index" does not automatically refer to the index page in the eyes of a search engine. Capital letters in urls could cause bad effects for SEO unintentionally.

Plus there are some browsers/applications/tools/bots that automatically convert urls into lowercase when they visit them. Thus resulting in them not finding your content.

francescortiz commented 1 year ago

In my case I am making a mock server for a service that has capitals in the URL. Why don't go for the simpler approach of leaving the case untouched and do case sensitive comparisons? Less code, less checks, and full support for whatever exists in the wild or the developer wants to implement.

JalonSolov commented 1 year ago

According to the HTML spec, URLs should be treated as case sensitive after the machine name.

The problem with mixing upper & lowercase is that you can wind up with valid URLs that look like duplicates, or you can actually have duplicate pages if you're really not paying attention. http://foo.com/A is a completely different URL from http://foo.com/a.

Search engines will also treat these as separate URLs. Google recommends only using lowercase, and having a rewrite rule such that if there are any uppercase characters after the machine name, they get converted to all lowercase, with a 301 return code (redirect) for the client.

That said, V is doing this incorrectly by forcing the URLs to be treated as lowercase without the 301 return code.

francescortiz commented 1 year ago

Google recommends only using lowercase, and having a rewrite rule such that if there are any uppercase characters after the machine name, they get converted to all lowercase, with a 301 return code (redirect) for the client.

This is a SEO recommendation, not a standard. IMHO it is not the domain of a simple HTTP server to worry about it but of a higher level SEO framework or middleware.