psf / requests

A simple, yet elegant, HTTP library.
https://requests.readthedocs.io/en/latest/
Apache License 2.0
52.17k stars 9.33k forks source link

Session.get_adapter won't work correctly for case-sensitive URLs #2585

Open agilevic opened 9 years ago

agilevic commented 9 years ago

In its current implementation Session.get_adapter turns url to lowercase in line 637. If it is aiming for case-insensitive compare it should also turn prefix variable to lowercase in the same line or do that at registration in the mount() method. But this whole idea seems incorrect because URL paths can be case sensitive and one might register different adapters for different paths on the same site.

The fix should most likely be spliting the URL into components and using that tuple as a key in the self.adapters dictionary. Changes mount() and get_adapter().

In [48]: urlparse.urlparse('HTTP://lala.mi.do/path')
Out[48]: ParseResult(scheme='http', netloc='lala.mi.do', path='/path', params='', query='', fragment='')

In [49]: d = {}

In [50]: purl = urlparse.urlparse('HTTP://lala.mi.do/path')

In [51]: d[purl] = 'myadapter'

In [52]: purl = urlparse.urlparse('HTTP://lala.mi.do/Path2')

In [53]: purl
Out[53]: ParseResult(scheme='http', netloc='lala.mi.do', path='/Path2', params='', query='', fragment='')

In [54]: d[purl] = 'myadapter2'

In [55]: d
Out[55]:
{ParseResult(scheme='http', netloc='lala.mi.do', path='/Path2', params='', query='', fragment=''): 'myadapter2',
 ParseResult(scheme='http', netloc='lala.mi.do', path='/path', params='', query='', fragment=''): 'myadapter'}
Lukasa commented 9 years ago

Agreed that the case-insensitive compare is a bad idea, but I also think splitting the URL into components is not a good idea either. The correct thing to do is to remove the case-insensitive compare in 3.0.0.

sigmavirus24 commented 9 years ago

If I remember correctly, we compare them case-insensitively because most people don't use adapters with paths (which isn't to say that your use-case is invalid or that this bug is invalid).

At least 90% of people mounting custom adapters will be doing it based on scheme alone (e.g., http://) while at least another 5% will probably be mounting based on scheme + host (e.g., http://lala.mi.do). For those use-cases the case-insensitive comparison is 100% valid. Scheme and non-user-info authority comparisons should be case-insensitive.

It's probably over-kill, but I think we should normalize the input to mount and get_adapter. That way we can appropriately make a comparison. That of course, does not preclude anyone from messing with the underlying attribute that stores the adapters and their mount points directly and thereby breaking these assumptions/safe-guards.

agilevic commented 9 years ago

Thanks for looking into this. The case here is using different caching strategies for different paths and is valid perfectly sensible.

Tuple or string as key is an implementation detail, but doing a plain case-sensitive compare is not entirely correct either. See my example below where one spelling uses HTTP:// and another is http:// for a scheme. Parser correctly normalizes URL fragments and its result can be used as key. If you want to use strings call geturl() on the parsed object.


In [3]: urlparse.urlparse('http://lala.mi.do/Path2')
Out[3]: ParseResult(scheme='http', netloc='lala.mi.do', path='/Path2', params='', query='', fragment='')

In [4]: urlparse.urlparse('HTTP://lala.mi.do/Path2')
Out[4]: ParseResult(scheme='http', netloc='lala.mi.do', path='/Path2', params='', query='', fragment='')

In [5]: urlparse.urlparse('HTTP://lala.mi.do/path2')
Out[5]: ParseResult(scheme='http', netloc='lala.mi.do', path='/path2', params='', query='', fragment='')

In [6]: urlparse.urlparse('ftp://user:pass@lala.mi.do/path2')
Out[6]: ParseResult(scheme='ftp', netloc='user:pass@lala.mi.do', path='/path2', params='', query='', fragment='')

In [7]: urlparse.urlparse('ftp://User:pass@lala.mi.do/path2')
Out[7]: ParseResult(scheme='ftp', netloc='User:pass@lala.mi.do', path='/path2', params='', query='', fragment='')

# performance-wise any implementation will work
In [14]: %timeit hash(urlparse.urlparse('ftp://User:pass@lala.mi.do/path2').geturl())
100000 loops, best of 3: 3.73 µs per loop

In [15]: %timeit hash(urlparse.urlparse('ftp://User:pass@lala.mi.do/path2'))
100000 loops, best of 3: 2.18 µs per loop
Lukasa commented 9 years ago

Agreed, we should be normalising those inputs. Regardless, this is an API change and so can't land until 3.0.0.

Lukasa commented 9 years ago

@sigmavirus24 I'm beginning to wonder if we should have a 3.0.0 feature branch, and should start landing these things we keep deferring to 3.0.0.

sigmavirus24 commented 9 years ago

@Lukasa I've been thinking of starting just such a branch in my fork. I wasn't sure if we would want to keep it here or elsewhere. While having people test it would be nice, having bug reports here about it might become confusing to some.