gruns / furl

🌐 URL parsing and manipulation made easy.
Other
2.64k stars 152 forks source link

Missing __hash__ method with Python 3.x #59

Closed bodgit closed 8 years ago

bodgit commented 8 years ago

Hi,

I'm using your module directly and indirectly via the SQLAlchemy-Utils module and its URLType column type which coerces a URL into a furl object.

I've found an issue when I try and use the URLType type on Python 3.x, I get the following error when I try and query an object using such a column:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../venv-3.5/lib/python3.5/site-packages/sqlalchemy/orm/query.py", line 2588, in all
    return list(self)
  File ".../venv-3.5/lib/python3.5/site-packages/sqlalchemy/orm/loading.py", line 86, in instances
    util.raise_from_cause(err)
  File ".../venv-3.5/lib/python3.5/site-packages/sqlalchemy/util/compat.py", line 189, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=exc_value)
  File ".../venv-3.5/lib/python3.5/site-packages/sqlalchemy/util/compat.py", line 183, in reraise
    raise value
  File ".../venv-3.5/lib/python3.5/site-packages/sqlalchemy/orm/loading.py", line 71, in instances
    rows = [proc(row) for row in fetch]
  File ".../venv-3.5/lib/python3.5/site-packages/sqlalchemy/orm/loading.py", line 71, in <listcomp>
    rows = [proc(row) for row in fetch]
  File ".../venv-3.5/lib/python3.5/site-packages/sqlalchemy/orm/loading.py", line 379, in _instance
    instance = session_identity_map.get(identitykey)
  File ".../venv-3.5/lib/python3.5/site-packages/sqlalchemy/orm/identity.py", line 146, in get
    if key not in self._dict:
TypeError: unhashable type: 'furl'

I've managed to fix this by monkeypatching the furl class with a __hash__ method like so:

from sqlalchemy_utils import URLType
from furl import furl

def furl_hash(self):
    return hash(self.url)

furl.__hash__ = furl_hash

class Foo(Model):
    url = Column(URLType)
    ...

Would it be possible to add such a method? I'm not sure if I'm 100% correct just using the value of self.url here but it looked close enough.

gruns commented 8 years ago

Odd. This behavior differs between Python 2

>>> sys.version_info[:2]
(2, 7)
>>> from furl import furl
>>> hash(furl())
1520225

and Python 3.

>>> sys.version_info[:2]
(3, 5)
>>> from furl import furl
>>> hash(furl())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'furl'

I'll look into this.

gruns commented 8 years ago

On reflection, furl objects should not be used as dictionary keys. They're mutable.

From Python 3's datamodel documentation

If a class defines mutable objects and implements an __eq__() method, it should
not implement __hash__(), since the implementation of hashable collections
requires that a key’s hash value is immutable (if the object’s hash value
changes, it will be in the wrong hash bucket).

Example of potential confusion due to mutability.

>>> f = furl('http://google.com/')
>>> d = {f: f.host}
>>> f.url, d[f] 
('http://google.com/', 'google.com')
>>> f.set(host='yahoo.com')
>>> f.url, d[f]
('http://yahoo.com/', 'google.com')
gruns commented 8 years ago

@bodgit Does that answer your question?

bodgit commented 8 years ago

Your explanation and example makes complete sense, however I guess my problem still remains that the column type doesn't work without my monkeypatch.

I'm not sure why there's even a requirement to use the furl object as a hash key, I think it comes from SQLAlchemy direct rather than the code implementing this URLType column type, specifically this line https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/orm/identity.py#L149 triggers the exception.

The other option I've seen used elsewhere (possibly incorrectly) is to use the return value of id() which would mean it's independent of the value of the object but then I guess two furl objects containing the same URL would have different id() values when you would expect them to hash to the same value.

I guess it's a balancing between what's correct and what just makes the error go away :wink:

gruns commented 8 years ago

Is it your code or SQLAlchemy-Utils' code that uses furl objects as dictionary keys?

bodgit commented 8 years ago

Neither, I think it's from the base SQLAlchemy library itself, see the line I linked above that shows where it first tries to hash the value.

gruns commented 8 years ago

What I mean is, whose code stores furl objects as dictionary keys? That is, in whose code is SQLAlchemy.orm.identity.WeakInstanceDict.__setitem__() (or similar) called with a furl object?

If furl objects are stored as dictionary keys in your code, a simple solution is to wrap the dictionary with a thin shim that serializes and deserializes furl objects to their URL strings on insertion and back on retrieval.

If furl objects are stored as dictionary keys in SQLAlchemy-Utils' code, I'll contact them and let them know furl objects are mutable and shouldn't be dictionary keys.

bodgit commented 8 years ago

I think it's SQLAlchemy itself; neither my code nor what I can see in the SQLAlchemy-Utils does anything with dictionaries.

I think it must be doing it for some sort of de-duplication or maybe testing for modification.

I did inquire on their IRC channel but didn't really get anything useful back, I suspect their mailling list is the next step.

bodgit commented 8 years ago

I've since found I only get this error when I try to use the URLType as (part of) a primary key, which is what I was trying to do initially.

However whilst this worked with SQLite as a storage backend, MySQL for example complains that the underlying column type (TEXT) cannot be indexed without specifying how many characters to use so I've reworked my code so that I don't use the URL directly as a primary key and then the error disappeared and no monkeypatching is required.

gruns commented 8 years ago

Great to hear you found a solution. Closing this issue.

Thank you for bringing this to my attention, Matt. Don't hesitate to let me know if there's anything else I can help with.