google / starlark-go

Starlark in Go: the Starlark configuration language, implemented in Go
BSD 3-Clause "New" or "Revised" License
2.31k stars 209 forks source link

reject using unhashable types with 'in' #113

Open josharian opened 5 years ago

josharian commented 5 years ago
$ starlark -c "{} in {}"
$ python3 -c "{} in {}"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: unhashable type: 'dict'
$ python2 -c "{} in {}"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: unhashable type: 'dict'
$ starlark -c "print({} in {})"
False

This might not be reasonably fixable. This comment from eval.go:777 is apropos:

// Ignore error from Get as we cannot distinguish true
// errors (value cycle, type error) from "key not found".

We could detect unhashable types by calling Hash before calling Get, although that would be expensive.

We could have a designated Unhashable error type, which we could ask implementations of Hash to return.

Or we could call this unfortunate and give up. Alan, preferences?

alandonovan commented 5 years ago

I assume you're talking specifically x in dict; there's no problem with unhashable in list, obviously.

What if the concrete implementation of Mapping is a user-defined red/black tree? There's no problem with unhashable keys there either. Proposed additional checks must thus be restricted specifically to type *Dict, not any Mapping.

We could detect unhashable types by calling Hash before calling Get

Better still, after Get: the common case is no error; let errors be slow. If there's an error, try hashing. If hashing fails, report an error, otherwise let the error slide?

This would be a backward incompatible change to the Mapping.Get contract:

type Mapping interface {
 ...                                                                                                     
        // Get also defines the behavior of "v in mapping".                                                                  
        // The 'in' operator reports the 'found' component, ignoring errors.                                                 
        Get(Value) (v Value, found bool, err error)
}

Side grumble: Hash is most unsatisfactory. I wish it took a seed parameter, used type uintptr, and was an optional method. Too late to change now though.

josharian commented 5 years ago

I assume you're talking specifically x in dict; there's no problem with unhashable in list, obviously.

Right. And also x in set.

What if the concrete implementation of Mapping is a user-defined red/black tree?

The spec doesn't seem to indicate that this is supported:

The in operator reports whether its first operand is a member of its second operand, which must be a list, tuple, dict, set, or string.

I assume that this is an oversight. But the treatment of in for user-defined Mappings appears to be unspecified at the moment.

The ideal scenario, to my mind, is that if the Get fails because it calls Hash and Hash isn't supported, then you get an error. Not otherwise. A red/black would never call Hash, so this yields reasonable behavior. Your suggestion about calling Hash to diagnose a failed Get achieves this behavior, I believe.

Better still, after Get: the common case is no error; let errors be slow. If there's an error, try hashing. If hashing fails, report an error, otherwise let the error slide?

That sounds sensible to me.

This would be a backward incompatible change to the Mapping.Get contract:

Bummer. This decision is above my pay grade.

Side grumble: Hash is most unsatisfactory. I wish it took a seed parameter, used type uintptr, and was an optional method. Too late to change now though.

Indeed.

josharian commented 5 years ago

If we cared enough about this to want to fix it, and wanted to finesse the backwards-compatibility issue, we could take the route of having a starlark.Unhashable error type (probably just a struct wrapping an underlying error). If Get returns a starlark.Unhashable error, then we respect it. All other errors are ignored as they are right now. No existing implementations return such an error now, so this doesn't break any existing code, even though the docs have a semantic change. And then future Mapping implementations could opt in as they saw fit.

alandonovan commented 5 years ago

The Unhashable approach seems workable. I just did something very similar to add spellchecking to getAttr---see NoSuchAttrError. Care to send a CL? :)

josharian commented 5 years ago

Will do, using a "I'll work on it while my model is training" level SLA.