modularml / mojo

The Mojo Programming Language
https://docs.modular.com/mojo
Other
21.91k stars 2.53k forks source link

[stdlib] Implement `__contains__` for `Tuple`, `List`, `ListLiteral` (almost) #2658

Open laszlokindrat opened 1 month ago

laszlokindrat commented 1 month ago

Now that we have ComparableCollectionElement, we can try to implement __contains__ for some common collection types using a workaround similar to what was employed in https://github.com/modularml/mojo/pull/2190. It's possible that the compiler would be able to correctly pick up a __contains__ overload with @staticmethod, e.g. something like this might just work:

struct Bucket[T: CollectionElement]:
    @staticmethod
    fn __contains__[S: ComparableCollectionElement](self: Bucket[S], elem: s) -> Bool: ...

var b: Bool = MyStuff() in Bucket[MyStuff]()

Even if in doesn't work with this, the implementation shouldn't have to change much once we roll out conditional conformance.

ChristopherLR commented 1 month ago

Tried something like this:

struct Bucket[T: CollectionElement]:
    var elements: List[T]

    fn __init__(inout self, elements: List[T]):
        self.elements = elements

    @staticmethod
    fn __contains__[C: ComparableCollectionElement](self: Bucket[C], borrowed elem: C) -> Bool:
        for i in range(self.elements.size):
            if elem.__eq__(self.elements[i]):
                return True
        return False

However, i'm getting:

error: special method may not be a static method
    fn __contains__[C: ComparableCollectionElement](self: Bucket[C], borrowed elem: C) -> Bool:
       ^
mojo: error: failed to parse the provided Mojo source module

If I remove @staticmethod:

error: 'self' argument must have type 'Bucket[T]', but actually has type 'Bucket[C]'
    fn __contains__[C: ComparableCollectionElement](self: Bucket[C], borrowed elem: C) -> Bool:
                                                    ^     ~~~~~~~~~
mojo: error: failed to parse the provided Mojo source module
ChristopherLR commented 1 month ago

Is there something similar to c++ enable_if that i'm missing? https://en.cppreference.com/w/cpp/types/enable_if

rd4com commented 4 weeks ago

@laszlokindrat , @ChristopherLR , if self can be a reference? this parametrization could do the job:

struct Bucket[T: CollectionElement]:
    var elements: List[T]

    fn __init__(inout self):
        self.elements = List[T]()

    fn __contains__[C: ComparableCollectionElement](
        self: Reference[Bucket[C]], elem: C
    ) -> Bool:
        for i in range(self[].elements.size):
            if elem.__eq__(self[].elements[i]):
                return True
        return False

fn main():
    var x = Bucket[Int]()
    x.elements.append(1)
    if x.__contains__(1):
         print("ok")
    #if 1 in x: #not yet
ChristopherLR commented 4 weeks ago

Or if we don't use a special method:

struct Bucket[T: CollectionElement]:
    var elements: List[T]

    fn __init__(inout self, elements: List[T]):
        self.elements = elements

    @staticmethod
    fn contains[C: ComparableCollectionElement](self: Bucket[C], value: C) -> Bool:
        for i in range(self.elements.size):
            if value.__eq__(self.elements[i]):
                return True
        return False

fn main():
    var b = Bucket[Int](List(1, 2, 3))
    if __type_of(b).contains(b, 2):
        print("2 is in the bucket")
    else:
        print("2 is not in the bucket")
laszlokindrat commented 4 weeks ago

@laszlokindrat , @ChristopherLR , if self can be a reference? this parametrization could do the job:

struct Bucket[T: CollectionElement]:
    var elements: List[T]

    fn __init__(inout self):
        self.elements = List[T]()

    fn __contains__[C: ComparableCollectionElement](
        self: Reference[Bucket[C]], elem: C
    ) -> Bool:
        for i in range(self[].elements.size):
            if elem.__eq__(self[].elements[i]):
                return True
        return False

fn main():
    var x = Bucket[Int]()
    x.elements.append(1)
    if x.__contains__(1):
         print("ok")
    #if 1 in x: #not yet

@rd4com Very cool, and I am actually surprised that parameter inference fails for the 1 in x sugar. I will look into this.

laszlokindrat commented 4 weeks ago

@rd4com I used your trick on List.index and it worked like a charm: https://github.com/modularml/mojo/commit/ca10f356ec6f7efff110de36632a8630d6ac6935. This basically means that we have conditional conformance in these methods, as long as you are willing to deal with the explicit dereferencing (i.e. self[]) in the method body, which I think is totally okay. This is pretty amazing.

Would you be interested in going ahead and implementing __contains__ for the structs above (and maybe others)? Please ping me on the PRs, if so.

Regarding the 1 in x not desugaring correctly, this seems like a parser bug. Not sure when the fix will be ready, but explicit calls to __contains__ are acceptable for the time being.

rd4com commented 4 weeks ago

@laszlokindrat Nice, List.__contains__ is there: https://github.com/modularml/mojo/pull/2667 ,

1 in x works on that one (rebind, _type_is_eq and constained) :partying_face:

I'll try to implement Tuple.__contains__, just don't know which way is prefered ?

Of course, you're welcome to make theses PR's @ChristopherLR aswel (and any other person too)!

ChristopherLR commented 3 weeks ago

Hey @rd4com and @laszlokindrat, not quite Tuple or ListLiteral, I've picked InlineList just as a starting point for me to get used to the codebase :) If available would you mind checking out https://github.com/modularml/mojo/pull/2703

soraros commented 3 weeks ago

Should we even provide __contains__ for heterogeneous collections like Tuple?

ChristopherLR commented 3 weeks ago

@rd4com I discovered that this test fails for List when testing InlineList:

def test_list_contains_str():
    var x = List[String]("a", "b", "c")
    assert_false("d" in x)
    assert_true(x.__contains__("a"))
    assert_false(x.__contains__("e"))
 constraint failed: value type is not self.T

However this passes:

def test_list_contains_str():
    var x = List[String]("a", "b", "c")
    assert_true(x.__contains__("a"))
    assert_false(x.__contains__("e"))

Note that i've delete assert_false("d" in x)

laszlokindrat commented 3 weeks ago

@rd4com I discovered that this test fails for List when testing InlineList:

def test_list_contains_str():
    var x = List[String]("a", "b", "c")
    assert_false("d" in x)
    assert_true(x.__contains__("a"))
    assert_false(x.__contains__("e"))
constraint failed: value type is not self.T

However this passes:

def test_list_contains_str():
   var x = List[String]("a", "b", "c")
   assert_true(x.__contains__("a"))
   assert_false(x.__contains__("e"))

Note that i've delete assert_false("d" in x)

Yeah, this is the parser bug I mentioned before. Don't worry about it, direct call to __contains__ is fine for now. Thanks for the patch, I will take a look!

laszlokindrat commented 3 weeks ago

Should we even provide __contains__ for heterogeneous collections like Tuple?

It would be nice if we could do if "foo" in ("foo", "bar", 1): ....

soraros commented 3 weeks ago

@laszlokindrat I agree that it's rather convenient. But shouldn't we write ["foo", "bar", ...] instead? From the typing point of view, I find it difficult to justify having a __contains__ for a struct (Tuple is a case of this) which iterate over all the compatible fields to test for equity. That feels highly ad-hoc, and feels like a meta programming feature.

laszlokindrat commented 3 weeks ago

@laszlokindrat I agree that it's rather convenient. But shouldn't we write ["foo", "bar", ...] instead? From the typing point of view, I find it difficult to justify having a __contains__ for a struct (Tuple is a case of this) which iterate over all the compatible fields to test for equity. That feels highly ad-hoc, and feels like a meta programming feature.

Can you give an example on how this could be misused/abused? I feel pretty strongly that, from a "python superset" point of view, "foo" in ("foo", "bar", 1) has to work, at least for the simple cases.

soraros commented 3 weeks ago

The concern here isn't misuse, but rather whether we need to mimic Python in this particular case. Python's tuple serves multiple functions, effectively encapsulating three distinct types:

  1. Tuple: heterogeneous and of fixed length.
  2. InlineArray: homogeneous and of fixed length.
  3. List: homogeneous and of variable length.

In a typed language like Mojo, it's logical for the latter two types to implement this interface. Moreover, a Python tuple can be mapped to either InlineArray[PyObject] or List[PyObject]. Given that the default implementation for __contains__ is any(val == x for x in self), it seems unnecessary to introduce this ad-hoc method for non-sequence types. Similarly, we might want to reconsider the utility of using for v in (a, b, c): in Mojo.

laszlokindrat commented 3 weeks ago

Syntactically, for v in (a, b, c): et al have to be valid. Considering how basic this feature is, and how quickly someone coming from python would try to do this (anecdotally, this was actually the first line of mojo I ever tried to write), we just cannot not have it.

Moreover, a Python tuple can be mapped to either InlineArray[PyObject] or List[PyObject].

Now this is a good point; maybe the problem is that (1, 2, 3) maps to Tuple while it should map to InlineArray? At the same time, Python's Tuple type is also heterogeneous, so our hands are a bit tied (admittedly, though, type checkers probably give up when they see someone iterate through a tuple). So my philosophy is this: if we can't see obvious ways to misuse/abuse, we should push for syntactic compatibility. If we discover that methods like this do pose unforeseen problems, we can 1) document sharp edges, 2) improve/fix them, 3) deprecate and then remove them.

soraros commented 3 weeks ago

Syntactically, for v in (a, b, c): et al have to be valid.

Totally, if we can enforce the same type requirement at compile time.

Now this is a good point; maybe the problem is that (1, 2, 3) maps to Tuple while it should map to InlineArray?

Again, agreed.

Question: should we allow True in (1, 2, "a")? Or equivalently, do we require T in Ts for where val: T and t: Tuple[*Ts]. It's tricky because it makes sense to not enforce it for heterogeneous tuple literal, but not so much for homogeneous ones.

laszlokindrat commented 3 weeks ago

Question: should we allow True in (1, 2, "a")?

I don't see why not. A perfect implementation of __contains__ might not be possible today (we might need parametric traits or something to correctly model comparisons between arbitrary types), but I don't think that should stop us from having something.