Open JelleZijlstra opened 1 month ago
cc @rhettinger for ABCs and @AlexWaygood for prior discussion on typeshed.
I think it would be nice to have such addition. If we were to add them in 3.14, I'd like to help! (at least with one method; I think it's better if we don't add them all in one go).
Go ahead, I haven't started coding yet.
IIRC, the sequence protocol on memoryview
works in the C API. You guys should be able to just use those functions to implement the methods.
By the way:
>>> reversed(memoryview(b'a'))
<reversed object at 0x7f6ece3fd8b0>
works and we actually have tests for that. However, I think there is somewhere an implicit cast or a generic reversed that is being used via indices + length maybe? So I don't know whether we should have a dedicated reversed method or just dispatch to the generic implementation.
Yes, reversed()
works through len and getitem lookup. It may be worth adding that method if it makes reversing more efficient, but I'd also be OK with skipping that one and using some hacks in the stub; after all, it's unlikely people are calling __reversed__
directly.
__contains__
is in a similar situation; in
already works on memoryviews. However, there it's likely that a contains implementation written directly for memoryview is significantly more efficient.
Well... the __contains__
I used is just by iterating over the object for now. I actually haven't checked that __contains__
worked beforehand. I can definitely make it more efficient though if performances matter (it's just that the __contains__
does not exist per se).
I took a quick glance at the PRs and they seem fine to me, but I would be very cautious about making sure that e.g. memoryview.index
does the same thing as PySequence_Index
, otherwise you're bound to break untold amounts of code relying on the buffer protocol.
That's a good point (though I'm not sure there are untold amounts of code using PySequence_Index
on memoryviews). PySequence_Index
delegates to _PySequence_IterSearch
in abstract.c, which works similar to the implementation in Bénédikt's PRs.
... I forgot to finish the multi-dimensional case. And the fact that it's not only basic memoryviews that should be supported but the entire protocol... I'll need to check the multi-dimensional case carefully.
you're bound to break untold amounts of code relying on the buffer protocol.
(I'll try passing on that achievement)
It appears iteration on multidimensional memoryviews currently isn't implemented at all, so we don't need to support it for these new methods either.
I've implemented the missing (naive) interface. The performances should be better than just the generic implementations since we have less checks but some methods can be improved by directly accessing the underlying buffer instead of creating iterators.
If such performance gain is needed, we'll address those concerns in a separate issue.
I can look closer at the PRs sometime today. I have a few general comments at a quick glance:
Performances should really be addressed separately. We try to avoid using the private API if this is not performance critical or if it is exposed "almost directly" to the user. Before using the private API, we need benchmarks but I think it's first better not to do it in those PRs. We will definitely gain performances by directly accessing the buffers but those should be investigated.
For assertions, which place do you have in in mind? I can find some in the __reversed__
implementation since I'm making assumption on the inputs, so I can update it later but otherwise I can't find other places where assertions need to be required.
When writing C code, I like to put assertions everywhere and anywhere. They add no overhead at runtime (they get compiled out on release builds) and make debugging a lot easier. Generally, if you make an assumption, add an assertion.
The performances should be better
That's a dangerous statement; it would be good to run some microbenchmarks on operations like x in memoryview()
, reversed(memoryview())
.
I am going to wait a bit before merging any of the PRs here in case other core devs want to voice their opinion on whether this is a good idea.
I'm conflicted about this one.
On the plus side, this makes the APIs more consistent and will simplify the typeshed stubs.
On the minus side, it feels a bit off to implement (and have to test and maintain) an actual concrete implementation for functionality that no user has ever needed. It seems a little like the tail wagging the dog. In the past, we've usually applied a principle of parsimony and resisted API expansions until good use cases have arisen. That has helped avoid cruft accumulating for features than never get used.
On the plus side, we could also benefit from possible improvements though that's just a gut feeling.
For instance materializing the reverse iterator is now slower but iterating over it is faster if the buffer is large enough.
This could also serve for the multidimensional case but I have no idea if this is a feature that we plan to have or not. The APIs were simple enough to implement so it didn't bother me btw.
I feel that we should add count
and index
. memoryview
is registered as a Sequence, and Sequence is documented as having those methods, so to me it seems like a bug that the methods don't exist on memoryview
.
I don't feel as strongly about the missing dunders, __contains__
and __reversed__
. Adding them makes the typing a bit more tidy, but we could manage without them. Adding those methods may produce performance wins, but there probably isn't a lot of real code that relies on reversing memoryviews or doing containment checks on them.
Yeah index and count seem important since they are public and documented. For contains and reversed I think we can just rely on the default dunders.
I'm actually wondering but does "being a sequence" means "having a __contains__
method" or does it mean "x in seq" works? or is it not detailed on purpose?
Bug report
Bug description:
The
memoryview
builtin is registered as aSequence
, but doesn't implement the full API, because it doesn't inherit fromSequence
at runtime and doesn't implement the mixin methods.Among the methods listed for Sequence in the documentation, memoryview is missing
__contains__
,__reversed__
,index
, andcount
. This is causing problems for typing; in the typeshed stubs we have to either lie one way by claiming that memoryview has methods it doesn't have, or lie another way by claiming it is not a Sequence whenissubclass()
says otherwise at runtime (python/typeshed#12800). To fix this, we should either make memoryview not a Sequence, or add the missing methods. The former has compatibility implications, so I propose to add the methods in 3.14.CPython versions tested on:
CPython main branch
Operating systems tested on:
No response
Linked PRs