Closed OctaveLarose closed 2 years ago
Hello Octave Larose, thanks for contributing a PR to our project!
We use the Oracle Contributor Agreement to make the copyright of contributions clear. We don't have a record of you having signed this yet, based on your email address octave -(dot)- larose -(at)- hotmail -(dot)- fr. You can sign it at that link.
If you think you've already signed it, please comment below and we'll check.
- Unsure about a
while(true)
loop in the generic implementations for both. While I'm assuming aStopIteration
exception will always be called and caught no matter what, I can't say it with full confidence out of lack of knowledge regarding GraalPython. Also unsure whether this implementation can preventKeyboardInterrupt
.
while(true)
is fine. You call e.expectStopIteration
, which does basically this (pseudocode):
if (e.type != StopIteration)
throw e;
So any other exception than StopIteration
will terminate the loop implicitly.
- No string specialization, as I assumed they weren't warranted as an
all
orany
call on astr
will always returnTrue
as far as I know anyway (unlessany
is fed an empty string). I could write a specialization method for strings that simply returnstrue
, however. (and does a simple check for empty strings withany
).
Agreed, the generic specialization will handle strings. We don't need to cover all builtin types, just the common ones where we expect some performance benefit from taking advantage of their internals.
- I wrote a
dict
specialization but don't know how warranted it is for the reasons mentioned above. I made theall
one returnTrue
by default because of them ; let me know if you can think of counterarguments to this implementation. (or this potentialPString
one)
It's not guaranteed, I added a counterexample in the review.
- No
bytes
/bytearray
specializations. I originally specialized usingPSequence
, which obviously caught them as they extend it, but I assumed specificity yielded better performance. Adding specifications for these types would be trivial.
I think it's ok as it is. Specializing on PSequence
would also work, but then you would have to use GetSequenceStorageNode
instead of calling the getSequenceStorage
method (see the review comment about lenght()
virtual call).
- Small note: I match on
PSet
and notPBaseSet
, hence not accounting forPFrozenSet
. Unsure how relevant that is and how it should be done.
You can match on PBaseSet
, there should be no downside in doing that.
- All my code is in
BuiltinFunctions.java
, but I'd rather have the bulk of it be in one/two separate class(es), as seems to be the case for many builtin functions. However, I wasn't sure what to name them, where to put them in the file tree, etc.
I'd personally keep them in BuiltinFunctions
to make them easy to find. They are not that big. Which "many builtin functions" do you see outside BuiltinFunctions
?
- I've been told specializing on different storage strategies may be pertinent, i.e subclasses of
SequenceStorage
instead of simply relying on this parent class (as I do here). If relevant, how should one go about this? This sounds like a lot of added logic if done right.
I mentioned it in the review too. You don't have to specialize on every subclass, just the ones that will be common and then have a generic fallback that handles the rest. I'd personally do that in a separate inner helper node to avoid combinatorial explosion of specializations.
You may also consider generalizing the all
and any
operations into a common superclass or helper node that would have a boolean field or boolean argument that determines if the operation is all
or any
.
Thanks for your feedback! I'll try to send an updated version next week or so.
To answer a specific few comments:
dict
iteration yields the keys. There is no guarantee that all/any of them would be always true.
I did know it yielded the keys (which makes me realize calling entries
was an obvious mistake I should have noticed), but actually had no idea they could be None
... or anything other than strings, hence my default true
. Frankly I don't know how I never came across that feature after years of using Python.
I'd personally keep them in
BuiltinFunctions
to make them easy to find. They are not that big. Which "many builtin functions" do you see outsideBuiltinFunctions
?
I was just thinking of calls to various nodes, like PyObjectSizeNode
, my wording was off. But really, my intent was to move at least part of the logic to a specialized node in the same way, which you've also recommended in a code comment, so that answers it.
You may also consider generalizing the
all
andany
operations into a common superclass or helper node that would have a boolean field or boolean argument that determines if the operation isall
orany
.
I did consider it, actually! I thought it might be better to keep them separate for code clarity reasons and maybe performance (it's hard for me to assess what can impact performance negatively and what can't, for now). But yes, if so I'll gladly make them share some logic. Not a fan of boolean arguments and might use an enum
instead for clarity, but that's hardly important.
Many questions I had but didn't necessarily ask got answered in your comments, so thank you for that! Thanks for taking the time to write these remarks, in general.
Here's the updated version of the code, taking your feedback into account. The implementation should now be correct now that the behavior of all/any on dict
have been fixed, but let me know if this isn't the case.
A few notes:
SequenceStorage
subtypes in a nested node named AllOrAnySequenceStorageNode
, as mentioned. I only wrote specializations for bool
and int
, although obviously adding more is trivial. PyObjectIsTrueNode
and LenNode
; I wanted to cache them to avoid having to pass them to the original specialization methods (e.g doInt
in AllNode
or AnyNode
), and thought it would be wise to rely on the @Child
annotation (a bit of a shot in the dark) for this. I'm not sure if this is advisable or if the annotation could be removed entirely (I'm also unsure about the @GenerateNodeFactory
I prepended to it, again, another shot in the dark hoping it'd be better for performance)AllOrAnySequenceStorageNode
to actually inherit from a parent node class. I chose PythonUnaryBuiltinNode
since I didn't know what to pick, but that one hardly seems appropriate for this usage given its name.AllOrAnySequenceStorageNode.checkSequenceStorage(...)
method which checks the type using instanceof
. It sounded like the natural solution to me, but I later considered that there may be more optimized/smart ways of doing this in this case that I'm simply not aware of.That's pretty much it, let me know if I didn't implement code related to your previous feedback correctly. Also looking for advice on improving code readability/quality!
Octave Larose has signed the Oracle Contributor Agreement (based on email address octave -(dot)- larose -(at)- hotmail -(dot)- fr) so can contribute to this repository.
@Ck1129 who are you?
@Ck1129 who are you?
I'm new here, just wanting to learn something new
I'm new here, just wanting to learn something new
Then please do it in a way that doesn't interfere with other people's work. Approving random PRs of repositories you don't contribute to just creates chaos for others.
No problem. That wasn’t my intention. Next time something like this happens try to be understanding and not come off with an attitude I clearly said I was new. I’m a grown man not a child choose ur words better On Fri, Oct 29, 2021 at 4:52 AM Michael Šimáček @.***> wrote:
I'm new here, just wanting to learn something new
Then please do it in a way that doesn't interfere with other people's work. Approving random PRs of repositories you don't contribute to just creates chaos for others.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/oracle/graalpython/pull/239#issuecomment-954567288, or unsubscribe https://github.com/notifications/unsubscribe-auth/AWCTAFVSFN2YHFIJUY55WMLUJJVE5ANCNFSM5F5TLHIA .
@OctaveLarose could you please rebase the PR on top of current master and make it build? It doesn't build anymore due to some unused import of a class we removed in the meantime
@OctaveLarose could you please rebase the PR on top of current master and make it build? It doesn't build anymore due to some unused import of a class we removed in the meantime
Sure thing, should be fixed now. I'd been basing it off an older version of master
because the most recent one never seemed to build for me ; even now, an mx build
yields a java.lang.NullPointerException
when building cext
on my machine.
As for the rest, I wasn't 100% done (hence the odd @Specialization(limit = "3")
for doHashStorage()
, for instance) and wanted to implement those last changes and proofread my work, but feel free to submit it for review internally in the meantime.
I was also trying to compare my changes with the current implementation via mx benchmark
, but it can't find any Java VMs to default to and no configs are available, so I can't get it to work with graalpython
. I'll probably try to fix that a bit more on my end then eventually give up and ask on your Slack!
even now, an
mx build
yields ajava.lang.NullPointerException
when buildingcext
on my machine.
If master cannot build, there are 2 common reasons why that could happen:
graal
repository checkout is out of sync with the revision we expect. A simple fix is mx sforceimports
. It's a command that checks out the revisions of dependencies specified in the suite file. It's generally a good idea to run mx sforceimports
everytime you pull master.mx clean
and building again.I wasn't 100% done
Ah, sorry, it seemed quite complete, so I thought it was. No hurry, take your time.
it can't find any Java VMs to default to
You need to build and run with the graal compiler enabled. By default, most mx
commands run without it. To enable it, you have to add --dy /compiler
after mx
. You first have to build it with mx --dy /compiler build
, then you can run mx --dy /compiler benchmark ...
.
ask on your Slack
Feel free to ping me there directly.
That should be it, I thought I had more left to do - my bad for not getting it done last Friday. I wanted to try to improve the @Specialization(limit = "3")
for doHashStorage()
which I found a bit arbitrary, but I can't and I'm assuming there's a reason for the compiler to enforce this limit
attribute being set for functions that rely on @CachedLibrary
; I could have set the limit to another value, but I preferred leaving it as is since I already don't understand it very well.
I also wanted to move AllOrAnyNode
to another file/directory since as far as I know, BuiltinFunctions.java
mostly only contains @Builtin
annotated classes, but I suppose it wouldn't be the only exception to that rule.
In other news, Graal was indeed not up to date and mx sforceimports
did the trick, I hadn't thought of that ; I had previously tried and failed with mx clean
, though. Same for the benchmarks, I can now run them with graalpython
; I aim to get some working later this week for me to get an idea of what the speedup was.
Regarding benchmarks, there are two major metrics we measure:
We care a lot about peak performance because that's how we differentiate from CPython, but interpreter performance is also important so that programs don't take minutes to start up. Ideally, you would benchmark both. The default configuration measures peak performance. For benchmarking interpreter performance, you can run a benchmark like this: mx --dy /compiler benchmark micro-small:name_of_your_benchmark -- --python-vm-config interpreter
.
AFAIK we don't have any benchmarks that would use all
or any
, so if you want to measure the improvements you did, you need to write your own benchmark (to run via mx
, you need to add an entry for it in mx.graalpython/mx_graalpython_bench_param.py
).
I ran the CI gates and there were some gate failures:
all
failed. You can execute the failed suite using mx punittest --no-leak-tests com.oracle.graal.python.test.builtin.BuiltinFunctionTests
. Since the builtins are now more complex, it might be a good idea to also add more tests to cover more specializations.mx python-style --fix --no-spotbugs
Fixed the tests, there was a big oversight in all
's behavior, sorry ; should be functional now (the tests do pass on my machine).
For the formatting, though, I'm not sure. mx python-style --fix --no-spotbugs
doesn't seem to have turned up anything on my end and I can't see what I changed to the python code that could have broken it ; I did remove the all
and any
python functions, but I did make sure to keep the number of line breaks between the previous and next functions just the same as before. I also suspected the newline at the end of the file, but that's PEP 8 stuff and I don't believe I added one that wasn't already here before. Are CRLF/LF issues a possibility? I'm on Ubuntu LTS 20.04, if relevant.
so if you want to measure the improvements you did, you need to write your own benchmark (to run via
mx
, you need to add an entry for it inmx.graalpython/mx_graalpython_bench_param.py
).
Thanks! This should save me some searching.
I rebased it on this repo's main branch again to make absolutely sure it wasn't the formatting issue didn't only happen when completely up to date, but this doesn't seem to have been the case.
Ah, sorry, I realized what the problem is with the formatter - it uses eclipse formatter and it needs to have eclipse installed, otherwise it only checks licenses. Please download eclipse (we use version 4.14 for the formatter, so it's probably a good idea to get the same), extract it somewhere and set environment variable ECLIPSE_EXE
to the path of the eclipse
binary. Then rerun the formatter.
Additionally, you can get the formatting in your IDE. If you're using Eclipse it should work already. If you're using Intellij IDEA you can install Eclipse Formatter Plugin and rerun mx intellijinit
, it should configure itself.
Ah, because of the name of the command, I was assuming it was purely for .py
files. Anyway, I've ran the formatter using Eclipse v. 4.14 and should have fixed the issue
You have a bug somewhere:
>>> any([False, True])
False
You have a bug somewhere:
>>> any([False, True]) False
Thanks for the heads up, corrected in fa07d30 - was an oversight on my part.
Hi all,
First of all, thank you for your work on this project! Second of all, while I've read the contributing guidelines and signed the OCA, I'm still waiting for it to be approved ; I thought I'd open this pull request to get some potential feedback in the meantime.
I've only been looking at this project for a couple of days and I'm not too familiar with it, just like I'm fairly new to Graal as well. Which is why I'm assuming this work should be revised.
While functional, it can be definitely improved in many ways I'm most likely not familiar with. As a series of notes:
while(true)
loop in the generic implementations for both. While I'm assuming aStopIteration
exception will always be called and caught no matter what, I can't say it with full confidence out of lack of knowledge regarding GraalPython. Also unsure whether this implementation can preventKeyboardInterrupt
.all
orany
call on astr
will always returnTrue
as far as I know anyway (unlessany
is fed an empty string). I could write a specialization method for strings that simply returnstrue
, however. (and does a simple check for empty strings withany
).dict
specialization but don't know how warranted it is for the reasons mentioned above. I made theall
one returnTrue
by default because of them ; let me know if you can think of counterarguments to this implementation. (or this potentialPString
one)bytes
/bytearray
specializations. I originally specialized usingPSequence
, which obviously caught them as they extend it, but I assumed specificity yielded better performance. Adding specifications for these types would be trivial.PSet
and notPBaseSet
, hence not accounting forPFrozenSet
. Unsure how relevant that is and how it should be done.BuiltinFunctions.java
, but I'd rather have the bulk of it be in one/two separate class(es), as seems to be the case for many builtin functions. However, I wasn't sure what to name them, where to put them in the file tree, etc. Which plays into my next question:SequenceStorage
instead of simply relying on this parent class (as I do here). If relevant, how should one go about this? This sounds like a lot of added logic if done right.Thanks for your time and hard work!