lithops-cloud / lithops

A multi-cloud framework for big data analytics and embarrassingly parallel jobs, that provides an universal API for building parallel applications in the cloud ☁️🚀
http://lithops.cloud
Apache License 2.0
317 stars 105 forks source link

Fix serialization bug which triggers side effects on dynamic attributes #1302

Closed rabernat closed 6 months ago

rabernat commented 6 months ago

Thanks for this great open source library! 🙏 I'm deeply appreciative of your work for the community.

In using lithops, I discovered a bug in the serialization routines which triggers side effects on dynamic attributes (e.g. @property methods). For me this, is a dealbreaker for using lithops with Arraylake, since our objects have such properties which trigger expensive API calls. For example, in a map job over 1000 objects, this method gets called 1000 times before the function is even submitted.

I chased this down to the use of inspect.getmembers in lithops.job.serialize.

https://github.com/lithops-cloud/lithops/blob/cdf558448608d48a0eb0ad1fe02113495b25173f/lithops/job/serialize.py#L135

Python 3.11 implements inspect.getmembers_static, which is designed to address this exact situation. (See also discussion in https://github.com/python/cpython/pull/20911.)

To demonstrate this, this PR includes a test for this situation and implements a fix using getmembers_static. This passes in a local environment with python 3.11. However, it will fail in CI, which uses python 3.10.

I would recommend vendoring a version of this function until you drop python 3.10 support.


Developer's Certificate of Origin 1.1

   By making a contribution to this project, I certify that:

   (a) The contribution was created in whole or in part by me and I
       have the right to submit it under the Apache License 2.0; or

   (b) The contribution is based upon previous work that, to the best
       of my knowledge, is covered under an appropriate open source
       license and I have the right under that license to submit that
       work with modifications, whether created in whole or in part
       by me, under the same open source license (unless I am
       permitted to submit under a different license), as indicated
       in the file; or

   (c) The contribution was provided directly to me by some other
       person who certified (a), (b) or (c) and I have not modified
       it.

   (d) I understand and agree that this project and the contribution
       are public and that a record of the contribution (including all
       personal information I submit with it, including my sign-off) is
       maintained indefinitely and may be redistributed consistent with
       this project or the open source license(s) involved.
JosepSampe commented 6 months ago

Nice catch! In this PR (https://github.com/lithops-cloud/lithops/pull/1303), I've included the relevant code from the inspect.getmembers_static() method in the Lithops library, so that it can be available for any python version. Would this suit your use case?

rabernat commented 6 months ago

Would this suit your use case?

Yes I think so.


This is the second time I've started a PR which was then superseded by another PR from an existing maintainer. This makes me feel like PRs from outside contributors are not welcome. Are you guys interested in new contributors or not? The fact that you have a contributing guide gave me the impression that you were.

JosepSampe commented 6 months ago

Yes, outside PRs are welcome. Thought you drafted PRs just to point out where was the error, since for example in https://github.com/lithops-cloud/lithops/pull/1296 it required much more modifications not directly related to the issue. Sorry for the confusion

JosepSampe commented 6 months ago

@rabernat feel free to switch this PR to ready, and I will be glad to merge it. Thanks for finding this issue and for the contribution, and sorry for the confusion.

rabernat commented 6 months ago

Hi @JosepSampe - no worries and thanks for the reply! There is no point in replacing #1303, which is already pretty much complete. I really appreciate your quick responses and fixes for all my issues.

In most of the open-source projects I contribute to, it's typical to open a draft PR to discuss possible technical approaches to a certain issue and receive early feedback before investing lots of time in complete solution. That was my intention here.

Of course you are capable of implementing most new features and bug requests much faster than me or any other new contributor. However, in the long run, if your goal is to make a mutli-stakeholder project, it can be a worthwhile tradeoff to go a little bit more slowly in order to help new contributors spin up on the code base and understand the design of the software.

I am loving lithops and am really excited it about its potential. In the future, I'll try to be very clear about my intentions when opening a PR.

Keep up the good work! 💪