python / steering-council

Communications from the Steering Council

Priorities within CPython development for CZI proposal? #26

Closed: brainwane closed this issue 4 years ago

brainwane commented 4 years ago

TL;DR: what are a few priority areas for CPython development where one full-time person or a few part-time people, working for one year, could make a big difference? Let's decide by July 7th and apply for CZI money.

Chan Zuckerberg Initiative's Essential Open Source Software for Science grant program will open its next funding cycle to applications soon: June 16th to August 4th, per the request for applications.

The PSF can apply for $50k-$250k USD for a 1-year project. The PSF has already received one of these grants, $200,000 to improve the user experience and debuggability of pip, and we've been welcomed to apply again.

I think the PSF should consider proposing at least one project that improves CPython. In my experience, projects need at least a few weeks to put together a good application, sometimes a month. So I would love to use this issue to generate some ideas and then, by July 7th, reach a consensus, aided by the Steering Council, on what to pursue. Then the new Project Funding Working Group (currently being formed) can advise proposal-writers, help them get Board and/or Ewa's approval, and help get the proposal submitted before August 4th.

I suggest a few criteria:

  1. CPython maintainers want it: there's already consensus among CPython devs, and if a PEP is involved it's been accepted.
  2. fairly well-scoped
  3. fundable: would happen much faster if the PSF got funding to implement the work. (So, it has to be legal and physically possible.)
  4. applicable to biomedical research: so, for example, performance and reproducibility work is probably more relevant here than work on security or improving Python's teachability in the classroom.

My three suggestions to kick off discussion are:

  1. make a chunk of progress on the GILectomy
  2. make a proper start on property-based testing for Python builtins and the standard library (per this year's Language Summit)
  3. hire a full-time core workflow manager and coordinator for one year
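To make suggestion 2 a little more concrete: property-based testing states an invariant and checks it against many generated inputs, rather than a few hand-picked cases. The sketch below is a stdlib-only stand-in and assumes nothing about the tooling the Language Summit discussion had in mind; all function names are hypothetical. Real tools such as Hypothesis also shrink a failing input down to a minimal counterexample, which this sketch does not attempt.

```python
import random
from collections import Counter
from typing import Callable, List, Optional


def random_int_list(rng: random.Random) -> List[int]:
    """Hand-rolled input generator (a stand-in for a Hypothesis-style strategy)."""
    return [rng.randint(-100, 100) for _ in range(rng.randint(0, 10))]


def find_counterexample(
    prop: Callable[[List[int]], bool], trials: int = 500, seed: int = 0
) -> Optional[List[int]]:
    """Check `prop` on `trials` random inputs; return the first failing input, or None."""
    rng = random.Random(seed)  # fixed seed so any failure is reproducible
    for _ in range(trials):
        xs = random_int_list(rng)
        if not prop(xs):
            return xs
    return None


# Invariants of builtins that should hold for *every* input, not just hand-picked ones.
def sorted_is_ordered(xs: List[int]) -> bool:
    s = sorted(xs)
    return all(a <= b for a, b in zip(s, s[1:]))


def sorted_is_permutation(xs: List[int]) -> bool:
    return Counter(sorted(xs)) == Counter(xs)


def int_str_round_trips(xs: List[int]) -> bool:
    return all(int(str(x)) == x for x in xs)


for prop in (sorted_is_ordered, sorted_is_permutation, int_str_round_trips):
    print(prop.__name__, "counterexample:", find_counterexample(prop))

# A deliberately false "property" is caught quickly:
print(find_counterexample(lambda xs: sorted(xs) == xs))
```

A fixed seed keeps runs reproducible; a real suite would also vary seeds across runs to explore more inputs over time.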

(Budget estimate: If we ask for and get $250K, and the PSF takes 15% overhead, we get $212,500. If we assume that the PSF hires contractors for this work at USD $115-$200 per hour, that's enough for about 1000 to 1800 hours of work. CZI funds one-year projects, so, at that pay rate, this comes to somewhere between one half-time person and one full-time person.)
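As a quick check of the budget arithmetic, the sketch below recomputes it. `budget_estimate` is a hypothetical helper, and the 2,080-hour full-time year (40 hours a week for 52 weeks) is an assumption, not a figure from this discussion.

```python
HOURS_PER_FULL_TIME_YEAR = 2080  # assumption: 40 hours/week x 52 weeks


def budget_estimate(grant_usd, overhead_pct, low_rate, high_rate):
    """Hypothetical helper: net funds, billable-hour range, and FTE range for a one-year grant."""
    net_usd = grant_usd * (100 - overhead_pct) // 100  # whole-dollar integer math, no float rounding
    min_hours = net_usd / high_rate  # everyone billed at the top rate
    max_hours = net_usd / low_rate   # everyone billed at the bottom rate
    return (
        net_usd,
        min_hours,
        max_hours,
        min_hours / HOURS_PER_FULL_TIME_YEAR,
        max_hours / HOURS_PER_FULL_TIME_YEAR,
    )


net, min_h, max_h, min_fte, max_fte = budget_estimate(250_000, 15, 115, 200)
print(f"net ${net:,}; {min_h:,.0f}-{max_h:,.0f} hours; {min_fte:.2f}-{max_fte:.2f} FTE")
```

With these inputs the net is $212,500, which buys roughly 1,060 to 1,850 hours, or about 0.51 to 0.89 of a 2,080-hour year: in line with the half-time to full-time range stated above.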

gvanrossum commented 4 years ago

I'm just a lurker now, but why biomedical research?

And some ideas for the brainstorm:

In Python 3.10 we'll be able to use the full power of the PEG parser. But 3rd party tooling like Black doesn't have a PEG parser yet. We could work on a 3rd party PEG parser suitable to replace lib2to3.
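For readers who haven't met PEG (parsing expression grammar) parsers: the defining feature is ordered choice, where alternatives are tried in order and the first one that matches wins, so the grammar is never ambiguous. Below is a toy sketch for +/- arithmetic, assuming a made-up two-rule grammar; it is illustrative only and unrelated to the actual grammar or parser generator CPython uses. Memoizing each (rule, position) result, done here with `functools.lru_cache`, is what makes this style of parser a "packrat" parser.

```python
import re
from functools import lru_cache

NUMBER = re.compile(r"\d+")


def parse(text: str) -> int:
    """Evaluate +/- arithmetic with a toy PEG (packrat) parser.

    Grammar (hypothetical, for illustration only):
        expr <- term (('+' / '-') term)*
        term <- NUMBER / '(' expr ')'
    """

    @lru_cache(maxsize=None)  # packrat memoization: each (rule, position) is parsed once
    def expr(pos):
        first = term(pos)
        if first is None:
            return None
        value, pos = first
        while pos < len(text) and text[pos] in "+-":
            rhs = term(pos + 1)
            if rhs is None:
                break  # PEG repetition just stops; leftover input is rejected below
            rhs_value, next_pos = rhs
            value = value + rhs_value if text[pos] == "+" else value - rhs_value
            pos = next_pos
        return value, pos

    @lru_cache(maxsize=None)
    def term(pos):
        # Ordered choice: try NUMBER first; only if it fails, try the parenthesized form.
        match = NUMBER.match(text, pos)
        if match:
            return int(match.group()), match.end()
        if pos < len(text) and text[pos] == "(":
            inner = expr(pos + 1)
            if inner is not None and inner[1] < len(text) and text[inner[1]] == ")":
                return inner[0], inner[1] + 1
        return None  # neither alternative matched at this position

    result = expr(0)
    if result is None or result[1] != len(text):
        raise SyntaxError(f"cannot parse {text!r}")
    return result[0]


print(parse("1+(2-3)+4"))
```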

Improve mypy -- it may be losing much of its corporate funding, and we need to upgrade the type system to support numpy.

More directly CPython: make it work properly on mobile and ensure that it keeps working there. (And in web browsers, for that matter.)

Burn down the list of open PRs and issues, and design a new workflow to prevent them from getting so clogged again. (Is the bpo -> GitHub transition funded yet?)

Modernize the buildbot farm (yes, we need OS diversity, but many of the hosts are ancient and too slow or memory starved to be useful). Maybe some mobile 'bots, too.

Fund Serhiy for a year to make improvements across the board.

Accelerate work on HPy.

Incorporate several features from trio into asyncio (nurseries, multi-exceptions), and other asyncio work (better ways of connecting asyncio and threads?).
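To illustrate what "nurseries" means: a nursery is a block that owns the tasks started inside it, so none of them can outlive the block, and if one fails the others are cancelled. The sketch below approximates that on plain asyncio; `Nursery` and `open_nursery` are hypothetical names, not an asyncio API. Note that it can re-raise only one error when several children fail, which is exactly the gap "multi-exceptions" would fill. (For reference, asyncio later gained `asyncio.TaskGroup` and `ExceptionGroup` in Python 3.11.)

```python
import asyncio
from contextlib import asynccontextmanager


class Nursery:
    """Hypothetical, minimal trio-style nursery built on plain asyncio (illustration only)."""

    def __init__(self):
        self.tasks = []

    def start_soon(self, coro_fn, *args):
        self.tasks.append(asyncio.ensure_future(coro_fn(*args)))


async def _cancel_and_wait(tasks):
    if not tasks:
        return
    for task in tasks:
        task.cancel()
    # return_exceptions=True collects the CancelledErrors instead of raising them here.
    await asyncio.gather(*tasks, return_exceptions=True)


@asynccontextmanager
async def open_nursery():
    nursery = Nursery()
    try:
        yield nursery
    except BaseException:
        await _cancel_and_wait(nursery.tasks)  # the body failed: no child may outlive it
        raise
    if not nursery.tasks:
        return
    try:
        # Wait for every child, but wake up as soon as one of them raises.
        done, pending = await asyncio.wait(
            nursery.tasks, return_when=asyncio.FIRST_EXCEPTION
        )
    except asyncio.CancelledError:
        await _cancel_and_wait(nursery.tasks)  # we were cancelled: take the children with us
        raise
    await _cancel_and_wait(pending)
    errors = [t.exception() for t in done if not t.cancelled() and t.exception() is not None]
    if errors:
        # Plain asyncio has no multi-exception type, so only one error survives here.
        raise errors[0]


async def worker(results, value, delay):
    await asyncio.sleep(delay)
    results.append(value)


async def main():
    results = []
    async with open_nursery() as nursery:
        nursery.start_soon(worker, results, "a", 0.02)
        nursery.start_soon(worker, results, "b", 0.01)
    # Leaving the block guarantees both children have finished.
    return results


print(asyncio.run(main()))
```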

Improve the docs (maybe migrate to markdown???).

v-python commented 4 years ago

On 6/10/2020 10:39 PM, Guido van Rossum wrote:

More directly CPython: make it work properly on mobile and ensure that it keeps working there. (And in web browsers for that matter.)

+1. This is my favorite.

brainwane commented 4 years ago

Relevance to biomedical research matters for this particular question right now because the August 4th deadline is to ask for money from Chan Zuckerberg Initiative's Essential Open Source Software for Science program, which "invites applications for open source software projects that are essential to biomedical research". The previous requests for applications explain:

CZI currently supports several areas of basic science and technology with the goal of making it possible to cure, prevent, or manage all diseases by the end of this century. This program aims to support software tools that are essential to this mission. Applications for two broad categories of open source software projects will be considered in scope:

  • Domain-specific software for analyzing, visualizing, and otherwise working with the specific data types that arise in biomedical science (e.g., genomic sequences, microscopy images, molecular structures). Software will be considered out of scope if it primarily serves domains outside biomedical science (e.g., physics, astronomy, earth sciences). While we appreciate that other communities may want to explore new extensions of their software to the life sciences, such applications are unlikely to be selected.
  • Foundational tools and infrastructure that enable a wide variety of downstream software across several domains of science and computational research (e.g., numerical computation, data structures, workflows, reproducibility). While foundational tools will be considered in scope for this program, they must have demonstrated impact on some area(s) of biomedical research.

The more persuasively that our grant application can say "here is how this will help the biomedical researchers who use Python," the more likely we are to get money from this program.

I remember @freakboy3742's Language Summit presentation on Python on mobile this year -- Russell, if you can speak up with a paragraph about why this is particularly interesting from a biomedical research perspective, I'd love that!

ericholscher commented 4 years ago

I'd vote strongly for this:

Burn down the list of open PRs and issues, and design a new workflow to prevent them from getting so clogged again. (Is the bpo -> GitHub transition funded yet?)

Something similar to the Django Fellow, funded for a year as a test, would be huge. We then only need to sell the larger story of Python being impactful to biomedical research, which is obviously true. I know other groups have applied to this grant for this type of funding in the past, so I can definitely help share that knowledge from the Django & historical CZI grant side. 👍

brettcannon commented 4 years ago

Is the bpo -> GitHub transition funded yet?

Yes, it's funded. We are going through job applicants for the PM position now and we have an initial list of candidates to join the WG to manage the transition.

warsaw commented 4 years ago

Accelerate work on HPy.

Improve mypy -- it may be losing much of its corporate funding, and we need to upgrade the type system to support numpy.

I'll put in votes for these.

gvanrossum commented 4 years ago

And what are your own brainstorm ideas?

freakboy3742 commented 4 years ago

@brainwane

Here's an attempt at a pitch for why "mobile python" is of interest to science/biomed


Python is already a well established tool in science and biomedical research, where it is used for data analysis, visualization, machine learning and pattern recognition. Scientists and researchers also use Python to develop and maintain complex database-backed websites in support of their research goals. However, this usage is largely restricted to laptops and servers. The most widely used types of computing device - phones and tablets - are not currently well served by the Python ecosystem.

At present, developing apps for phones and tablets generally requires specialist skills, and often requires mastering multiple programming languages. As a result, developing a mobile application to support their research isn't an option available to most researchers. A "mobile enabled" Python would enable scientists and researchers to leverage their existing programming skills to develop mobile applications. These applications could simply make existing data analysis and visualization techniques available on new platforms; or they could combine these techniques with the unique capabilities of mobile devices, such as photo capture, geolocation, and augmented reality. This would open up new opportunities for in-the-field data gathering and analysis that hasn't been previously possible.


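If I was to pitch potential projects for the grant with a mobile focus:

  • Officially integrating iOS and Android into the CPython build
  • Developing iOS and Android wheel formats that are compatible with mobile app distribution.
  • Developing a "Minimal Viable Python" (a minimal Python install, with standard library pieces being opt-in)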

gpshead commented 4 years ago

On Sat, Jun 13, 2020 at 9:33 PM, Russell Keith-Magee wrote:

If I was to pitch potential projects for the grant with a mobile focus:

  • Officially integrating iOS and Android into the CPython build
  • Developing iOS and Android wheel formats that are compatible with mobile app distribution.
  • Developing a "Minimal Viable Python" (a minimal Python install, with standard library pieces being opt-in)

Want to use Python to create Android and iOS applications? Fund BeeWare https://beeware.org/. They already enable this!

Mobile environments are non-traditional, locked down with foreign language-runtime API surfaces that generally frown upon C and POSIX (for good reasons). They are outside the experience of most CPython core devs, and not an environment we could collectively support easily (meaning that solving that for the long term would need to be part of the project). I don't believe there is even any need for pip and wheels there. That distribution model is effectively forbidden by the platform. Any application that wants to use Python must embed its own interpreter and fully control its environment. I don't expect there ever to be an "install this CPython app, download arbitrary code from the internet, pip install wheels, edit+run code on a phone" story for CPython. It'd suck. No application user would ever choose that, so no one trying to provide a mobile app for their users would ever force that 1-star review experience upon them.

Any goals around mobile support on the CPython side would make a lot more sense if BeeWare's pain points were used to drive prioritized requirements.

-G

gpshead commented 4 years ago

(And yes, I realize my reply is going to... someone from the BeeWare project making those suggestions :)


bskinn commented 4 years ago

I know absolutely nothing about the complexities of developing for mobile, but FWIW I can freely pip install in the Python that comes with Termux, for Android.

Not everything works, but everything I've tried will install.

Further, as long as I pkg install clang (and maybe a couple of other Termux system packages) first, I'm able to pip install coverage and get the compiled C extensions.

So, maybe this is irrelevant to the current conversation, but superficially it seems to argue against @gpshead's position.

freakboy3742 commented 4 years ago

@gpshead I completely agree that the workflow for using Python on a mobile device would be different. However, that doesn't mean that wheels aren't possible, or wouldn't be useful. If anything, they're more useful on mobile platforms because of the way mobile apps need to operate.

Going into specifics will likely rathole this entire discussion. I'm happy to elaborate; let me know where the better place would be.

Suffice to say that the three points I listed are the pain points that BeeWare has with CPython at present. At the core of all of them is elevating "interpreter embedded in an app sandbox" as a first-class distribution story for Python - i.e., a Python installation story that doesn't include or involve python.exe. And while that story is the only way Python works on mobile, it's also a useful story for standalone app distribution on desktop platforms.

gpshead commented 4 years ago


+1. Anything that makes embedding a first-class distribution story, usable without the current pains involved in setting up and maintaining that kind of thing, would indeed be great. I expect there are multiple intersecting stories of exact needs, based on what various platforms/environments expect, already provide, or don't provide.

-gps

gvanrossum commented 4 years ago

My guess was that wheels for mobile platforms would be useful to app developers to assemble the bundle for their application, not for end users.

freakboy3742 commented 4 years ago

@gvanrossum Yes, that's the intended use case. End-user wheel installs may not even be possible. I believe Apple's App Store guidelines would reject an app that allowed the installation of wheels (and especially binary wheels) after the app has gone through review.
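The app-developer use case relies on the tags a wheel already carries in its filename: a bundling tool would pick the wheels whose platform tag matches the target device and package them into the app. As a sketch of that structure, following the wheel filename layout from PEP 427, here is a simplified parser. `parse_wheel_filename` is a hypothetical helper (it skips name normalization and validation), and the package names and the iOS-style platform tag below are made up for illustration.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build_tag: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename into its PEP 427 fields (simplified: no name normalization)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:  # no optional build tag
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelName(dist, version, build, py, abi, plat)


# Hypothetical filenames: a platform-independent wheel vs. ones built for one platform.
for name in (
    "example_pkg-1.0-py3-none-any.whl",
    "example_pkg-1.0-cp38-cp38-manylinux2010_x86_64.whl",
    "example_pkg-1.0-cp38-cp38-ios_13_0_arm64.whl",  # made-up iOS-style tag
):
    print(parse_wheel_filename(name).platform_tag)
```

Real tag fields can also hold compressed sets such as `py2.py3`; a full implementation would expand those before matching.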

warsaw commented 4 years ago

Ideas:

brettcannon commented 4 years ago

Quick update: we discussed potential ideas to submit proposals for. We want to wait until after our next meeting next week when everyone is able to attend before we publicly state what we think the priorities should be.

Feel free to continue to discuss things until then.

brainwane commented 4 years ago

Thanks @brettcannon!

The Project Funding WG has a work-in-progress list of funders. So if core developers and the Steering Council come up with some great ideas that aren't well suited to a CZI application, it's worth collecting those, because some might be well suited to a Mozilla Open Source Support Award application, a Comcast Innovation Fund application, etc.

brainwane commented 4 years ago

Also, @brettcannon, I presume that once the Steering Council publishes the Vision Deck that was mentioned in a past update, that would be helpful for this discussion and related funding-seeking discussions. Will we be seeing that soon?

brettcannon commented 4 years ago

The vision deck was scrapped in favor of plans to present on the topic at PyCon US, which were then dashed due to funding concerns for the PSF. I suspect, though, that the list we present back to you all will encompass what we would have put in that document and presentation.

vstinner commented 4 years ago

I consider the performance of the Python runtime important to ensuring that Python remains relevant in 5 or 10 years. I identified three projects which are realistic:

brainwane commented 4 years ago

@brettcannon wrote on June 15th:

Quick update: we discussed potential ideas to submit proposals for. We want to wait until after our next meeting next week when everyone is able to attend before we publicly state what we think the priorities should be.

Feel free to continue to discuss things until then.

Thanks! Should we expect the Steering Council's public statement of priorities soon? In order to submit a good proposal by August 4th, I figure we need one of these two:

  1. by ~July 7th: the Steering Council designates 1-2 things and says "let's apply for funding for these"
  2. by ~July 1st: the Steering Council says, more broadly, "here are some priority areas", and collaborates with core Python developers on python-dev and/or Discourse to narrow this down to 1-2 specific things by ~July 7th

(I'm saying "thing" instead of "project" because of ideas like "hire a person for a year to do general code review/issue wrangling" which is a fundable thing but not a project.)

brettcannon commented 4 years ago

Unfortunately we weren't able to get to this topic today as something more pressing came up and took up the whole meeting. I'll start an email thread to see what we can pull together.

And I am going to take from that last comment, @brainwane , that you are after a short list by July 7.

brainwane commented 4 years ago

@brettcannon thanks for the update.

And I am going to take from that last comment, @brainwane , that you are after a short list by July 7.

Yes, I think that would be great, thanks.

In case this helps: I have faith that, whatever the topic area is, as long as CPython maintainers want it and it's applicable to biomedical research, we can find a way to scope work and make it feasible and plausible as a proposal.

brainwane commented 4 years ago

@brettcannon should we expect a short list early this week? Thanks!

brettcannon commented 4 years ago

Just finished our meeting and the two things we would suggest proposing are:

  1. Core developer in residence; help with the PR backlog, issue triaging, improving the development workflow, etc.
  2. Single binary distributions, i.e. developing the tools necessary to compile CPython and all of your dependencies into a single binary, which is what you then send to people

Let us know if you need any more clarifications on those.

brainwane commented 4 years ago

Thanks @brettcannon -- very helpful!

We now have three weeks to try to find proposal-writers, write the proposal, edit, get Board approval, and submit. I hope that the Project Funding Working Group's members can do a lot of the lifting on writing this proposal, but that's not certain. I'm going to close this issue now because the Steering Council has set its priorities for this, but anyone who's interested in writing even a few paragraphs about why this is important and how much work it would take, please reply to this issue to volunteer.

xmunoz commented 4 years ago

@warsaw @brettcannon

Make single-file app distribution story official. I'm not talking about zipapps like shiv which although great, still require unpacking in order to support shared libraries. I'm talking about something much more akin to PyOxidizer.

Is something like Pants' pex in the same vein as shiv? It looks like pex can generate Python executables, but I'm not sure whether these executables fit the "first-class distribution story" described above.

warsaw commented 4 years ago

shiv is a modernization of pex. At my job, we were using pex for tarball distributions (not really single file executables, see below), but pex had lots of performance problems, mostly due (IMHO) to its backward compatibility requirements. shiv dropped all that, supporting only Python 3 and using modern techniques and libraries (e.g. importlib.resources instead of pkg_resources) to get good performance.

As nice as shiv is, I wouldn't classify it as a "single file executable". Both pex and shiv are fundamentally zip archives with a special shebang line that Python knows how to execute, but 1) you still have to have the Python binaries installed out of band; 2) you still have to unpack the archive to be able to import extension module shared libraries (since dlopen() can only load files that exist on the physical file system).
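To see both limitations concretely: Python can execute a zip archive that has a `__main__.py` at its root (the executable-zip format that shiv and pex produce), and pure-Python modules inside it are imported straight from the archive by `zipimport`. Extension modules are the exception, since `zipimport` does not load `.so`/`.pyd` files from an archive. The sketch below uses the stdlib `zipapp` module; the file names and `build_and_run_zipapp` are hypothetical, and it still needs an installed interpreter to run the archive, which is limitation 1).

```python
import pathlib
import subprocess
import sys
import tempfile
import zipapp


def build_and_run_zipapp() -> str:
    """Build a tiny executable zip archive and run it; return its output."""
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp) / "app"
        src.mkdir()
        # A pure-Python helper module: zipimport loads it straight from the archive.
        (src / "greet.py").write_text("def message():\n    return 'hello from inside the zip'\n")
        (src / "__main__.py").write_text(
            "import greet\nprint(greet.message())\nprint(greet.__file__)\n"
        )
        target = pathlib.Path(tmp) / "app.pyz"
        # The shebang only helps on POSIX, and still points at an interpreter installed separately.
        zipapp.create_archive(src, target, interpreter="/usr/bin/env python3")
        result = subprocess.run(
            [sys.executable, str(target)], capture_output=True, text=True, check=True
        )
        return result.stdout


print(build_and_run_zipapp())
```

The second printed line shows `greet.__file__` pointing inside `app.pyz`: no extraction was needed for pure Python. A compiled extension placed in the same archive could not be imported this way, which is why these tools unpack to disk first.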

I want something like what PyOxidizer does, where you don't have to install anything else out-of-band or otherwise, and you don't have to unpack anything the first time you run it. It's just an executable that you can ship around and users wouldn't even have to know it was written in Python!

brainwane commented 4 years ago

Thanks to @xmunoz for leading the writing of a grant proposal requesting that CZI support a core developer in residence for one year. The earliest we'll hear back on whether the PSF's proposal was accepted would be November and the earliest the project would start would be January 2021.

xmunoz commented 4 years ago

Sad news fam, our proposal was not accepted :(