hpyproject / hpy

HPy: a better API for Python
https://hpyproject.org
MIT License
Improve the communication on the positive impact of HPy on the Python ecosystem #419

Open paugier opened 1 year ago

paugier commented 1 year ago

I worked on a presentation about HPy for the French "Calcul group".

The pdf of the presentation is here.

I wrote some notes about HPy, mostly for myself, to prepare the presentation. Once HPy 0.9 is released, and if I feel it would be useful, I could share another version of this text (a bit more positive), for example on the NumPy mailing list.

This version contains some parts that could potentially be taken into account to improve the project, so I'm going to open a few issues here to share my (user) point of view.

I'll start with the section on communication, which I simply copy here. Reading the whole text may help to understand my point.

Communication about HPy

Currently, the external communication on HPy is mostly done through the project README, the project website/blog and the project documentation.

Most of these documents target people working with the CPython C API, and this technical documentation is IMO remarkably clear and educational. Unfortunately, HPy communication is less effective at motivating project maintainers and end users to support the project. There is a very interesting page, HPy overview, which nicely presents the motivation and goals. However, some high-level points of view are IMHO missing:

There are also very few mentions of HPy at Python conferences and on social media. Victor Stinner mentions the project in some of his talks, for example Python Performance: Past, Present and Future at EuroPython 2019 and Introducing incompatible changes in Python at PyCon US 2023. However, there won't be any talks centered on HPy at PyCon 2023 (except at the Language Summit).

It is interesting to compare the respective communications of HPy and of the Faster CPython project. The Faster CPython project presents very high ambitions (a 5x speedup for CPython within a few years) and explains its detailed plan through many channels, for example a talk by Mark Shannon at PyCon 2023. In contrast, HPy is very shy and conservative. The documentation repeats that "HPy is still in the early stages of development" and that "there is still a long road before HPy is usable for the general public". This is partially true, but it is not very positive or motivating. The concrete consequences for end users are somewhat hidden behind quite technical data.

However, with all the respect that I have for the people working on the Faster CPython project, a successful HPy project (i.e. most popular packages with extensions using HPy) would lead to a deeper and better improvement of the Python ecosystem. In particular, it would bring large speedups for many users, who would be able to use Python implementations that really are "5 times faster" than Python 3.10 for many real-world applications. Having specialized interpreters for different tasks would be much better than one interpreter that still has to support its problematic legacy C API for years. Without improvements to the CPython C API, and with the constraint of not degrading the performance of extensions using the legacy C API, it is becoming clear that the "x5 faster" target of the Faster CPython project is very ambitious. Thus, a successful HPy would actually help the Faster CPython project a lot.

hodgestar commented 1 year ago

@paugier Thank you for giving the talk and for writing all of this up. I agree that we could be a lot more upbeat. Much of the documentation was written a couple of years ago when HPy was just starting out. It definitely needs an update to match the current much more usable and mature status of the project.

Would incorporating the manifesto and "What needs to change and why" help explain the purpose to HPy users / the broader community, or is it still too technical?

paugier commented 1 year ago

These texts (the manifesto and "What needs to change and why") are nice, and we can see whether the current documentation can be improved with them.

But I also think it would be good if, when an ordinary Python user starts reading the README, the docs and the website, they think after the first 5 lines: "It seems technical but I understand what it is about. This project is useful for me and my colleagues. I hope it is going to be successful and I understand how we can help."

So it seems to me that a paragraph or a note that is deliberately very simple and catchy would be useful. Something like:

mattip commented 1 year ago

One way to think about documentation is via the Diátaxis framework, which identifies four modes of documentation: tutorials, how-to guides, technical reference and explanation.

NumPy and others have adopted this categorization and structured their top-level documentation categories loosely around it.

Help is welcome to think about better structuring the documentation, using Diátaxis or any other framework.

paugier commented 1 year ago

I gave my seminar on HPy yesterday. I am quite happy with how it went and I got very positive feedback. It means that it is indeed possible to communicate about HPy to a general audience of Python developers who know nothing about the CPython C API or about alternative Python implementations.

A small detail: for this presentation, I ended up avoiding C examples completely. If I had to do it again, I would add a few very simple C examples, because during the questions I had to explain things that would have been easier to explain with such examples (in particular pointers to PyObjects and explicit reference counting versus handles).
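To make the "reference counting versus handles" contrast concrete for an audience that does not read C, here is a minimal toy model in Python. It is only a sketch: `ToyObject`, `incref`, `decref` and `ToyHandleTable` are hypothetical names invented for illustration. They mimic the spirit of `Py_INCREF`/`Py_DECREF` and of `HPy_Dup`/`HPy_Close`, not how CPython or HPy actually implement them.

```python
class ToyObject:
    """Stand-in for a C-level object with an explicit reference count."""

    def __init__(self, value):
        self.value = value
        self.refcount = 1  # the creator holds the first reference


def incref(obj):
    # In the spirit of Py_INCREF: another owner now shares the same pointer.
    obj.refcount += 1


def decref(obj):
    # In the spirit of Py_DECREF: returns True once the last reference is gone.
    obj.refcount -= 1
    return obj.refcount == 0


class ToyHandleTable:
    """Hypothetical handle table: handles are opaque integers, not pointers."""

    def __init__(self):
        self._objects = {}
        self._next_handle = 1

    def new(self, obj):
        # Hand out a fresh handle for the given object.
        handle = self._next_handle
        self._next_handle += 1
        self._objects[handle] = obj
        return handle

    def dup(self, handle):
        # In the spirit of HPy_Dup: a new, independent handle to the same object.
        return self.new(self._objects[handle])

    def deref(self, handle):
        # Look up the object behind a handle that is still open.
        return self._objects[handle]

    def close(self, handle):
        # In the spirit of HPy_Close: each handle is closed exactly once.
        del self._objects[handle]

    def open_handles(self):
        return len(self._objects)


# Reference counting: every owner shares one pointer and one counter.
shared = ToyObject("payload")
incref(shared)       # a second owner appears
decref(shared)       # the second owner is done
decref(shared)       # the creator is done, the count reaches zero

# Handles: every owner gets its own handle and closes it independently.
table = ToyHandleTable()
h1 = table.new("payload")
h2 = table.dup(h1)   # a distinct handle to the same object
table.close(h1)      # h2 stays valid
table.close(h2)
```

In this toy model the extension code never sees a counter or a raw pointer on the handle side, which hints at why an interpreter behind a handle-based API is free to manage objects however it likes.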

Regarding improving the website and documentation for ordinary Python users (who, again, know nothing about the CPython C API and alternative Python implementations), I think it could be useful to have a specific page for them on the website or in the documentation, with links to it at the beginning of the README, the website and the documentation.

Currently the first texts available on these pages are way too technical for ordinary Python users, and even for many maintainers of Python packages. In the Summary section of the README, we can feel a real effort to explain things, but even this part is too technical ("GC instead of refcounting", "GIL", "binary stability", "API/ABI", consequences expressed in terms of C developers/extensions and not in terms of end users).

To better explain what I mean, I'm going to try to write something explaining HPy to my colleagues/friends who use Python. I'm not good at English and I don't have time to really work on this text, but I guess the ideas will be understandable for you HPy developers.

Introduction for Python users who don't know what the CPython C API is

The Python ecosystem is based on the CPython C API (i.e. on C functions to interact with the Python interpreter). Some of your favorite packages (in particular NumPy, Matplotlib, SciPy, Pandas, ...) contain extensions produced from C code using this API. Unfortunately, the CPython C API has deep technical issues, which block improvements of the ecosystem. On the one hand, it makes it very difficult to improve the performance of CPython (the reference Python implementation). On the other hand, supporting the CPython C API is a nightmare for alternative Python implementations (like PyPy, GraalPython, RustPython, MicroPython, etc.) and completely blocks their usage, because it leads to very bad performance for code using the CPython C API.

Blabla/data on how much better/faster the alternative Python implementations are and what it would change for the user to be able to use them.

HPy is a better API for extending Python in C. Once the most popular packages using the CPython C API have been ported to this new API, their wheels (what you usually install with pip install) will be compatible and efficient with CPython and with all alternative implementations supporting HPy (currently PyPy and GraalPy). It will become very easy for Python users to choose the most efficient Python implementation, and implementations will be able to specialize themselves for particular use cases. By relaxing the constraint of maintaining good performance for the legacy CPython API, it will become much easier to change some CPython internals to improve its overall performance.

We are now convinced that there is a clean technical solution to reach this better state; however, there is still a long road ahead.

Blabla on what remains to be done, in particular an evaluation of the amount of work needed to port the most popular packages. Also, to motivate people, it would be good to give a rough estimate of when Python users could start to feel the changes related to HPy. (For example, the Faster CPython project mentions x5 faster in 2025!)

To foster this project and the associated big transition in the ecosystem, you can ...

End of the introduction

What I am trying to say is that it is possible to explain HPy from the point of view of Python users and with very few technical terms.

There is also the idea that the project behind HPy is to port the most popular packages using the legacy API to HPy. I feel that this is not really explicit in the README/website/doc.

Note that such a text could be filled with hyperlinks to more in-depth content with deeper explanations.

mattip commented 1 year ago

Could you create a blog post at https://github.com/hpyproject/hpyproject.org with your presentations or with a link to your presentation?

That repo holds the code used to create hpyproject.org using Nikola. I see it is missing a README, which might be nice to add...

I think a non-technical 10,000 meter view would be nice. PRs welcome.