nvaccess / nvda

NVDA, the free and open source Screen Reader for Microsoft Windows
https://www.nvaccess.org/
Other
2.09k stars 632 forks source link

Document backwards compatibility guarantees for inclusion of modules from the standard library in binary builds. #15886

Open lukaszgo1 opened 10 months ago

lukaszgo1 commented 10 months ago

Created as a result of discussion in PR #15820. cc @CyrilleB79 @seanbudd

Is your feature request related to a problem? Please describe.

When creating binary version of NVDA, py2exe bundles only parts of Python's standard library which are used (either implicitly in NVDA, or explicitly by our dependencies / other parts of Python's standard modules). Add-on developers often rely on these modules being included, the most recent example being the fact that after migration to Python 3.11 ctypes.util is no longer a part of the binary distribution. It is worth pointing out that if only part of the module is not included, adding it from an add-on introduces several difficulties (ctypes.util will be used as an example below):

Describe the solution you'd like

  1. ctypes.util should be again bundled, just because there is no reason to make life of add-on developers harder
  2. More generally, our backwards compatibility guarantees should clearly state what is NV Access's policy about bundling of standard library modules in binary copies, so that add-on authors now what is expected breakage, and what should be reported

    Describe alternatives you've considered

    If point 1 from the above list is not implemented, the known removals of standard library modules should be documented in the change log.

    Additional context

    The particular removal which triggered this issue affects at least 3 add-ons, many more has not been yet tested with Alpha versions of NVDA, so impact may be much higher. As described bundling only part of the module in add-ons is difficult, and requires more than basic knowledge of Python's import system to be done properly.

seanbudd commented 10 months ago

I think it would be best to document that any dependencies should be bundled by add-on authors, as dependencies in NVDA core cannot be relied on.

lukaszgo1 commented 10 months ago

I'm sorry, but 'any dependencies' is not something which should appear in any serious documentation targeting developers. In general when writing Python applications it is clear that standard library is always included, and everything else has to be installed / bundled separately. NVDA breaks this convention, which is surprising for add-on authors. Are you planning to state that add-on developers should not rely on anything from the STDLIB being included, and they preferably should bundle everything they use? If so this will increase size of add-ons significantly, and will result in a severe duplication. May I also ask why you're ignoring the argument about adding missing part of standard libraries to NVDA being difficult? From the discussion of the add-ons list this is something which pretty often discourages developers from writing NVDA plugins. In my opinion it would be much better to be welcoming towards add-on developers, and if they request inclusion of some module for the standard library do so, unless the module is heavy, or otherwise difficult to bundle.

CyrilleB79 commented 10 months ago

@seanbudd, I can only strongly support everything @lukaszgo1 writes. Also cc @michaelDCurran for information and opinion.

First of all, I do not know if it is official or not. But all the components of Python's standard library included in NVDA have always been considered as being a de facto part of the API: add-on authors have always been using these functions without questioning and NV Access has never warned about any limitation in this usage.

The minimum of the minimum would be to announce the following (if it applies) as API breaking changes in the change log:

A more acceptable minimum would be to avoid removing such modules, except for a good reason: e.g. causing other bugs or incompatibility, technical difficulty to keep it in the bundle, etc. This recommendation more specifically applies for modules that are known to be used in existing add-ons, and even more specifically when add-on authors have done the effort to report this breakage (e.g. the case of ctypes.util).

At last, the risk identified by @lukaszgo1 are very real:

seanbudd commented 10 months ago

Ideally we would either include all of the standard python library, or document the diff between releases. Unfortunately there is no easy way to do either of these things that we know of.

We can consider including standard library submodules like ctypes.util on a case by case basis. Without being able to diff included submodules, we should be warning authors that these submodules cannot be relied upon and it requires testing and reporting if these submodules go missing.

lukaszgo1 commented 10 months ago

Ideally we would either include all of the standard python library, or document the diff between releases.

If we would be able to create the diff between releases then there would be no need to document anything - we just should see what modules were included ,and if they no longer are forcibly bundle them.

Unfortunately there is no easy way to do either of these things that we know of.

Creating a diff between releases should be doable, but scripting it is probably pretty time consuming. It sounds like a nice challenge though.

We can consider including standard library submodules like ctypes.util on a case by case basis.

It seems you have considered this particular case, and reached the conclusion that it should not be bundled.

Without being able to diff included submodules, we should be warning authors that these submodules cannot be relied upon and it requires testing and reporting if these submodules go missing.

We should also clarify what happens if someone reports that some module is missing. Using ctypes.util as an example, if one of my add-ons would be affected by its removal, and I wouldn't be an active contributor, that would be first and last time I reported anything, as that case was handled not very well from your side.

CyrilleB79 commented 10 months ago

@seanbudd you write:

We can consider including standard library submodules like ctypes.util on a case by case basis.

I am not personally impacted by the lack of ctypes.utils in NVDA 2024.1 for my add-ons.

But given the last arguments provided by @lukaszgo1 and the discussion in general, would you reconsider forcing the inclusion of ctypes.utils in NVDA 2024.1 given we know that it was used in some add-ons?

And more generally, in the future, forcing the inclusion of missing modules upon add-on author request when this module has been embedded in previous versions of NVDA?

This would allow to develop a diff script but still avoid authors to be impacted by the automatic selection of embedded modules by py2exe.

Whatever your choice, it would be nice to have a final decision on this topic, especially now that we have reached beta stage.

seanbudd commented 10 months ago

15820 was closed as it was fixed in the add-on by the only add-on author which reported the issue.

If including ctypes.util is still needed and requested by any other add-on authors we can easily re-include it.

lukaszgo1 commented 10 months ago

What I do not understand here is why standard library modules are treated differently to our internal API. When deprecating parts of it, we usually implement backward compatibility whenever feasible by default i.e. we do not start with the part of the API removed, and implement back compat code only when some add-on author contacts you and complains. It is clear that modules of the standard library cause their own set of challenges, since we cannot determine what has been removed automatically, however the fact that we cannot be perfect should not mean that we're not improving where we can. Since we know that at least part of the std lib was included and no longer is, it should be included on the same principles as backward compatibility with __getattr__ is implemented i.e. to minimize issues for add-on authors. There is also an issue of people who would override compatibility of add-ons when 2024.1 comes out. I understand that you cannot be responsible for any issues in that case, but since you have decided (which IMHO was a huge mistake) that users can override compatibility, you also should try not to break it where possible. There always be more add-on developers who will decide not to contact you, and make their add-ons work with some hack or other (the add-on developer who 'fixed' the problem in their code just bundled entire ctypes overriding the copy including in NVDA by default). Note that I have started this discussion to have a decision on a general policy, yet it seems it has been mostly focused on ctypes.util. which, while understandable as having an actual example is helpful, should not mean that if this particular part of library is included / excluded this issue can be closed.

CyrilleB79 commented 10 months ago

I am probably the one who made evolve this discussion more on the inclusion of ctypes.util, rather discussing the general topic; sorry for that.

@seanbudd, again I can only second and try to insist on @lukaszgo1 arguments:

I understand that we cannot document it completely because it's technically less easy than changing a submodule version. But we should at least guarantee a best effort. And today we are far from this.

I acknowledge that some add-on authors have found a workaround to compensate for the lack of ctypes.util. Still we know that this module is missing and it's not documented (on contrary to removed dependencies or NVDA core modules). Thus, if an add-on author has not read this issue, nor the thread on groups.io, he may face an issue that has already been solved.

It's important for add-on authors that the management of Python standard libraries be clarified. And IMO, this should be done now that add-on author update their add-ons for 2024.1, not after this release.

nvdaes commented 10 months ago

i agree with cyrille last comment.

XLTechie commented 10 months ago

I agree with the concerns and comments by @lukaszgo1 and @CyrilleB79. The current handling of this matter is quite un-ideal for add-on authors. It will probably ultimately cause problems on the user side, if we leave this for add-on authors to hackishly solve.

I've been trying to find a way with py2exe, to get it to dump a list of every (sub)module it includes in a build, but I generally use pyinstaller for my packages so I am far from skilled with py2exe internals.

XLTechie commented 10 months ago

Could diffing the library.zip possibly give us a start to determining what is included across versions? At least at a certain level of granularity? Doing so certainly reveals the ctypes.util removal between 2023.3 and some recent alpha.

Here's a Linuxish bash scriptwhich does a reasonable job.

Here's a sample of its output, for anyone who doesn't want to run it.

This is far from ideal by any means, and I did not check the modules included outside library.zip, but perhaps it sparks an idea for someone.

Adriani90 commented 10 months ago

also ideas from other long term addon authors are welcome. cc: @LeonarddeR, @ABuffEr, @josephsl, @davidacm, @cmillsap, @sulaiman-alqusaimi, @Kostya-Gladkiy, @mltony, @abdel792, @RPTools-org, @rob-aph, @albzan, @NSoiffer, @javidominguez, @jmdaweb, @mohammad-suliman, @ibrahim-s, @fawazar94, @grisov, @ajborka @larry801, @ruifontes, @Tseng, @aaclause, @mhameed, @derekriemer, @francipvb, @beqabeqa473

jmdaweb commented 10 months ago

Thanks for the mention, @Adriani90. For years, I've been considering an idea and I think this issue is the right place for sharing it. Currently, we have various add-ons which bundle the same libraries with no differences, only a few differences, and entirely or partially. This takes extra space on disk and is a source of potential bugs. Let's remember that sys.modules must be taken into account in addition to sys.path. When a library is imported, even if its path is removed afterwards, subsequent calls to import that library will make use of the first one. Given this scenario, I would go beyond the standard library. A robust dependency management system should be built for add-ons, similar to those provided by pip, npm, apt on Debian, etc. In a first stage, a new type of add-on should be available in addition to app modules, global plugins, braille display drivers, synth drivers and vision enhancement providers: library. This would be loaded just after core, but before any other component. This would make possible for authors to install a library and import it without changing sys.path, and removes or delays the need of documenting all changes in standard library. Of course, there are multiple security drawbacks, and a bad use of this system replacing standard modules would cause unwanted behaviours, but I see a lot of advantages. After that, add-on authors should have a mechanism (for example in add-on manifest) to specify library requirements for their add-ons. What do you think? CC: @hxebolax

francipvb commented 10 months ago

Hello all,I was thinking about this problem and I don't understand why the full standard library is not included by default, excluding some big and unused parts like TK. If NVDA is just an end-user, non-extensible tool I understand, but with extensibility in mind, this is not understandable.About external, I would prefer to leverage existing toolset for package management and python per-addon runtime, so the library author can import anything he want to import, and after the addon is unloaded the library loads goes out.I understand that just the first propose is practical as of now, but consider the next solutions at the long-term roadmap.El 16 dic 2023 07:27, José Manuel Delicado @.***> escribió: Thanks for the mention, @Adriani90. For years, I've been considering an idea and I think this issue is the right place for sharing it. Currently, we have various add-ons which bundle the same libraries with no differences, only a few differences, and entirely or partially. This takes extra space on disk and is a source of potential bugs. Let's remember that sys.modules must be taken into account in addition to sys.path. When a library is imported, even if its path is removed afterwards, subsequent calls to import that library will make use of the first one. Given this scenario, I would go beyond the standard library. A robust dependency management system should be built for add-ons, similar to those provided by pip, npm, apt on Debian, etc. In a first stage, a new type of add-on should be available in addition to app modules, global plugins, braille display drivers, synth drivers and vision enhancement providers: library. This would be loaded just after core, but before any other component. This would make possible for authors to install a library and import it without changing sys.path, and removes or delays the need of documenting all changes in standard library. Of course, there are multiple security drawbacks, and a bad use of this system replacing standard modules would cause unwanted behaviours, but I see a lot of advantages. After that, add-on authors should have a mechanism (for example in add-on manifest) to specify library requirements for their add-ons. What do you think? CC: @hxebolax

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: @.***>

josephsl commented 10 months ago

Hi all,

Short of renaming this topic to talk about Python standard library (as this is what we're focusing on at present), we need to remember that Python standard library is used by NVDA API (stable or not). We do talk about Python standard library because NVDA is mostly written in Python with C++ employed in some cases. As noted earlier, when using py2exe to bundle modules into a binary distribution, only Python standard library modules used by NVDA code base (directly or not) gets included, reducing bundle size and internet bandwidth (for some). In other words, what NV Access is focusing on (and what most add-ons focus on) is NVDA API most of the time - we teach NVDA code base when discussing add-ons, not Python standard library excpet when library modules are useful for add-ons or needed by authors.

We must also remember that NVDA includes several third-party modules in addition to parts of Python standard library. The best example is wxWidgets in the form of wxPython, which may conflict with use cases of tkinter/tk. The tkinter module is included in Python to provide a graphical user interface environment for Python programs, and tkinter is not the only GUI framework employed by Python programs - PyQT, wxPython, to name a few. Also, unless things have changed, tkinter-based programs are known to ship with inaccessible GUI, which defeats the purpose of screen reader add-ons with GUI included.

To address a few points:

  1. Provide notes on Python standard library changes in year.1 releases: yes and no. Yes, as it tells add-on authors and observers about which functionality can be used via standard modules and third-party dependencies included in NVDA (the key phrase is "included in NVDA"). No because NVDA may employ previously excluded Python standard library features throughout the year, and documenting this can overwhelm the community.
  2. ctypes.util: as folks pointed out, several add-ons rely on ctypes.util module either directly or because an add-on dependency may require it. People already talked about advantages and drawbacks of having add-on authors bundle the entire ctypes module just to use ctypes.util. My worry is dependency hell coming from pyd files due to possible incompatible CPython API and behavior between minor Python 3.x releases; at the moment NVDA is running on Python 3.11.6, but I imagine 3.11.7 will be out by the time NVDA 2024.1 stable version is released. Similar to Python 3 transition in 2019, we must be willing to take new Python minor releases between NVDA releases into account, and that can have an impact on add-on authors bundling possibly incompatible pyd files.
  3. Bundling the entire Python standard library (and possibly dependencies used by many add-ons into NVDA Core): that depends on what NV Access prioritizes. At the moment NV Access's priority is NVDA Core and Python standard library modules employed to get NVDA to do its duty. Obviously, this means not all Python standard library will be included when py2exe bundles NVDA modules into a binary distribution. The API NVDA exposes to add-ons is what they will use (at least initially) to get things going, and by doing so, add-ons can work with parts of Python standard library included with NVDA (for example, when add-on authors import the gui module in NVDA, they are committing to possibly using wxPython features, the GUI framework used by NVDA). Bundling the entire Python standard library (including tkinter) sends a message that NVDA is a general-purpose Python environment for use by add-on authors. However, that's NOT the message coming from NV Access as evidenced by current NVDA binary distribution practices and the material it generates: NVDA is a domain specialist, and anyone wishing to develop or use NVDA for tasks in addition to or as an alternative to its primary mission of being an accessibility information processor are advised to create, distribute, and use add-ons. This returns us to the question of compatibility and assumptions by add-ons community - that NVDA will include Python standard library employed by add-ons in perpetuity when it may not be so as evidenced by ctypes.util debate.
  4. A library add-on category that hosts dependencies: the success of this library add-on category will depend on: support for older and newer Python releases, not introducing duplicates, informing the community about add-on releases with detailed notes on modules included and possible side effects (hopefully with links to actual dependency documentation), and willingness to keep up with source dependency updates. Even with the introduction of a library add-on, add-on authors may choose to bundle dependencies themselves.

In summary, I see our current discussion as highlighting an ongoing tension between NV Access, add-ons community, and NVDA users over the scope of NVDA itself, its API, and how Python modules figure in our discussion. My overall position is that NV Access, add-ons community, and users must remember that NVDA is, first and foremost, an accessibility information processor, not a general-purpose productivity tool. Through add-ons, we have reconfigured our relationship with a screen reader to believe that NVDA can solve all sorts of problems with help from humans (add-on developers) and machines (code run by add-ons). One may say that add-ons do help NVDA become a more sophisticated information processor with the ability to handle and present timely, relevant, and context-aware information, and that sentiment was materialized through NV Access add-on store. If you think about it carefully, add-ons are also domain specialists - it's just that our interaction with various add-ons led us to think that NVDA is a general-purpose tool when in fact we've been talking to domain experts all this time. The question of bundling add-on dependencies and documenting changes about them (including parts of Python standard library) can be summarized thus: how do we get a number of domain specialists to talk to each other, specifically what are commonalities these domain experts share i.e. needed dependencies.

Thanks.

zstanecic commented 10 months ago

Hi Cyrille,

As you pointed out about 3.11.7, we already have it out.

Recently I saw it published on the python website.

Thanks,

Zvonimir

-- Ta wiadomość e-mail została sprawdzona pod kątem wirusów przez oprogramowanie antywirusowe Avast. www.avast.com

CyrilleB79 commented 9 months ago

Thanks all for commenting.

If we want to keep this discussion focused and productive, I'd remind that this issue is about describing and documenting what can or cannot be done and expected with each version of NVDA today.

For future changes on what should be included in the future, we may open a new issue, or comment in an existing one on this topic if it has already been opened.

mltony commented 9 months ago

My two cents: as an add-on writer I would prefer that the entire standard Python library be included in NVDA distribution. I often find myself reimplementing things that are available but excluded. Examples: thread pool, Future. Both of these belong to package concurrent. The only drawback would be increasing binary size of NVDA distribution by perhaps a few dozen MB, maybe at most 100MB. I would argue that's negligible amount and even in the poorest countries in the world this won't be a significant barrier to people downloading and using NVDA. But this way you'll help many add-on writers a great deal. I haven't found an easy way to convince py2exe to include all available packages. But maybe here (or in another issue) we can ask add-on devs to provide the list of packages they're missing? I personally vote for concurrent package.

lukaszgo1 commented 9 months ago

The only drawback would be increasing binary size of NVDA distribution by perhaps a few dozen MB, maybe at most 100MB. I would argue that's negligible amount and even in the poorest countries in the world this won't be a significant barrier to people downloading and using NVDA.

Just for reference, last time I had to download NVDA in a quite rural area of Poland (which is probably not the worst example you can find) the current 33 MB file took 20 minutes to download, when increased by 100 MB I would have to spent an hour and a half downloading. I imagine in some parts of the world an uninterrupted internet connection may not be even available for that long. In short I'd be very careful with increasing size of the launcher by a significant amount.

I personally vote for concurrent package.

Already included since NVDA 2023.1 :-)

Adriani90 commented 9 months ago

see also #3328 which shows that even for updates which are smaller people were expecting faster downnloading. However, at least for that part we could provide a diff binary as proposed in that issue. bsdiff or xdelta are probably the best tools for reducing patching / update size. Is it actually possible to provide the binary as it is now but apply a patch with the libraries during the installation? Then there qould basically be 2 binaries, one without libraries and the other one as a diff binary patch with reduced size.