open-source-ideas / ideas


dpkg trigger lib for python #215

Closed KOLANICH closed 4 years ago

KOLANICH commented 4 years ago

Project description

dpkg lets packages register triggers so that they get notified when it does something relevant to them. Usually this is used to invalidate and update some caches. Triggers are configured by writing special files to specific locations; dpkg then calls the binaries that were configured to be called.
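As a rough illustration of those "special files": a package declares its triggers in a `triggers` control file, one directive per line, as described in deb-triggers(5). The sketch below only renders the text of such a file. The helper name `render_triggers` and the watched path are made up for the example, and which directory a dispatcher would actually want to watch is an assumption.

```python
# Directives documented in deb-triggers(5).
VALID_DIRECTIVES = {
    "interest", "interest-await", "interest-noawait",
    "activate", "activate-await", "activate-noawait",
}


def render_triggers(entries):
    """Render (directive, trigger_name) pairs as the text of a `triggers` control file.

    `render_triggers` is a hypothetical helper, not part of dpkg or any library.
    A trigger name that is a path declares a file trigger; any other name is an
    explicit trigger.
    """
    lines = []
    for directive, name in entries:
        if directive not in VALID_DIRECTIVES:
            raise ValueError(f"unknown directive: {directive!r}")
        if not name or any(ch.isspace() for ch in name):
            raise ValueError(f"invalid trigger name: {name!r}")
        lines.append(f"{directive} {name}")
    return "\n".join(lines) + "\n"


# Illustrative path only: a file trigger on a directory where Python packages live.
example = render_triggers([("interest-noawait", "/usr/lib/python3/dist-packages")])
```

The `-noawait` variant is used here so that the triggering package is not held in a waiting state until the trigger has been processed; whether that is the right choice depends on the hook.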

Python has a built-in mechanism for plugins called entry points. One package can register them under a global id. Another one can discover all the entry points matching that id and call the registered functions.
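A minimal sketch of that discovery step, using `importlib.metadata` from the standard library. The group name `dpkg_trigger_hooks` and the convention that each hook takes a single event string are assumptions made for the example; the proposal does not define either.

```python
from importlib.metadata import EntryPoint, entry_points

# Hypothetical group id; the proposal does not name one.
GROUP = "dpkg_trigger_hooks"


def discover_hooks(group=GROUP):
    """Return the entry points that installed packages registered under `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10 and newer
        return list(eps.select(group=group))
    return list(eps.get(group, []))  # Python 3.8/3.9 return a plain dict


def run_hooks(hooks, event):
    """Load each hook, call it with the event name, and return the names that ran."""
    ran = []
    for ep in hooks:
        func = ep.load()  # imports the module and resolves the attribute
        func(event)
        ran.append(ep.name)
    return ran
```

A dispatcher invoked by dpkg could call `run_hooks(discover_hooks(), event)`. Note that this sketch does no filtering at all, so it would run code from every installed package that registers under the group.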

So we want to allow Python packages to subscribe to dpkg events. One way to do it is a dispatcher package:

Another, more performant way is to generate triggers for specific packages when wheels are installed, but pip has no triggers at all.

Security

Since dpkg and its triggers run as root, we need to prevent this from being used for

Threat model

Goals:

We achieve this by storing the database of packages that have triggers activated in a place that is unwritable and unreadable by anyone except root.
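One way this could look, as a sketch only: write the database with mode `0600` and refuse to trust it unless it is a regular file owned by the expected user with no group or other permissions. The function names are hypothetical, and the `uid` parameter exists only so the check can be exercised without being root.

```python
import os
import stat


def write_private(path, data):
    """Create or overwrite `path` so that only its owner can read or write it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as fh:
        # Enforce the mode even if the file already existed with looser bits.
        os.fchmod(fh.fileno(), 0o600)
        fh.write(data)


def is_private_to(path, uid=0):
    """Return True if `path` is a regular file owned by `uid` with no group/other access."""
    st = os.lstat(path)  # lstat: do not follow a symlink planted at `path`
    if not stat.S_ISREG(st.st_mode):
        return False
    no_group_other = (st.st_mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
    return st.st_uid == uid and no_group_other
```

The dispatcher would call `is_private_to(db_path)` before reading the list of allowed packages and skip all hooks if the check fails. The directory containing the file needs the same care, which this sketch does not cover.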

Relevant Technology

setuptools entry_points
https://plumbum.readthedocs.io/en/latest/cli.html
https://stackoverflow.com/questions/15276535/dpkg-how-to-use-trigger
https://wiki.debian.org/DpkgTriggers
https://sources.debian.org/src/dpkg/1.19.7/doc/triggers.txt/
https://manpages.debian.org/buster/dpkg-dev/deb-triggers.5.en.html

Complexity and required time

Complexity

Required time (ETA)

Categories

remram44 commented 4 years ago

I am a bit confused, do you want triggers to be executed for packages installed from wheels directly (not DEB packages)? Or Python libraries that have been turned into DEB packages and installed that way?

KOLANICH commented 4 years ago

I am a bit confused, do you want triggers to be executed for packages installed from wheels directly (not DEB packages)?

Yes.

Or Python libraries that have been turned into DEB packages and installed that way?

It is a bit problematic to package every Python package into a deb, plus packages for all the other package managers. It is far more convenient to use wheels. #50 is a much-needed thing, as you see.

Also: the hooks provided via entry points should not necessarily be specific to dpkg; they can be translated into hooks for other package managers. So I just install the module for dpkg (a Fedora user installs a module for dnf), install a package that wants a hook, allow this package, and it works.

remram44 commented 4 years ago

It is not clear to me whether this needs to interact with dpkg/dnf. Couldn't those be "pip hooks"? It is not clear to me why you need Python or pip to translate those into hooks for dpkg so that dpkg runs them, rather than having pip run them directly.

Of course, if you follow that thought, it already exists in distutils (basically, functions you stick in setup.py), though I don't know if that works with wheels.

KOLANICH commented 4 years ago

Couldn't those be "pip hooks"?

pip triggers are also needed for the same purpose, but for pip. https://github.com/pypa/packaging-problems/issues/308

It is not clear to me whether this needs to interact with dpkg/dnf. Couldn't those be "pip hooks"? It is not clear to me why you need Python or pip to translate those into hooks for dpkg so that dpkg runs them, rather than having pip run them directly.

Of course, if you follow that thought, it already exists in distutils (basically, functions you stick in setup.py), though I don't know if that works with wheels.

No, it is different. setup.py is called when we build packages. Prebuilt wheels are just archives that are simply unpacked. No untrusted code is executed when a wheel is installed.

The goal is the following. I have developed a Python lib that determines which package a file belongs to from the file's path: a Python replacement for dpkg -S. It requires no subprocess calls, so it is more secure. I need it for dependency discovery in my new metabuild system, because debhelper has a fatal flaw. No dpkg API bindings that retrieve this info are available for Python. Probably a better way would have been to write a parser for dpkg's internal representation, but it is poorly documented, and even if I had reverse-engineered the format from the source, the resulting parser would have to be under the GPL. Fortunately dpkg stores the needed info as plain text files too, so it is easy to recreate my own database with the same info (probably except versions). There is an issue: the db takes ~30 MiB, I have to keep it updated, and that takes time. Quite a lot of time, BTW: about a minute for a full import.
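To make the idea concrete, here is a minimal sketch of building such a file-to-package index. It assumes the plain text files meant here are dpkg's per-package file lists, `/var/lib/dpkg/info/<package>.list` or `<package>:<arch>.list`, with one installed path per line. This is not the code of the lib mentioned above, and `build_index` is a made-up name.

```python
from pathlib import Path


def build_index(info_dir="/var/lib/dpkg/info"):
    """Map each installed path to the (package, arch) pairs whose .list file names it.

    `arch` is None when the list file name carries no ":<arch>" suffix.
    Directories such as /usr appear in many lists, so values are lists.
    """
    index = {}
    for list_file in Path(info_dir).glob("*.list"):
        # "libfoo:amd64.list" -> package "libfoo", arch "amd64"
        package, _, arch = list_file.stem.partition(":")
        with list_file.open(encoding="utf-8", errors="surrogateescape") as fh:
            for line in fh:
                path = line.rstrip("\n")
                if path:
                    index.setdefault(path, []).append((package, arch or None))
    return index
```

After the index is built, a lookup such as `index.get("/usr/bin/python3")` is a single dictionary access in the same process. The full scan is the slow part, which is why something has to refresh the index when dpkg changes the installed set.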

remram44 commented 4 years ago

This seems like a very corner-case use case (you're making a package manager on top of pip), which might be better served by making your own change to pip. I am not sure if there are other use cases.

Isn't it easier to make a DEB package for your own library, and write the correct hooks yourself? Rather than implement this whole system with all kinds of security consideration, that might not be used by anybody else?

I agree that a hook system within pip to have packages notified when other packages are installed might be useful, though for specific cases (new package is aware of the other) it is easy to manually let it know (in the non-wheel case anyway), and again I don't think the use-case of writing your own package manager is something pip will be concerned about.

I have developed a python lib that allows to determine a package by a path of a file within it. python replacement for dpkg -S for dpkg. But requires no subprocess calls, so more secure. I need it for my new metabuild system for dependencies discovery because debhelper has a fatal flaw (???). no dpkg API bindings retrieving this info are available for python. Probably a better way would have been to write a parser for dpkg internal representation, but it is poorly documented and even I had reversed the format from the source, the resulting parser would have to be under GPL.

This paragraph is hard to parse and your link appears to be to an unrelated article ("A Brief History of Windows Programming Revolutions") but I am not sure why you wouldn't use dpkg-query? I am doing this successfully. It has been recommended to me by a Debian developer (https://github.com/VIDA-NYU/reprozip/issues/329). Also reading the internal database wouldn't place license requirements on you as far as I know (unless you re-use their code) and is not that slow even via Python, I have code for this here. I also have an apt-file like system for PyPI here.


In brief, I don't want to be annoying, but it seems to me that this is only useful for your own project and is unlikely to be of use to the wider community? And that project is a replacement for facilities that already exist as part of the system, which you don't want to use for unclear reasons?

KOLANICH commented 4 years ago

This seems like a very corner-case use case (you're making a package manager on top of pip)

Not quite.

Isn't it easier to make a DEB package for your own library, and write the correct hooks yourself?

It is, but it is an incorrect approach. It is obvious that the abstraction is needed. My programming experience is this: every time I was lazy and did not implement the needed abstraction, it caused only suffering, and in the end I had to implement it anyway. Better early than late. I need this abstraction in multiple places, not only for the file2package mapper, but also for a tool doing some dirty hacks that mostly work.

This paragraph is hard to parse and your link appears to be to an unrelated article

It is just the source of the idiom "fatal flaw". Just search for it in the text.

but I am not sure why you wouldn't use dpkg-query?

Because it would mean subprocess calls, which must be avoided for security and performance reasons. Security: subprocess calls that are passed user-controlled args are a direct road to disaster. Performance: each time you call a program, a new process must initialize and read the state, and only after that can it do something useful. In my approach the db is read from disk when my package is loaded, and then all the searches are done within the same process.

Also reading the internal database wouldn't place license requirements on you as far as I know (unless you re-use their code)

  1. I don't see any way to implement it quickly without reusing their code in some manner. Quickly doesn't imply black box/clean room RE.
  2. Anyway, I currently have no time for that. So I have implemented a solution that could be implemented fast: just parsing the bunch of text files. A better solution can be implemented later; the architecture partly allows for it and will only need a slight tweak.

If you are going to implement a dpkg db parser lib, there is a tool that can be very handy.

slow even via Python, I have code for this here.

I am not sure that this bunch of files is that DB. dpkg -S works quite fast, but importing the info from this bunch of files takes about a minute in a Python script. C should be faster, but not radically. So I guess that dpkg has its own storage optimized for search, and the bunch of files is there for other reasons.

Here is the lib I have written, but it is probably not a replacement for dpkg-query. It only resolves a file into a package name and arch; no other metadata is returned.

In brief I don't want to be annoying but it seems to me that this is only useful for your own project and is unlikely to be of use to the wider community?

IDK. On the one hand, if it hasn't been implemented yet, it is likely that no one needs it enough. On the other hand, all the widely used things started this way. Anyway, if dpkg triggers are implemented, there should be a clean way to use them from Python.

nhatkhai commented 3 years ago

Isn't it easier to make a DEB package for your own library, and write the correct hooks yourself? Rather than implement this whole system with all kinds of security consideration, that might not be used by anybody else?

I LOVE THIS.

nhatkhai commented 3 years ago

Another a more performant way is to generate triggers for specific packages when wheels are installed, but pip has no triggers at all.

This may be useful in my case for doing post-install steps like the egg package. But I hope it is going to be simple, and I'm not sure I can convince the whole company to move from pip to ~ dpkg pip.

And do I get a hook for pip uninstall too?

KOLANICH commented 3 years ago

@nhatkhai, it would only create more mess. It is easier only when you have to deal with 1 package that has to be triggered on 1 distro. If you have to deal with multiple packages intended to work on multiple distros, you will soon have too many packages to maintain. It then becomes easier to install 1 middleware package for pip and 1 package for the distro that adds hooks to its package manager, and then install all the packages that need to be triggered as wheels with pip, without building distro-specific packages for each of them.

nhatkhai commented 3 years ago

Ouch...