mesonbuild / meson

The Meson Build System
http://mesonbuild.com
Apache License 2.0

[RFC] Simplified wrapdb #8754

Closed xclaesse closed 3 years ago

xclaesse commented 3 years ago

I have been thinking and experimenting a bit with ways to improve our wrapdb workflow.

Current issues:

My proposal: https://github.com/xclaesse/wrapdb

Example workflow:

TODO:

Needed redirects:

xclaesse commented 3 years ago

Also, some wraps are currently broken: https://github.com/mesonbuild/meson/issues/8737

jpakkane commented 3 years ago

Change meson wrap command to use new system.

What would this actually entail? Would it load its data from the wrapdb or directly from the GitHub URL? The latter is not desirable, as it is a big vendor lock-in thing. We'll also want to have a wrapdb website, as we do currently, where people can browse available packages with a nicer UI than spelunking inside GitHub repo pages.

xclaesse commented 3 years ago

Change meson wrap command to use new system.

What would this actually entail? Would it load its data from the wrapdb or directly from the GitHub URL? The latter is not desirable, as it is a big vendor lock-in thing.

To list or search wraps, the idea is to download the file https://raw.githubusercontent.com/xclaesse/wrapdb/master/releases.txt. We should of course set up a redirect so that the file is accessed through a https://wrapdb.mesonbuild.com URL (I added a section about the needed redirects).

We'll also want to have a wrapdb website, as we do currently, where people can browse available packages with a nicer UI than spelunking inside GitHub repo pages.

I did not know we had a website for that, where is it?

jpakkane commented 3 years ago

Some random things that came to mind:

I did not know we had a website for that, where is it?

Well, ... errr .... https://wrapdb.mesonbuild.com/

Granted, it could be snazzier, but even that beats GitHub repo browsing.

xclaesse commented 3 years ago

* that `releases.txt` file should probably have the provides info also somehow so you can do "which package provides libjpeg" type queries and find both libjpeg and libjpeg-turbo

* maybe JSON so we can add other stuff to it later if a use case presents itself

I like the idea of having the [provide] section info available in that DB; it is something I've been thinking about too. dependency('glib-2.0') could automatically tell you to run meson wrap install glib, or there could even be a wrap_mode where it does it for you.

But to do that it means releases.txt would have to be generated by CI, but then...

What about:

Something like that?

{
  'glib': {
    versions: ['2.68.0-2', '2.68.0-1', ...],
    provide: ['glib-2.0', ...],
  },
  'jpeg-turbo': {
  }
}

* CI should run the validation checks in `mesonwrap` as a prerequisite to merging

Yes, that's definitely something we should do. I wanted to first agree on the solution before going further; the current system does not validate either, so that's not a regression anyway.

* Other endpoints like `meson wrap status` and `update` need to be supported

Yes, that's pretty easy to do: it just needs to take the info from releases.txt instead of using the wrapdb server API.
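As a rough illustration of how a status check could work from a downloaded release list instead of a server API, here is a minimal Python sketch. It is not the actual `meson wrap status` implementation: the `wrap_status` helper, the sample data, and the assumption that each `versions` list is ordered newest first are all hypothetical.

```python
def wrap_status(installed, releases):
    """Map each installed wrap to (installed_version, latest_version or None).

    Assumes each 'versions' list is ordered newest first; that ordering is an
    assumption made for this sketch, not a documented rule.
    """
    status = {}
    for name, current in installed.items():
        versions = releases.get(name, {}).get("versions", [])
        latest = versions[0] if versions else None
        status[name] = (current, latest)
    return status


# Hypothetical data for illustration only.
releases = {"glib": {"versions": ["2.68.0-2", "2.68.0-1"], "provide": ["glib-2.0"]}}
installed = {"glib": "2.68.0-1", "local-only": "1.0-1"}

for name, (current, latest) in wrap_status(installed, releases).items():
    if latest is None:
        print(f"{name}: {current} (not in the database)")
    elif latest == current:
        print(f"{name}: {current} (up to date)")
    else:
        print(f"{name}: {current} -> {latest} available")
```

Everything here is a plain dictionary lookup, which is why no server-side query is needed once the release list is available locally.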

Well, ... errr .... https://wrapdb.mesonbuild.com/

Granted, it could be snazzier, but even that beats GitHub repo browsing.

Oh nice. Yeah, that's just a bit of HTML to expose the info from releases.txt if we extend it with JSON as suggested above. I wonder if that could be done with hotdoc somehow, to integrate better with mesonbuild.com.
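To make the "which package provides libjpeg" query concrete, here is a minimal Python sketch over data shaped like the JSON proposed in this comment. The `find_providers` helper and the sample entries (including their version strings) are hypothetical and only illustrate the lookup.

```python
# Hypothetical sample shaped like the JSON sketched in this thread;
# the package names and versions are illustrative only.
RELEASES = {
    "libjpeg": {"versions": ["9c-1"], "provide": ["libjpeg"]},
    "libjpeg-turbo": {"versions": ["2.0.4-1"], "provide": ["libjpeg", "libturbojpeg"]},
    "glib": {"versions": ["2.68.0-2", "2.68.0-1"], "provide": ["glib-2.0"]},
}


def find_providers(releases, dependency):
    """Return the sorted names of wraps whose 'provide' list contains dependency."""
    return sorted(
        name
        for name, info in releases.items()
        if dependency in info.get("provide", [])
    )


print(find_providers(RELEASES, "libjpeg"))
```

A linear scan like this should be cheap at the database sizes discussed in this thread. An inverted index from dependency name to wrap names would be the obvious next step if it ever became slow.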

tp-m commented 3 years ago

Something like that?

{
  'glib': {
    versions: ['2.68.0-2', '2.68.0-1', ...],
    provide: ['glib-2.0', ...],
  },
  'jpeg-turbo': {
  }
}

Detail probably, but please make sure that the format can theoretically handle these things changing over time (e.g. different versions might contain different provides).

xclaesse commented 3 years ago

Detail probably, but please make sure that the format can theoretically handle these things changing over time (e.g. different versions might contain different provides).

I initially thought about doing a map {version: provide} but I think it would be too verbose. IMHO it should be enough to only specify what the latest version provides.

dcbaker commented 3 years ago

I said it on IRC as well, but please use JSON (or TOML or YAML or …) not an ad hoc format called .txt (which usually means "A blob of text structured using natural language rules").

xclaesse commented 3 years ago

@dcbaker agreed, I'm changing that.

xclaesse commented 3 years ago

Made the change to a full json DB: https://github.com/xclaesse/wrapdb/blob/master/releases.json.

tristan957 commented 3 years ago

Should a version string be separated out into version and release #?
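For reference, splitting a combined string such as '2.68.0-2' is cheap to do on the client side. The Python sketch below assumes the wrap revision is always the numeric part after the last hyphen; that is an assumption based on the examples in this thread, and `split_wrap_version` is a hypothetical helper.

```python
def split_wrap_version(version):
    """Split '<upstream>-<revision>' into (upstream, revision as int).

    Splits on the LAST hyphen, so upstream versions that themselves contain
    hyphens are kept intact.
    """
    upstream, sep, revision = version.rpartition("-")
    if not sep or not upstream or not revision.isdecimal():
        raise ValueError(f"expected '<upstream>-<revision>', got {version!r}")
    return upstream, int(revision)


print(split_wrap_version("2.68.0-2"))
```

Keeping a single string in the database and splitting on demand avoids a schema change, at the cost of every consumer having to agree on the same splitting rule.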

jpakkane commented 3 years ago

I created a new repo: https://github.com/mesonbuild/wrapdb_v2test

This should be used to set up all CI confs et al for development. Once everything there is working we can do the switch. You should have all the necessary permissions to set it up. If not, let me know and I'll add them.

xclaesse commented 3 years ago

I created a new repo: https://github.com/mesonbuild/wrapdb_v2test

Thanks, I ran the import script into that repo.

Notes:

stephanlachnit commented 3 years ago

Something like that?

{
  'glib': {
    versions: ['2.68.0-2', '2.68.0-1', ...],
    provide: ['glib-2.0', ...],
  },
  'jpeg-turbo': {
  }
}

Detail probably, but please make sure that the format can theoretically handle these things changing over time (e.g. different versions might contain different provides).

What about:

{
  'example-project': {
    provide: ['example-1.0', 'example-2.0', 'example-feature-2.0'],
    versions: {
      '2.1-1': {
        provide: ['example-2.0', 'example-feature-2.0'],
      },
      '2.0-2': {
        provide: ['example-2.0'],
      },
      '2.0-1': {
        provide: ['example-2.0'],
      },
      '1.0-1': {
        provide: ['example-1.0'],
      },
    },
  },
  'another-project': {
    provide: [...],
    versions: {...},
  },
}

The top-level provide lists every name that appears in any version, which makes scanning for provides a bit easier (technically it could be removed), while each version still has its own provide section so that this can change over time. Whether it makes sense to do this is an entirely different question, of course.
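Since the top-level list is redundant, it could be derived from the per-version entries rather than maintained by hand. A minimal Python sketch, assuming the structure shown above (`all_provides` is a hypothetical helper, not part of any existing tool):

```python
def all_provides(versions):
    """Collect every name provided by any version, in first-seen order."""
    seen = {}  # dict keys keep insertion order and drop duplicates
    for info in versions.values():
        for name in info.get("provide", []):
            seen.setdefault(name, None)
    return list(seen)


project = {
    "versions": {
        "2.1-1": {"provide": ["example-2.0", "example-feature-2.0"]},
        "2.0-2": {"provide": ["example-2.0"]},
        "2.0-1": {"provide": ["example-2.0"]},
        "1.0-1": {"provide": ["example-1.0"]},
    }
}

# Derive the aggregate list so it cannot drift from the per-version data.
project["provide"] = all_provides(project["versions"])
print(project["provide"])
```

Generating the aggregate in CI would keep the quick scan path without making the per-version entries and the summary two independent sources of truth.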

tristan957 commented 3 years ago

or versions: [{version: x, provides: y}]

jpakkane commented 3 years ago

CI on pull requests does some sanity checks to ensure the releases.json and wrap files seem correct.

Are these the same as what mesonwrap review does? If not, they should be added; all of those checks are there because they were necessary.

Other than that, what does this still need for deployment? Does it need any changes in Meson itself?

Also, as an additional point, the readme says "DataBase", when the correct spelling would be "Database".

mayl commented 3 years ago

I'm quite excited to see this development. Meson's dependency management foundation is already quite strong, and I think improving the usability of the wrap-db part of that is going to be a great development.

Hopefully this is not too derailing of a question, but do you think one of the outcomes of this will be that it's easier to self host a private or internal wrap-db? I have use cases which we currently meet with [wrap-git] and [wrap-file] type dependencies but it would be nicer to be able to have users run meson wrap add against a self-hosted wrap-db vs manually searching for and copying .wrap files. I know I could self host the current wrap-db, but was always a little turned off by setting that up. If that simplifies down to some static hosting and a CI job I personally would find that very compelling!

Just 2c from a loyal Meson user, thanks for working on this I look forward to seeing where it ends up!

jpakkane commented 3 years ago

Having people run their own wrapdbs inside corporate firewalls is something we want to support. The main issues seem to be how to handle the collection of packages and the query API. The former is more work than the actual service API, and the latter cannot be done purely by static hosting (because otherwise you'd need to download the whole package list on a query).

mayl commented 3 years ago

I'm encouraged that running a private wrap-db is a desired use case!

Just a couple related thoughts before I get out of the way:

I'm not fully versed in the internals, but I believe APT works by downloading the package lists hosted here. For something on the scale of wrap-db, downloading a package list might not be so bad, especially if meson caches it.

There's also this pretty cool technique for using HTTP range requests to query statically hosted sqlite databases with a fraction of the traffic you might otherwise expect. It'd be pretty cool to use that approach to allow static hosting of wrap-dbs, but I also understand that re-implementing all that in meson to work with Python's built-in sqlite is probably a non-starter.

xclaesse commented 3 years ago

I never thought private wrapdb would be desired, weird use-case... but why not...

FWIW, this proposal is based on a static JSON file that you download and that contains all release info. Yes, if we ever get millions of wraps that won't scale, but it's much easier and even faster at our current scale. Currently the JSON takes 20k, which is really not a problem. If it gets too big, we could also gzip it and cache it on disk (like APT). But most importantly, if we ever reach that level of popularity, I would expect to also have gained more contributors who could design and maintain a more advanced system like crates.io.

With server-side queries like our current wrapdb, I am more concerned about server scalability: currently all requests go through a single Python app on a single machine... With a static JSON DB we have all of GitHub's datacenter power to deliver that file to millions of users. Our own server only needs to 301/302 redirect all queries on wrapdb.mesonbuild.com.

It's out of scope for this issue, but part of the larger plan is to have dependency('something') tell you it's in wrapdb if not found on the system. Caching the DB locally would be needed IMHO if we want to be able to query for each not-found dependency lookup.
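A minimal Python sketch of what such a local cache could look like. Nothing here is Meson's actual code: `load_releases`, the `fetch` callable, and the `max_age` default are hypothetical, and the download itself is replaced by a stand-in so the example runs offline.

```python
import json
import os
import tempfile
import time


def load_releases(cache_path, fetch, max_age=3600.0):
    """Return the parsed releases DB, refetching only when the cache is stale.

    fetch is any zero-argument callable returning the JSON text; in real use
    it would perform the HTTP download. max_age is in seconds.
    """
    try:
        fresh = time.time() - os.path.getmtime(cache_path) < max_age
    except OSError:
        fresh = False  # no cache file yet
    if not fresh:
        text = fetch()
        with open(cache_path, "w", encoding="utf-8") as f:
            f.write(text)
    with open(cache_path, encoding="utf-8") as f:
        return json.load(f)


calls = []


def fake_fetch():
    """Stand-in for the real download; records how often it is invoked."""
    calls.append(1)
    return json.dumps({"glib": {"versions": ["2.68.0-2"], "provide": ["glib-2.0"]}})


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "releases.json")
    first = load_releases(path, fake_fetch)
    second = load_releases(path, fake_fetch)  # served from the cache

print(len(calls), second["glib"]["provide"])
```

With a cache like this, each not-found dependency lookup in a single configure run would hit the disk at most, not the network.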

mayl commented 3 years ago

Maybe a bit weird yes, but also I don't think I'm the only one :)

I'm glad to hear that the current proposal would work with static hosting. I think that sounds simplest and agree with your take on the "scale-ability" issue. JSON also compresses very well - looks like the current database compresses down to 2.5k.

❯ du --apparent-size -h releases.*
19K releases.json
2.5K releases.json.gz
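The same comparison can be reproduced from Python with the standard library. The sketch below uses a synthetic, made-up database of 200 entries, not the real releases.json, so the printed sizes will not match the numbers above.

```python
import gzip
import json

# Synthetic, hypothetical database: 200 made-up wraps with repetitive structure.
releases = {
    f"project-{i}": {
        "versions": [f"1.{i}.0-2", f"1.{i}.0-1"],
        "provide": [f"project-{i}-1.0"],
    }
    for i in range(200)
}

raw = json.dumps(releases, indent=2).encode("utf-8")
packed = gzip.compress(raw)

print(f"raw: {len(raw)} bytes, gzip: {len(packed)} bytes")

# Round trip: decompressing must give back exactly the same database.
restored = json.loads(gzip.decompress(packed).decode("utf-8"))
```

The repeated keys and similar version strings are what make this kind of file compress so well.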
xclaesse commented 3 years ago

Are these the same as what mesonwrap review does? If not, they should be added; all of those checks are there because they were necessary.

Didn't know about that command, I'll give it a look.

Other than that, what does this still need for deployment? Does it need any changes in Meson itself?

Yes, https://github.com/mesonbuild/meson/pull/8796.

Also, as an additional point, the readme says "DataBase", when the correct spelling would be "Database".

Fixed.

xclaesse commented 3 years ago

Are these the same as what mesonwrap review does? If not, they should be added; all of those checks are there because they were necessary.

Didn't know about that command, I'll give it a look.

Done: https://github.com/mesonbuild/wrapdb_v2test/commit/56d54d6e4432bfaeef2c88becf37a553393e8cee