lark-parser / lark

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
MIT License
4.75k stars 401 forks source link

Include .lark files in package data #1308

Closed ptrcnull closed 1 year ago

ptrcnull commented 1 year ago

Fixes #1307

MegaIng commented 1 year ago

In comparison to the old setup.py file, it looks like *.md is a pattern that was included back then but isn't any more. Do we still need that?

https://github.com/lark-parser/lark/blob/7d9cfa6ed09a2b125e97e3a1810d9c6443f1718e/setup.py#L20

erezsh commented 1 year ago

@MegaIng I don't know. It looks okay in my builds, but apparently I can't rely on that.

MegaIng commented 1 year ago

If we want them to be included, they need to be added, even if just for documentation.

erezsh commented 1 year ago

Readme.md is already mentioned by name, and docs/* should take care of the rest.

Also, right now we don't include examples/README.rst, or any of the examples subfolders.

MegaIng commented 1 year ago

Readme.md is already mentioned by name, and docs/* should take care of the rest.

Are we looking at the same file? I don't see either of those included

erezsh commented 1 year ago

They are in MANIFEST.in

P.S. maybe we can just require setuptools-scm, and then we won't have to maintain MANIFEST.in as often.

MegaIng commented 1 year ago

Aha right.

I don't have experience with setuptools-scm, so no real opinion on that.

But if we are using MANIFEST.in, shouldn't we add the grammars there instead of as package data?

erezsh commented 1 year ago

@KOLANICH @henryiii Do you guys have an opinion about this? (manifest vs package-data vs scm)

KOLANICH commented 1 year ago
  1. In order to include the lark files tool.setuptools.include-package-data=true should be enough. tool.setuptools.package-data section is needed when it is false. When it is true, it captures all files in dirs having __init__.py automatically. In my projects I rely on this behaviour, and the files that shouldn't be within wheels are just out of the main package. Since it is already true, the section can be just purged.

  2. :+1: for using "setuptools_scm>7". 7 because since that version it can work with sdists generated from git repos.

  3. MANIFEST.in is relevant only to sdists. But when there is no -s or -w, build builds an sdist first, and then builds a wheel from it.

  4. about the readme, I suspect it may make sense to use the file as a readme, rather than hardcode a string into pyproject.toml.

  5. you cannot include files that are not a part of a package into wheels (except the ones that go into metadata automatically). And it shouldn't be done. I think the HTML docs and tests shouldn't be within wheels.

  6. I also propose to eliminate the manual enumeration of the subpackages (tool.setuptools.packages) and just use

[tool.setuptools.packages.find]
include = ["lark", "lark.*"]
KOLANICH commented 1 year ago
  1. Fix: setuptools_scm[toml]>7
henryiii commented 1 year ago

My preference is to use hatchling. :) That includes everything that is not gitignored by default, and can be configured easily in pyproject.toml without a separate MANIFEST.in file. It's also simpler and faster than setuptools.

For setuptools, package-data and MANIFEST.in are not interchangeable. You place things that go into the SDist into MANIFEST.in. You place things that go into the wheel in package-data. For simplicity, it will also be added to your MANIFEST if it's in package-data (since you need it in the SDist to get it in the wheel), but the question is, does it go in the wheel? If so, then use package-data. Otherwise use MANIFEST.in or setuptools_scm.

Not a huge fan of setuptools_scm's file finder. You can't turn it off if it's installed, so it affects every other package, even if they have no idea it exists. This doesn't affect isolated builds, of course, but it does affect systems like Spack where there's only one build environment. However, it's a very minor worry, and I do use setuptools_scm for versioning lots of places, which technically installs the file finder too (at least for now).

erezsh commented 1 year ago

Thanks everyone for weighing in. It was helpful.

It sounds like this change is correct and improves Lark, so I will merge this PR.

I think all of the suggestions you raised have merit, and will be more convenient. Hatchling sounds great, at least in theory. I would be happy to get rid of MANIFEST.in. But it also sounds like more work, and another risk of breaking people's builds.

Same goes for setuptools_scm, and this whole issue started because of its odd behavior. But, it might turn out to be an easy win.

Or we can leave things as they are, i.e. hopefully working but a little clumsy.

I don't know which choice is the right one, so if you feel strongly about it, you can open a new PR. If it makes our configuration shorter and easier to manage, I'll be happy to accept it.