lark-parser / lark

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
MIT License
4.75k stars 401 forks source link

Wheels built using the new pyproject file don't include .lark files #1307

Closed ptrcnull closed 1 year ago

ptrcnull commented 1 year ago

Describe the bug When building lark 1.1.6 or later using a pep517-compatible tool, the resulting wheel does not include the .lark files in lark/grammars/

To Reproduce Build the package with, for example, build; then, install the resulting package and check lark/grammars/ ( or try using something that depends on common.lark )

Simple Docker reproduction example:

docker run --rm -it alpine:edge sh -c 'apk add py3-build py3-wheel py3-setuptools git unzip; git clone -b 1.1.7 https://github.com/lark-parser/lark; cd lark; python3 -m build --wheel --no-isolation; unzip -l ./dist/lark-1.1.7-py3-none-any.whl | grep lark/grammars'
erezsh commented 1 year ago

Hi, I am unable to reproduce this issue.

When I run python -m build --wheel --no-isolation, I get a .tar.gz and .whl files that both contains .lark files. The .lark files also exist in the official pypi package.

Maybe you can provide with more details? Do you have an idea for why your built packages are different than the official package?

ptrcnull commented 1 year ago

I tried doing the same in a Debian bookworm container, got the same results - the resulting wheel doesn't have .lark files; you can try the following Dockerfile (i suppose it looks better than a one-liner):

FROM debian:bookworm
RUN apt update
RUN apt install -y python3-build python3-setuptools git unzip
RUN git clone -b 1.1.7 https://github.com/lark-parser/lark
WORKDIR /lark
RUN python3 -m build --wheel --no-isolation
RUN unzip -l ./dist/lark-1.1.7-py3-none-any.whl
ptrcnull commented 1 year ago

To be fair, I'm not sure why it does work for you - the MANIFEST.in file doesn't include lark/grammar/ files, neither does any section in pyproject.toml

The only thing that comes to mind is setuptools changing something between 61.2.0 ( the minimum version specified in pyproject ) and latest, 68.0.0

erezsh commented 1 year ago

Your guess might be true.

The section that's responsible for the grammar files is this one:

[tool.setuptools]
packages = [
    "lark",
    "lark.parsers",
    "lark.tools",
    "lark.grammars",
    "lark.__pyinstaller",
]
include-package-data = true

We're including lark.grammars and adding the include-package-data flag. Makes sense to me that it works.

However, when I run python -m build, I get this message:

_BetaConfiguration: Support for `[tool.setuptools]` in `pyproject.toml` is still *beta*.

So it's very possible that we need to require a later version of setuptools.

ptrcnull commented 1 year ago

Ah, no, it's not the version of setuptools, i just tested building it with setuptools 61.2.0:

FROM debian:bookworm
RUN apt update
RUN apt install -y python3-venv git unzip
RUN git clone -b 1.1.7 https://github.com/lark-parser/lark
WORKDIR /lark
RUN python3 -m venv venv
RUN venv/bin/python3 -m pip install build setuptools==61.2.0
RUN venv/bin/python3 -m build --wheel
RUN unzip -l ./dist/lark-1.1.7-py3-none-any.whl

rather, it might be this:

OR, they are being tracked by a revision control system such as Git, Mercurial or SVN, and you have configured an appropriate plugin such as setuptools-scm or setuptools-svn. (See the section below on Adding Support for Revision Control Systems for information on how to write such plugins.)

from: https://setuptools.pypa.io/en/latest/userguide/datafiles.html

erezsh commented 1 year ago

Sorry, I'm not following. Can you explain more?

nekopsykose commented 1 year ago

if you have setuptools_scm in the current environment, setuptools somehow automatically uses it and somehow decides to include the .lark data files. they are then in the resulting .whl .

note that doesn't require even having the git repo- you can rm -rf .git and it still works. removing setuptools_scm from the import path breaks it again. there's probably some integration there that decides to keep them.

regardless, they are indeed missing from the manifest (or wherever they are meant to be specified, per that datafiles article)

nekopsykose commented 1 year ago

well, even removing scm later still actually works, it only doesn't work the first time.. some yet-additional weird caching takes place :D

if you build from a completely fresh environment (start a container, just clone the repo, run build, like the dockerfile example), you should be able to reproduce it. but the fix is as simple as including the files in package-data where they should be anyway.

erezsh commented 1 year ago

Got it. Would you like to submit a PR?

MegaIng commented 1 year ago

@erezsh #1308 does that.