jupyter-server / jupyter-scheduler

Run Jupyter notebooks as jobs
https://jupyter-scheduler.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Jupyter Scheduler 2.10.0 Source Distribution tar built too large causing PyPI upload failure #558

Open andrii-i opened 1 week ago

andrii-i commented 1 week ago

Description

The initial upload of the Jupyter Scheduler 2.10.0 source distribution failed because it exceeded PyPI's file size limit (100 MB for this project): the sdist tarball (~150 MB) is drastically larger than it was before jupyter-releaser was introduced.

The built distribution upload went through; the npm upload did not run because it comes later in the script.

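How to reproduce

Try building the jupyter-scheduler PyPI source distribution locally with jupyter-releaser build-python and check its size (>100 MB).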

Expected behavior

dlqqq commented 1 week ago

For reference, here is the relevant log excerpt:

WARNING  Error during upload. Retry with the --verbose option for more details.
ERROR    HTTPError: 400 Bad Request from https://upload.pypi.org/legacy/
         File too large. Limit for project 'jupyter-scheduler' is 100 MB. See
         https://pypi.org/help/#file-size-limit for more information.
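To catch this before any upload starts, a pre-upload check could compare each built file against the limit from the error above. This is a minimal Python sketch: `oversized_dists` is a hypothetical helper, the `dist` default mirrors the releaser's default `dist_dir`, and reading "100 MB" as 100 * 1024 * 1024 bytes is an assumption.

```python
from pathlib import Path

# Assumption: the "100 MB" in the PyPI error is read as 100 * 1024 * 1024 bytes.
PYPI_FILE_LIMIT_BYTES = 100 * 1024 * 1024


def oversized_dists(dist_dir: str = "dist", limit: int = PYPI_FILE_LIMIT_BYTES) -> list[tuple[str, int]]:
    """Return (file name, size in bytes) for each file in dist_dir larger than limit."""
    too_big = []
    for path in sorted(Path(dist_dir).iterdir()):
        if path.is_file():
            size = path.stat().st_size
            if size > limit:
                too_big.append((path.name, size))
    return too_big
```

Running it right after the build step and failing the job on a non-empty result would stop the release before the first upload, avoiding a partial release like the one described here.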
andrii-i commented 1 week ago

The Jupyter Scheduler 2.10.0 npm package is now available at https://www.npmjs.com/package/@jupyterlab/scheduler, and the source distribution is now available on PyPI at https://pypi.org/project/jupyter-scheduler/2.10.0/#files.

Let's use this issue to track why the Jupyter Scheduler 2.10.0 source distribution tarball was built so large that the PyPI upload failed, and to prevent this from happening in the next release.

jupyter_releaser issue on the topic: https://github.com/jupyter-server/jupyter_releaser/issues/592

krassowski commented 1 week ago

Quoting the issue description: "Try building jupyter scheduler PyPI source distribution locally with jupyter-releaser build-python, see its size (>100 MB)"

Out of curiosity, do you know why it produces such a large distribution? From a quick look, it seems that you might be missing:

[tool.jupyter-releaser.hooks]
before-build-python = ["jlpm clean:all"]

in the pyproject.toml but that's just a guess.

andrii-i commented 1 week ago

@krassowski, no. I've created https://github.com/jupyter-server/jupyter_releaser/issues/592 in the jupyter_releaser repo to surface the problem and hopefully get some insight from jupyter_releaser contributors.

Thank you for the suggestion and generally for looking into this.

krassowski commented 1 week ago

Do you have the contents of the package built locally with jupyter-releaser build-python?

andrii-i commented 1 week ago

@krassowski, yes, here it is: https://www.dropbox.com/scl/fi/51y8zhsjeqx2jmsyg9mll/jupyter_scheduler-2.10.0.tar.gz?rlkey=9v5gafpayncj7831zfr4q4o00&st=svnlkhgx&dl=0 (153.9 MB)

krassowski commented 1 week ago

It looks like it includes the .yarn and node_modules directories, which I am sure are responsible for a large portion of the size. They obviously should not be included. Also see https://github.com/jupyter-server/jupyter_releaser/issues/592#issuecomment-2478372873.
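To see how much of the tarball each directory accounts for, a short Python sketch can sum the uncompressed member sizes per top-level directory. `size_by_top_dir` is a hypothetical helper; it assumes the usual sdist layout where every member sits under a `<name>-<version>/` root.

```python
import tarfile
from collections import Counter


def size_by_top_dir(sdist_path: str) -> list[tuple[str, int]]:
    """Sum uncompressed file sizes per top-level directory of an sdist, largest first."""
    totals = Counter()
    with tarfile.open(sdist_path, "r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            # Assumed layout: '<name>-<version>/<path>'; drop that root component.
            parts = member.name.split("/")[1:]
            key = parts[0] + "/" if len(parts) > 1 else "(top-level files)"
            totals[key] += member.size
    return totals.most_common()
```

Pointing it at the tarball shared above should show how much of the 153.9 MB the .yarn and node_modules directories account for.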

I think that in addition to jlpm clean:all you should also add:

[tool.hatch.build.targets.sdist]
artifacts = ["jupyter_scheduler/labextension"]
exclude = [".github", "binder"]

so that the binder directory gets excluded.

That said, I already see jupyter_scheduler/labextension in the tarball you shared, and it, along with node_modules, should have been excluded by hatch because they are in your .gitignore.

So why does it include things from the git repo?
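One way to check a tarball against the exclusions discussed here is to count the files that sit under directories that should not be shipped. A minimal Python sketch: `unexpected_members` and the `UNEXPECTED_DIRS` list are hypothetical, with the list taken from the directories named in this thread.

```python
import tarfile

# Hypothetical list, based on the directories named in this thread
# (.gitignore'd node_modules, .yarn, and the suggested sdist excludes).
UNEXPECTED_DIRS = ("node_modules", ".yarn", "binder", ".github")


def unexpected_members(sdist_path: str, unexpected: tuple[str, ...] = UNEXPECTED_DIRS) -> dict[str, int]:
    """Count sdist files located under any of the unexpected directory names."""
    counts = {name: 0 for name in unexpected}
    with tarfile.open(sdist_path, "r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            # Skip the '<name>-<version>/' root and the file name; keep directory parts.
            dirs = member.name.split("/")[1:-1]
            for name in unexpected:
                if name in dirs:
                    counts[name] += 1
    return {name: n for name, n in counts.items() if n}
```

An empty dict means none of those directories made it into the sdist; nested directories such as `ui/node_modules` are counted too.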

In the logs of the check-release action (https://github.com/jupyter-server/jupyter-scheduler/actions/runs/11809051201/job/32898683727) I see that the releaser is reading its configuration from package.json rather than from pyproject.toml. I wonder if this could be related:

build-python

--------------------------------------------------
Using default value for dist_dir: 'dist'
Using default value for python_packages: '['.']'
Using default value for help: 'False'
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Running hooks for before-build-python
jupyter-releaser configuration loaded from package.json.

Also, that one does include the clean hook:

https://github.com/jupyter-server/jupyter-scheduler/blob/14c44518f9d48eb40eb6c20e063c275c84358698/package.json#L132-L143

Interesting. It looks like it did not use hatch at all?

krassowski commented 1 week ago

None of that helps yet: https://github.com/jupyter-server/jupyter-scheduler/pull/561

I went ahead and triggered a new check-release run on an unrelated project to see whether this is a regression in the ecosystem rather than a misconfiguration. An older run on variable inspector and the run triggered today both produce 1.53 MB of artifacts, so I do not think this is a system-wide issue, just a configuration problem.

krassowski commented 1 week ago

I tried aligning the scheduler config with other repos using releaser in https://github.com/jupyter-server/jupyter-scheduler/pull/561 but nothing helped.

The thing is that jupyter-releaser does not do anything bespoke; it just runs pipx run build (here). That should not produce anything different from python -m build as used by the build action:

https://github.com/jupyter-server/jupyter-scheduler/blob/14c44518f9d48eb40eb6c20e063c275c84358698/.github/workflows/build.yml#L57

krassowski commented 1 week ago

Running pipx run build locally does not produce such a large tarball for me, just 3.6 MB.
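With a 3.6 MB local build and a 153.9 MB releaser build, listing the files that appear in only one of the two tarballs narrows down what the releaser environment adds. A minimal Python sketch: `sdist_diff` is a hypothetical helper that compares member paths only (not contents) and strips the `<name>-<version>/` root so differently named tarballs line up.

```python
import tarfile


def _sdist_files(sdist_path: str) -> set[str]:
    """Return the sdist's regular-file paths with the '<name>-<version>/' root removed."""
    with tarfile.open(sdist_path, "r:*") as tar:
        return {m.name.split("/", 1)[1] for m in tar.getmembers() if m.isfile() and "/" in m.name}


def sdist_diff(a_path: str, b_path: str) -> tuple[set[str], set[str]]:
    """Return (paths only in a, paths only in b), ignoring the root directory names."""
    a, b = _sdist_files(a_path), _sdist_files(b_path)
    return a - b, b - a
```

Comparing the local tarball with the one shared earlier in this thread would list exactly which extra files the releaser build picked up.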