pypa / flit

Simplified packaging of Python modules
https://flit.pypa.io/
BSD 3-Clause "New" or "Revised" License

building reproducible tarballs #542

Open · bollwyvl opened this issue 2 years ago

bollwyvl commented 2 years ago

Over on ipython, @Carreau has been using the retar script to post-process sdist tarballs to be SOURCE_DATE_EPOCH-aware.

It has no dependencies, so if all the licensing is copacetic, what about adopting that behavior in flit to complement the existing wheel (.whl) reproducibility?

Noted on https://github.com/jupyterhub/team-compass/issues/502#issuecomment-1103659726
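A minimal sketch of what a retar-style post-processing step does, written from scratch for illustration. It is not the actual retar script, whose code isn't shown here, and the function name `retar` and its signature are assumptions. It reads an existing .tar.gz, rewrites every member in sorted order with a fixed mtime taken from SOURCE_DATE_EPOCH, root ownership and 644/755 permissions, then recompresses with a pinned gzip header timestamp. Byte-identical compressed output also assumes the same zlib build on both machines.

```python
import gzip
import io
import os
import tarfile
from typing import Optional


def retar(src: str, dst: str, epoch: Optional[int] = None) -> None:
    """Rewrite a .tar.gz so its bytes no longer depend on the build machine.

    Hypothetical sketch of a retar-style step; not the script used by IPython.
    """
    if epoch is None:
        epoch = int(os.environ.get("SOURCE_DATE_EPOCH", "0"))

    raw = io.BytesIO()
    with tarfile.open(src, "r:gz") as tin, \
            tarfile.open(fileobj=raw, mode="w", format=tarfile.PAX_FORMAT) as tout:
        # Fixed member order, independent of how the input was written.
        for old in sorted(tin.getmembers(), key=lambda m: m.name):
            # A fresh TarInfo drops inherited pax headers (e.g. float mtimes).
            new = tarfile.TarInfo(old.name)
            new.type = old.type
            new.linkname = old.linkname
            new.size = old.size
            new.mtime = epoch
            new.mode = 0o755 if (old.isdir() or old.mode & 0o111) else 0o644
            # uid/gid default to 0 and uname/gname to "" on a new TarInfo.
            tout.addfile(new, tin.extractfile(old) if old.isfile() else None)

    # The gzip header has its own timestamp and optional file name; pin both.
    with open(dst, "wb") as f, \
            gzip.GzipFile(filename="", mode="wb", fileobj=f, mtime=epoch) as gz:
        gz.write(raw.getvalue())
```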

takluyver commented 2 years ago

It should be more or less reproducible anyway - we're using SOURCE_DATE_EPOCH and normalising file ownership and permissions when creating the sdist. But I don't think it's particularly easy to test this automatically (because you want to check that the results are the same across things like different platforms), so there may well be inconsistencies that have crept in. Fixes welcome!

https://github.com/pypa/flit/blob/048c87c380ac41efc4b26222114e54f6581c64f6/flit_core/flit_core/sdist.py#L18-L34

https://github.com/pypa/flit/blob/048c87c380ac41efc4b26222114e54f6581c64f6/flit_core/flit_core/sdist.py#L167-L168
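The normalisation described above can be sketched with the standard library's `tarfile` `filter` hook. This is an illustrative reconstruction, not flit's code (the linked sdist.py lines are the real implementation); `source_date_epoch`, `normalize` and `write_sdist` are hypothetical names.

```python
import gzip
import io
import os
import tarfile
import time
from typing import Iterable


def source_date_epoch() -> int:
    """SOURCE_DATE_EPOCH if set; otherwise 'now', which is not reproducible."""
    value = os.environ.get("SOURCE_DATE_EPOCH")
    return int(value) if value else int(time.time())


def normalize(ti: tarfile.TarInfo, mtime: int) -> tarfile.TarInfo:
    """Drop per-machine metadata: owner, group, exact permissions, mtime."""
    ti.uid = ti.gid = 0
    ti.uname = ti.gname = ""
    ti.mode = 0o755 if (ti.isdir() or ti.mode & 0o111) else 0o644
    ti.mtime = mtime
    return ti


def write_sdist(root: str, rel_paths: Iterable[str], prefix: str, out_path: str) -> None:
    """Hypothetical helper: pack root/<rel> as <prefix>/<rel> into out_path."""
    mtime = source_date_epoch()
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w", format=tarfile.PAX_FORMAT) as tar:
        for rel in sorted(rel_paths):  # stable member order
            tar.add(os.path.join(root, rel), arcname=f"{prefix}/{rel}",
                    recursive=False, filter=lambda ti: normalize(ti, mtime))
    with open(out_path, "wb") as f, \
            gzip.GzipFile(filename="", mode="wb", fileobj=f, mtime=mtime) as gz:
        gz.write(raw.getvalue())
```

Falling back to the current time when SOURCE_DATE_EPOCH is unset keeps everyday builds working, at the cost of reproducibility for those builds.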

gitpushdashf commented 2 years ago

Looks like with SOURCE_DATE_EPOCH set, the tarball and wheel are both reproducible. And they match whether flit build or python -m build are used. Very nice!
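A check like this can be as simple as comparing digests of two builds made with the same SOURCE_DATE_EPOCH, for example one per machine or container. A minimal sketch; the helper names are hypothetical.

```python
import hashlib


def sha256_of(path: str) -> str:
    """Hex SHA-256 of a file, read in chunks so large archives are fine."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def reproducible(first: str, second: str) -> bool:
    """True if two independently built artifacts are byte-for-byte identical."""
    return sha256_of(first) == sha256_of(second)
```

A matching digest only says the artifacts are identical; when they differ, listing the archive members and their metadata side by side is the usual next step.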

bollwyvl commented 2 years ago

Very nice indeed, thanks for looking into it.

So perhaps all that's needed is a note about that, e.g.

Wheels built by flit are reproducible... wheels (which are zip files) include the modification...

amended to

Wheels and source distributions built by flit are reproducible... wheels (which are .zip archives) and source distributions (which are tar.gz archives) include the modification

That said, having a rebuild step on a different OS/container might be interesting. I have found that Windows has... problems.
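On the wheel side, the zip format has quirks that matter for cross-OS builds: entries carry DOS timestamps that cannot go earlier than 1980, and Python's `zipfile.ZipInfo` records `create_system` as 0 on Windows but 3 elsewhere, which is one possible source of platform differences. A minimal sketch, assuming a plain dict of file contents (not how flit actually assembles wheels); `zip_date_time` and `write_zip` are hypothetical names.

```python
import os
import time
import zipfile
from typing import BinaryIO, Mapping

# 1980-01-01T00:00:00Z: zip's DOS timestamps cannot represent anything earlier.
ZIP_MIN_EPOCH = 315532800


def zip_date_time() -> tuple:
    """(Y, M, D, h, m, s) from SOURCE_DATE_EPOCH, clamped to the zip range."""
    epoch = int(os.environ.get("SOURCE_DATE_EPOCH", ZIP_MIN_EPOCH))
    return tuple(time.gmtime(max(epoch, ZIP_MIN_EPOCH))[:6])


def write_zip(files: Mapping[str, bytes], out: BinaryIO) -> None:
    """Hypothetical helper: write a zip whose bytes depend only on `files`."""
    stamp = zip_date_time()
    with zipfile.ZipFile(out, "w") as zf:
        for name in sorted(files):  # stable entry order
            zi = zipfile.ZipInfo(name, date_time=stamp)
            zi.compress_type = zipfile.ZIP_DEFLATED
            zi.create_system = 3  # ZipInfo defaults to 0 on Windows, 3 elsewhere
            zi.external_attr = 0o644 << 16  # fixed Unix permission bits
            zf.writestr(zi, files[name])
```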

takluyver commented 2 years ago

They won't be reliably reproducible between flit build and python -m build, because the former uses information from git or hg to decide what files to include, while the latter doesn't do that. There's more discussion about that discrepancy as part of #522.
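Because the two front ends may select different files, one quick way to tell a file-selection difference apart from a metadata difference is to diff the member lists of the two sdists. A hypothetical sketch; it assumes the usual single top-level `<name>-<version>/` directory inside each tarball.

```python
import tarfile
from typing import List, Set, Tuple


def member_names(sdist_path: str) -> Set[str]:
    """Paths inside an sdist, minus the leading '<name>-<version>/' directory."""
    with tarfile.open(sdist_path, "r:gz") as tar:
        return {n.split("/", 1)[1] for n in tar.getnames() if "/" in n}


def diff_sdists(a: str, b: str) -> Tuple[List[str], List[str]]:
    """(only in a, only in b): the file-selection difference between two sdists."""
    names_a, names_b = member_names(a), member_names(b)
    return sorted(names_a - names_b), sorted(names_b - names_a)
```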

pradyunsg commented 2 years ago

Note that if you consistently use one of those (either build or flit), the distributions it generates will be reproducible.

pradyunsg commented 2 years ago

Given that source tarballs built by flit are reproducible already, is there anything actionable here?

Update: yes, a documentation update. :)

pradyunsg commented 2 years ago

Ah, nvm me, I need to read things more carefully. 😅

nanonyme commented 1 year ago

I thought the main source of cross-OS reproducibility issues was the undefined order in which directory listings return files, which you mitigate by sorting your input files by filename.

takluyver commented 1 year ago

Yup, and we should be ensuring things are sorted, e.g.:

https://github.com/pypa/flit/blob/3f1ed8b932828a48c24a7f3fc72988e4e48b0f9e/flit_core/flit_core/common.py#L87-L94

https://github.com/pypa/flit/blob/3f1ed8b932828a48c24a7f3fc72988e4e48b0f9e/flit_core/flit_core/common.py#L442-L447
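`os.listdir` and `os.walk` return entries in arbitrary, filesystem-dependent order, so the sorting discussed here has to be done explicitly. A minimal sketch of a stable file walk; it is not flit's own code (the linked common.py lines are), and `list_files_sorted` is a hypothetical name.

```python
import os
from typing import List


def list_files_sorted(root: str) -> List[str]:
    """Relative, '/'-separated paths of all files under root, in a fixed order."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place also fixes the order os.walk descends
        for name in sorted(filenames):
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found.append(rel.replace(os.sep, "/"))
    return found
```

Python's `sorted` compares strings by code point, so the order does not depend on the OS locale the way a shell `sort` can.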