openSUSE / python-rpm-macros

Multi-Python, Single-Spec macros generator

Better support reproducible builds #156

Open s-t-e-v-e-n-k opened 1 year ago

s-t-e-v-e-n-k commented 1 year ago

We are currently bad at reproducible builds: we run install, then run fdupes, and as a result the timestamps recorded in the pyc files no longer match the filesystem timestamps of their sources. My evil idea:

  1. Change %python_install and %pyproject_install to set the no bytecode environment variable when calling install.
  2. Call fdupes inside the macro
  3. Then compile all pyc files

This should support existing spec files since calling fdupes a second time should result in no changes, and we can drop the redundant fdupes calls from the spec files when we get around to them.
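
The ordering in steps 2 and 3 can be sketched in Python on a throwaway tree. This is only an illustration under assumptions, not the macro implementation: `hardlink_duplicates` is a hypothetical, simplistic stand-in for fdupes, the tree is assumed to have been installed without bytecode (presumably via `PYTHONDONTWRITEBYTECODE`, which the proposal does not name), and timestamp-mode pycs are forced explicitly.

```python
import compileall
import hashlib
import importlib.util
import os
import py_compile
import struct


def hardlink_duplicates(paths):
    """Hypothetical stand-in for fdupes: hardlink files with identical content."""
    first_seen = {}
    for path in sorted(paths):
        with open(path, "rb") as fh:
            digest = hashlib.sha256(fh.read()).hexdigest()
        keeper = first_seen.setdefault(digest, path)
        if keeper != path:
            # Replace the duplicate with a hardlink to the first copy seen.
            os.remove(path)
            os.link(keeper, path)


def recorded_source_mtime(py_path):
    """Return the source mtime stored in a timestamp-mode .pyc header."""
    with open(importlib.util.cache_from_source(py_path), "rb") as fh:
        header = fh.read(16)
    # Header layout: magic (4 bytes), flags (4), source mtime (4), source size (4).
    return struct.unpack("<I", header[8:12])[0]


def dedupe_then_compile(root):
    """Deduplicate the .py files first, then byte-compile the whole tree."""
    sources = [
        os.path.join(dirpath, name)
        for dirpath, _dirs, names in os.walk(root)
        for name in names
        if name.endswith(".py")
    ]
    hardlink_duplicates(sources)
    compileall.compile_dir(
        root,
        quiet=1,
        force=True,
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    return sources
```

Because compilation happens after the hardlinking, every pyc is written against the mtime its source ends up with, so `recorded_source_mtime(path)` should agree with `int(os.stat(path).st_mtime)` for each returned path.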

mcepl commented 1 year ago

I wonder whether https://github.com/openSUSE/python-rpm-macros/pull/151 is involved somehow.

mcepl commented 1 year ago

@Vogtinator, @bnavigator, and @bmwiedemann might be interested as well.

Vogtinator commented 1 year ago

This should support existing spec files since calling fdupes a second time should result in no changes

There could be some edge cases like %fdupes -s converting hardlinks to symlinks, but I recommend against %fdupes -s anyway...

s-t-e-v-e-n-k commented 1 year ago

I think all python spec files now don't use %fdupes -s

mcepl commented 1 year ago

I think all python spec files now don't use %fdupes -s

Just for the record (none of these is very relevant to anything):

milic~/b/spec_factory$ rg -l 'fdupes -s' python-*
python-youtube-dl.spec
python-sip6.spec
python-pytest-expect.spec
python-pymediainfo.spec
python-OWSLib.spec
python-mkdocs.spec
python-livereload.spec
python-gphoto2.spec
python-ghp-import.spec
python-cangjie.spec
python-apipkg.spec
milic~/b/spec_factory$ 

However, it is really not very relevant to this discussion (delegated to Trello).

bnavigator commented 1 year ago

python-sip6.spec

That has %fdupes -s doc and is irrelevant for python

bmwiedemann commented 1 year ago

In the last years, timestamps are mostly not a problem for .pyc files, because rpmbuild normalizes mtime via %clamp_mtime_to_source_date_epoch Y and .pyc headers usually default to checksum mode with SOURCE_DATE_EPOCH set.
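
To see which mode a given .pyc uses, the flags word in its header can be inspected. The following is a minimal sketch assuming an unpatched CPython 3.7 or later, where `py_compile` picks checked-hash mode by default when `SOURCE_DATE_EPOCH` is set; `pyc_mode` and `compile_and_classify` are hypothetical helper names.

```python
import os
import py_compile
import struct
import tempfile


def pyc_mode(pyc_path):
    """Classify a .pyc by the flags word (bytes 4..8) of its 16-byte header."""
    with open(pyc_path, "rb") as fh:
        header = fh.read(16)
    flags = struct.unpack("<I", header[4:8])[0]
    if flags & 0b01 == 0:
        return "timestamp"
    return "checked-hash" if flags & 0b10 else "unchecked-hash"


def compile_and_classify(source_text, source_date_epoch=None):
    """Compile with py_compile's default mode and report what was written."""
    saved = os.environ.pop("SOURCE_DATE_EPOCH", None)
    if source_date_epoch is not None:
        os.environ["SOURCE_DATE_EPOCH"] = str(source_date_epoch)
    try:
        with tempfile.TemporaryDirectory() as tmp:
            src = os.path.join(tmp, "mod.py")
            with open(src, "w") as fh:
                fh.write(source_text)
            pyc = py_compile.compile(
                src, cfile=os.path.join(tmp, "mod.pyc"), doraise=True
            )
            return pyc_mode(pyc)
    finally:
        # Restore the caller's environment exactly as it was.
        os.environ.pop("SOURCE_DATE_EPOCH", None)
        if saved is not None:
            os.environ["SOURCE_DATE_EPOCH"] = saved
```

Running `pyc_mode` over the pycs of a built package would show whether the build actually produced hash-based files or fell back to timestamps.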

What is a problem are variations (e.g. from ASLR) that go into .pyc files. Because .pyc files are memory-dumps of internal python state, they are hard to make properly reproducible.

e.g.

> cat test.py
lext = {'png', 'gif', 'jpg', 'pcx', 'pnm', 'tif', 'xpm'}

> for i in $(seq 10) ; do setarch -R python3.10 -m py_compile test.py ; md5sum __pycache__/test.cpython-310.pyc 
  done|sort -u|wc -l
10
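
The same check can be scripted in Python, which makes it easier to try out candidate causes. This is a rough sketch: `distinct_pyc_digests` is a hypothetical helper, and pinning `PYTHONHASHSEED` through `extra_env` is my assumption about one contributor to the variation, not something established above.

```python
import hashlib
import os
import subprocess
import sys
import tempfile

# One-liner run in a fresh interpreter for every compile.
CHILD = (
    "import py_compile, sys; "
    "py_compile.compile(sys.argv[1], cfile=sys.argv[2], doraise=True)"
)


def distinct_pyc_digests(source_text, runs=5, extra_env=None):
    """Compile the same source in fresh interpreters and count distinct pycs."""
    env = dict(os.environ)
    env.update(extra_env or {})
    digests = set()
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "test.py")
        out = os.path.join(tmp, "test.pyc")
        with open(src, "w") as fh:
            fh.write(source_text)
        for _ in range(runs):
            subprocess.run(
                [sys.executable, "-c", CHILD, src, out], check=True, env=env
            )
            with open(out, "rb") as fh:
                digests.add(hashlib.sha256(fh.read()).hexdigest())
    return len(digests)
```

Comparing `distinct_pyc_digests(src)` against `distinct_pyc_digests(src, extra_env={"PYTHONHASHSEED": "0"})` for the set literal above would show whether hash randomization accounts for the differences on a given interpreter.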

Vogtinator commented 1 year ago

This is actually not directly about reproducible builds but about fixing a packaging issue: https://bugzilla.suse.com/show_bug.cgi?id=1207805

bnavigator commented 1 year ago

Bug is non-public

bnavigator commented 1 year ago

In the last years, timestamps are mostly not a problem for .pyc files, because rpmbuild normalizes mtime via %clamp_mtime_to_source_date_epoch Y and .pyc headers usually default to checksum mode with SOURCE_DATE_EPOCH set.

@bmwiedemann, is that also true for SLE/Leap? The py39 custom repos for 15.X builds always throw rpmlint errors because rpmlint there doesn't like the mtimes of the .pyc files. I ignore them.

mcepl commented 1 year ago

Bug is non-public

I am sorry about that, I tried to add you to the bug so it should be visible at least to you. Does it work?

bnavigator commented 1 year ago

I can read it now.

Taking @bmwiedemann's input into account, I suspect this is a SLE/Leap only issue for Python 3.6 generated .pyc files.

%#FLAVOR#_compile is already part of %#FLAVOR#_pyproject_install. No harm in adding %fdupes before that call, so that duplicate source files get the same mtime. Legacy %python_install would need to get a bit more attention.

Alternative suggestion: Add -n to the %fdupes calls. I think 99% of the cases where identical .pyc files get deduplicated are empty __init__.py files.

brjsp commented 1 year ago

Alternative suggestion: Add -n to the %fdupes calls. I think 99% of the cases where identical .pyc files get deduplicated are empty __init__.py files.

No, 99% of cases are identical files for opt-0 and opt-1 (which make rpmlint unhappy, as there are usually a lot of them).

bnavigator commented 1 year ago

But those always have the same mtime and have never been a problem for reproducible builds due to deduplication.

Vogtinator commented 1 year ago

Yeah, the issue is only about *.py files getting replaced by hardlinks. Replacing .pyc files with hardlinks is fine, their mtime is not used.
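
A small Python sketch of that failure mode, assuming timestamp-mode pycs: the .pyc records the source mtime at compile time, so swapping the .py for a hardlink afterwards leaves a stale value behind. The file names and mtimes are made up for illustration.

```python
import os
import py_compile
import struct
import tempfile


def pyc_mtime_after_hardlink():
    """Compile first, then hardlink the source, and compare the two mtimes."""
    with tempfile.TemporaryDirectory() as tmp:
        keep = os.path.join(tmp, "keep.py")
        dup = os.path.join(tmp, "dup.py")
        for path, mtime in ((keep, 1000), (dup, 2000)):
            with open(path, "w") as fh:
                fh.write("X = 1\n")
            os.utime(path, (mtime, mtime))

        # The pyc for dup.py is written while dup.py still has mtime 2000.
        pyc = py_compile.compile(
            dup,
            doraise=True,
            invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
        )

        # Deduplication afterwards: dup.py becomes a hardlink to keep.py.
        os.remove(dup)
        os.link(keep, dup)

        with open(pyc, "rb") as fh:
            header = fh.read(16)
        recorded = struct.unpack("<I", header[8:12])[0]
        return recorded, int(os.stat(dup).st_mtime)
```

The two returned values differ, which is the mismatch between the pyc and the filesystem that the issue is about.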

s-t-e-v-e-n-k commented 1 year ago

Trying to come up with a proof of concept here, but the problem is that if you run %fdupes and then use compileall, the hardlinks get replaced. Still trying to think of a good solution.

bnavigator commented 1 year ago

Really interested in your proof here, because your last comment directly contradicts the concept in your initial post (https://github.com/openSUSE/python-rpm-macros/issues/156#issue-1600814151)

Barely any package calls %fdupes before the Python compile, and if one does, it is only fixable in the package spec file, not in the Python macros.

s-t-e-v-e-n-k commented 1 year ago

That was an evil idea, not a fait accompli -- I'm still trying things out, but I welcome your input.