pantsbuild / pants

The Pants Build System
https://www.pantsbuild.org
Apache License 2.0
3.24k stars 623 forks source link

"file changed as we read it" when taring files in Sandbox #19740

Open tgolsson opened 11 months ago

tgolsson commented 11 months ago

Describe the bug

Haven't isolated a reproduction, but caught this a few times when running "wide" commands after upgrading our repo to 2.17. For example, pants package :: lead to

/usr/bin/tar: src.container/binary-deps.pex/.deps/torch-1.12.0+cu116-cp39-cp39-linux_x86_64.whl: file changed as we read it

The tar command in question is a slightly modified version of the built-in Pants tar rule, with extra flags to remove timestamps and so on.

This is the target we're taring:

pex_binary(
    name="binary-deps",
    entry_point="runner",
    layout="packed",
    execution_mode="venv",
    include_sources=False,
    include_tools=True,
    interpreter_constraints=["==3.9.*"],
    dependencies=[
        "//cmd:runner",
    ],
    resolve="gpu",
)

I'm specifically eyeing the venv here which talks about symlinks, but I'm not sure if that has any effect on the sandbox itself...

Pants version

2.17

OS

Linux

Additional info

thejcannon commented 11 months ago

What's the behavior of this binary when a hardlink is made to the file while it's reading?

tgolsson commented 11 months ago

Oh, good clue. https://lists.gnu.org/archive/html/bug-tar/2016-04/msg00000.html. That's... a pain.

thejcannon commented 11 months ago

🙂 I only thought of it because we had issues with inotify doing something similar after the hardlink change rolled out.

thejcannon commented 11 months ago

If we fixed https://github.com/pantsbuild/pants/issues/18669, you could probably switch to the provided tar, right?

tgolsson commented 11 months ago

Yes. I'm using the same binary request though, so it suffers as well. https://github.com/tgolsson/pants-backends/blob/main/pants-plugins/oci/pants_backend_oci/util_rules/archive.py#L15

thejcannon commented 11 months ago

Looks like we can fix two issues in one PR 🙂

kingjamm72 commented 8 months ago

General observation. If the system is using hard links to update the sandboxes, this is modifying the mtime of the file. GNU tar uses the mtime to set whether the file has changed or not. For example, here is the before and during mtime information for the file that is 'changed' during the archive phase:

557141533 Dec 21 19:30 ./nvidia_cudnn_cu11-8.5.0.96-2-py3-none-manylinux1_x86_64.whl

and

557141533 Dec 21 19:32 ./nvidia_cudnn_cu11-8.5.0.96-2-py3-none-manylinux1_x86_64.whl

File size remains the same, but the mtime changes as the hard links are apparently created.

kingjamm72 commented 8 months ago

Note, moving to a .zip archive does not run into this issue, presumably because it is more tolerant (or badly behaved depending on perspective) to these types of changes.