prefix-dev / rattler-build

rattler-build is a universal package builder for Windows, macOS and Linux
https://prefix-dev.github.io/rattler-build
BSD 3-Clause "New" or "Revised" License

rattler-build incorrectly includes some files from libarrow when it is a host dependency #979

Closed vyasr closed 1 month ago

vyasr commented 1 month ago

I'm not certain whether this is a bug in rattler-build or something that the libarrow package is doing that is fundamentally incompatible with the specs of rattler-build, so please let me know if I should refile this issue upstream.

That said, here is what I currently observe. Using this recipe:

recipe:
  version: 1.0.0

source:
  path: .

outputs:
  - package:
      name: example
    build:
      script:
        content: "echo bar"
    requirements:
      host:
        - libarrow

I expect the resulting package to be a metapackage. However, instead what I observe when I unpack it is the following:

$ rattler-build build --experimental --no-build-id
$ cd output/linux-64
$ cph x example-1.0.0-hb0f4dca_0.conda
$ find example-1.0.0-hb0f4dca_0/share -type d -links 2
example-1.0.0-hb0f4dca_0/share/gdb/auto-load/home/nfs/vyasr/local/testing/rattler_build_tests/libarrow_gdb_file/without_cache/output/bld/rattler-build_example_1721617935/host_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib
$ ls $(find example-1.0.0-hb0f4dca_0/share -type d -links 2)
libarrow.so.1700.0.0-gdb.py

Evidently a file from libarrow is being repackaged. I also see a bunch of paths related to libarrow inside the info directory, but I assume that is because the info is capturing information about the host environment in the build and therefore there's nothing to really worry about there. Please let me know if that seems incorrect too.
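The manual check above can also be scripted. The following is only a minimal sketch, assuming the package has already been unpacked (for example with cph x as shown); the helper name payload_files is hypothetical and is not part of rattler-build or cph. It lists every file outside the info directory, so a true metapackage should yield an empty list.

```python
from pathlib import Path


def payload_files(unpacked_dir):
    """Return the files of an unpacked package that live outside info/.

    Paths are returned relative to the package root, with forward slashes.
    """
    root = Path(unpacked_dir)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        # Keep regular files only, and skip package metadata under info/.
        if p.is_file() and p.relative_to(root).parts[0] != "info"
    )


# Hypothetical usage, run from output/linux-64 after unpacking:
#   for name in payload_files("example-1.0.0-hb0f4dca_0"):
#       print(name)
```

With the package from this report, the list would be expected to contain the libarrow gdb file under share/gdb rather than being empty.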

In the case of this file, my guess is that the cause of its inclusion is somewhere in the arrow-cpp-feedstock's activation script. I haven't dug much further, but I suspect that its usage of _la_placeholder="replace_this_section_with_absolute_slashed_path_to_CONDA_PREFIX" is somehow incompatible with rattler-build: it replaces paths/env vars that rattler-build does not expect to be replaced, and the resulting nested replacement prevents the file from being properly excluded from the list of files in the package. As I said above, though, I'm not sure whether this is an issue in rattler-build or something that needs to be fixed in the arrow feedstock to make it compatible with rattler-build.

baszalmstra commented 1 month ago

Unfortunately, conda-build exhibits the same behavior. I think this is something that should be fixed in the arrow-cpp-feedstock.

A workaround for this issue is to exclude these files from the package using the build.files section. This worked on my machine:

recipe:
  version: 1.0.0

source:
  path: .

outputs:
  - package:
      name: example
    build:
      script:
        content: "echo bar"
      files:
        include:
          - "**"
        exclude:
          - share/gdb/**/libarrow.so*-gdb.py
    requirements:
      host:
        - libarrow
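
To see why the exclude pattern in this recipe covers the offending file, the glob can be approximated in a few lines. This is only a sketch under stated assumptions: glob_to_regex is a hypothetical helper implementing common glob semantics ("**/" spans any number of directories, "*" stays within one path component), and rattler-build's own matcher may differ in edge cases. The long build prefix from the report is shortened to a stand-in path.

```python
import re


def glob_to_regex(pattern):
    """Translate a simple glob into a compiled regex.

    '**/' matches zero or more whole directories, '**' matches anything,
    and '*' matches within a single path component only.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


EXCLUDE = glob_to_regex("share/gdb/**/libarrow.so*-gdb.py")

# Stand-in for the path in the report; the real build prefix is much longer.
gdb_file = "share/gdb/auto-load/home/user/bld/host_env_placehold_/lib/libarrow.so.1700.0.0-gdb.py"
unrelated = "lib/libarrow.so.1700.0.0"

print(bool(EXCLUDE.match(gdb_file)))
print(bool(EXCLUDE.match(unrelated)))
```

Because the pattern is anchored at share/gdb and ends in -gdb.py, it targets only the gdb auto-load helper and leaves anything the package might legitimately ship elsewhere untouched.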

wolfv commented 1 month ago

Maybe the activation script can refrain from creating these files if the CONDA_BUILD env var is set?

vyasr commented 1 month ago

Major points to rattler-build: it didn't even occur to me to test this with conda-build because I'm averse to how long it would have taken :joy: Thanks for verifying that.