lancedb / lance-spark

Spark integrations for working with Lance datasets
Apache License 2.0

Spark CI taking very long to execute #11

Closed jackye1995 closed 1 month ago

jackye1995 commented 1 month ago

Currently it takes a long time (over 10 minutes) to build the Java library from the lance repository.

Almost all of that time seems to be spent compiling Rust dependencies during the rust-maven-plugin build step:

[INFO] --- rust:1.1.1:build (lance-jni) @ lance-core ---
[INFO] Working directory: /home/runner/work/lance-spark/lance-spark/lance/java/core/lance-jni
[INFO] Running: cargo build --target-dir /home/runner/work/lance-spark/lance-spark/lance/java/core/target/rust-maven-plugin/lance-jni
[INFO]    Compiling proc-macro2 v1.0.94
[INFO]    Compiling unicode-ident v1.0.18
[INFO]    Compiling quote v1.0.40
[INFO]    Compiling syn v2.0.100
[INFO]    Compiling libc v0.2.171
[INFO]    Compiling autocfg v1.4.0
[INFO]    Compiling cfg-if v1.0.0
[INFO]    Compiling memchr v2.7.4
[INFO]    Compiling bytes v1.10.1
[INFO]    Compiling libm v0.2.11
[INFO]    Compiling serde_derive v1.0.219
[INFO]    Compiling once_cell v1.21.3
[INFO]    Compiling allocator-api2 v0.2.21
[INFO]    Compiling foldhash v0.1.5
...

This happens even though the cache is set up as follows (the checkout directory is /home/runner/work/lance-spark/lance-spark):

      - uses: Swatinem/rust-cache@v2
        with:
          workspaces: lance/java/core/lance-jni

This does not seem to be effective.
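
A possible explanation, not verified in this thread: each `workspaces` entry of `Swatinem/rust-cache` has the form `workspace -> target`, where the target directory is relative to the workspace and defaults to `target`. The log above shows cargo running with `--target-dir` set to `lance/java/core/target/rust-maven-plugin/lance-jni`, which is outside `lance/java/core/lance-jni/target`, so the build output may never be saved. A minimal sketch that points the cache at that directory, assuming the paths from the log:

```yaml
- uses: Swatinem/rust-cache@v2
  with:
    # Form: <workspace> -> <target dir relative to the workspace>.
    # The relative target below is derived from the --target-dir in the log
    # and is an untested assumption.
    workspaces: lance/java/core/lance-jni -> ../target/rust-maven-plugin/lance-jni
```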

I also tried caching the Cargo directories directly with actions/cache:

      - name: Cargo cache
        uses: actions/cache@v4
        with:
          path: |
            ~/.cargo/bin/
            ~/.cargo/registry/index/
            ~/.cargo/registry/cache/
            ~/.cargo/git/db/
            ~/lance/java/core/target/rust-maven-plugin/lance-jni/
          key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}

This also did not bring much improvement.
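
One thing that may contribute here, again unverified: in an `actions/cache` path, `~` expands to the runner's home directory (typically `/home/runner` on GitHub-hosted Linux runners), not to the checkout directory. If so, `~/lance/java/core/target/rust-maven-plugin/lance-jni/` does not match the `--target-dir` shown in the log. A sketch that uses a workspace-relative path for the build output instead, with everything else unchanged:

```yaml
- name: Cargo cache
  uses: actions/cache@v4
  with:
    # The last entry is relative, so it resolves against the checkout
    # directory rather than the runner's home directory (untested assumption
    # that this matches the cargo --target-dir from the log).
    path: |
      ~/.cargo/bin/
      ~/.cargo/registry/index/
      ~/.cargo/registry/cache/
      ~/.cargo/git/db/
      lance/java/core/target/rust-maven-plugin/lance-jni/
    key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}
```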

I have opened PR #10 to keep testing this with new configs and to explore other approaches. I would appreciate suggestions if anyone knows a better way.

jackye1995 commented 1 month ago

I figured out a path forward in https://github.com/lancedb/lance-catalog/pull/19