pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

`jsonpath_lib_polars_vendor` is not cached by Swatinem/rust-cache@v2 #14794

Open eitsupi opened 4 months ago

eitsupi commented 4 months ago

After updating to polars 0.38.0, I noticed that it now takes longer to build on GitHub Actions.

This also occurs in this repository, but the new dependency, jsonpath_lib_polars_vendor, does not seem to be cached. Do you have any idea what the cause might be?

https://github.com/pola-rs/polars/actions/runs/8101996808/job/22143496368#step:5:16

Run cargo test --all-features -p polars --test it --no-run
    Updating crates.io index
   Compiling polars-utils v0.38.0 (/home/runner/work/polars/polars/crates/polars-utils)
   Compiling polars-error v0.38.0 (/home/runner/work/polars/polars/crates/polars-error)
   Compiling polars-arrow v0.38.0 (/home/runner/work/polars/polars/crates/polars-arrow)
   Compiling polars-compute v0.38.0 (/home/runner/work/polars/polars/crates/polars-compute)
   Compiling polars-core v0.38.0 (/home/runner/work/polars/polars/crates/polars-core)
   Compiling polars-ops v0.38.0 (/home/runner/work/polars/polars/crates/polars-ops)
   Compiling jsonpath_lib_polars_vendor v0.0.1
   Compiling polars-plan v0.38.0 (/home/runner/work/polars/polars/crates/polars-plan)
   Compiling polars-pipe v0.38.0 (/home/runner/work/polars/polars/crates/polars-pipe)
   Compiling polars-lazy v0.38.0 (/home/runner/work/polars/polars/crates/polars-lazy)
   Compiling polars v0.38.0 (/home/runner/work/polars/polars/crates/polars)
   Compiling polars-row v0.38.0 (/home/runner/work/polars/polars/crates/polars-row)
   Compiling polars-json v0.38.0 (/home/runner/work/polars/polars/crates/polars-json)
   Compiling polars-parquet v0.38.0 (/home/runner/work/polars/polars/crates/polars-parquet)
   Compiling polars-time v0.38.0 (/home/runner/work/polars/polars/crates/polars-time)
   Compiling polars-io v0.38.0 (/home/runner/work/polars/polars/crates/polars-io)
   Compiling polars-sql v0.38.0 (/home/runner/work/polars/polars/crates/polars-sql)
    Finished `test` profile [unoptimized + debuginfo] target(s) in 1m 48s
  Executable tests/it/main.rs (target/debug/deps/it-0f742ec08bf6da34)

ritchie46 commented 4 months ago

Maybe because it was compiled for the first time?

eitsupi commented 4 months ago

I don't think this is because it is the first build, as it appears to build every time, even though it has been built many times in the main branch. https://github.com/pola-rs/polars/actions/runs/8108721415/job/22162464498#step:5:16

@Swatinem Sorry for tagging you, but any thoughts on this?

Swatinem commented 4 months ago

Might be related to some build scripts, or there is a naming problem that my cache does not pick up correctly.

I would advise to enable --verbose cargo output, as well as actions debugging (my README should tell you how), and then take another look.

eitsupi commented 4 months ago

> I would advise to enable --verbose cargo output, as well as actions debugging (my README should tell you how), and then take another look.

@Swatinem Thanks, I tried to do that in pola-rs/r-polars#879, but could not figure out how to set --verbose. Sorry.

Swatinem commented 4 months ago

I’m actually not sure if that would work, or if you would have to set that ENV somewhere in the GitHub settings UI. I also can’t figure out where you are calling cargo build / cargo test in your workflow. It looks like it is hidden behind some other actions or scripts. Just running cargo build --verbose will show you the exact reason why cargo wants to rebuild something.

eitsupi commented 4 months ago

@Swatinem Thanks, I saw this:

        Dirty jsonpath_lib_polars_vendor v0.0.1: stale, unknown reason
   Compiling jsonpath_lib_polars_vendor v0.0.1

https://github.com/pola-rs/r-polars/actions/runs/8122433401/job/22201862255?pr=879#step:9:210
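In a long CI log, lines like the `Dirty` one above are easy to miss. The following is a minimal sketch, not part of the original workflow, that pulls every `Dirty` entry out of saved `cargo build --verbose` output. The function name `dirty_crates` and the sample text are illustrative; the regular expression only assumes the line shape seen in this thread (`Dirty <name> v<version>[ (<source>)]: <reason>`).

```python
import re

# Matches cargo's verbose "Dirty <name> v<version>[ (<source>)]: <reason>" lines,
# as they appear in the logs quoted in this thread.
DIRTY_RE = re.compile(
    r"^\s*Dirty\s+(?P<name>\S+)\s+v(?P<version>\S+?)"
    r"(?:\s+\((?P<source>[^)]*)\))?:\s*(?P<reason>.*)$"
)


def dirty_crates(log_text):
    """Return (name, version, reason) for every crate cargo reported as Dirty."""
    found = []
    for line in log_text.splitlines():
        match = DIRTY_RE.match(line)
        if match:
            found.append((match["name"], match["version"], match["reason"].strip()))
    return found


# Sample text copied from the log lines quoted in this thread.
sample_log = """\
       Fresh serde_json v1.0.114
       Dirty jsonpath_lib_polars_vendor v0.0.1: stale, unknown reason
   Compiling jsonpath_lib_polars_vendor v0.0.1
"""

print(dirty_crates(sample_log))
```

Running it over the full job log narrows the search to the crates cargo decided to rebuild, along with whatever reason it gives.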

Swatinem commented 4 months ago

Well, looks like cargo doesn’t know itself either. It’s also surprising that it does not report any path or directory. So is it a git or crates.io dependency?

eitsupi commented 4 months ago

It should exist on crates.io.

https://github.com/pola-rs/r-polars/blob/929759cecdd9e4b824ae3cdeb4b7fceab804438f/src/rust/Cargo.lock#L845-L854

[[package]]
name = "jsonpath_lib_polars_vendor"
version = "0.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f4bd9354947622f7471ff713eacaabdb683ccb13bba4edccaab9860abf480b7d"
dependencies = [
 "log",
 "serde",
 "serde_json",
]

It comes from here.

https://github.com/pola-rs/polars/blob/baacf3dac35cae15b30e7208b38e5a02bca838b3/crates/polars-ops/Cargo.toml#L41-L44

eitsupi commented 4 months ago

@Swatinem Sorry to bother you, but would you take a look at this again? Thanks.

Swatinem commented 4 months ago

There isn’t really much I can say with whatever info there is. Cargo itself does not really provide any info. And I believe you haven’t enabled the actions debugging.

eitsupi commented 4 months ago

@Swatinem Thanks for your reply. Unfortunately, I think I have already enabled the Actions debug mode. See pola-rs/r-polars#879

eitsupi commented 3 months ago

I checked with a minimal example to see what was causing it, and found that renaming the lib made the cache work. (eitsupi/rust-cashe-test#1)

 [lib]
-name = "jsonpath_lib"
+name = "jsonpath_lib_polars_vendor"
 path = "src/lib.rs"
 crate-type = ["cdylib", "rlib"]

...to this file https://github.com/ritchie46/jsonpath/blob/653d3cb84217217dc925aded7cac6b3efd39c7f1/Cargo.toml

Details

```log
Run cargo run --verbose
    Updating crates.io index
       Fresh unicode-ident v1.0.12
       Fresh proc-macro2 v1.0.79
       Fresh hashbrown v0.14.3
       Fresh equivalent v1.0.1
       Fresh quote v1.0.35
       Fresh indexmap v2.2.5
       Fresh syn v2.0.52
       Fresh itoa v1.0.10
       Fresh ryu v1.0.17
       Fresh serde_derive v1.0.197
       Fresh log v0.4.21
       Fresh serde v1.0.197
       Fresh serde_json v1.0.114
       Dirty jsonpath_lib_polars_vendor v0.0.1 (https://github.com/ritchie46/jsonpath?rev=653d3cb84217217dc925aded7cac6b3efd39c7f1#653d3cb8): stale, unknown reason
   Compiling jsonpath_lib_polars_vendor v0.0.1 (https://github.com/ritchie46/jsonpath?rev=653d3cb84217217dc925aded7cac6b3efd39c7f1#653d3cb8)
```

to

```log
Run cargo run --verbose
##[debug]Overwrite 'shell' base on job defaults.
##[debug]/usr/bin/bash --noprofile --norc -e -o pipefail /home/runner/work/_temp/5527326d-430d-42c3-b808-9b10a759367f.sh
    Updating crates.io index
       Fresh unicode-ident v1.0.12
       Fresh hashbrown v0.14.3
       Fresh proc-macro2 v1.0.79
       Fresh equivalent v1.0.1
       Fresh quote v1.0.35
       Fresh indexmap v2.2.5
       Fresh ryu v1.0.17
       Fresh syn v2.0.52
       Fresh itoa v1.0.10
       Fresh log v0.4.21
       Fresh serde_derive v1.0.197
       Fresh serde v1.0.197
       Fresh serde_json v1.0.114
       Fresh jsonpath_lib_polars_vendor v0.0.1 (https://github.com/eitsupi/jsonpath?rev=870b05ca13cae6d1061d4d8fc4b70ea41dcaf420#870b05ca)
```

@Swatinem Is this a bug that can be solved in Swatinem/rust-cache? Or maybe it's a bug of cargo or something?

@ritchie46 Is it possible to rename the lib?

This repository only rebuilds one more crate, so there is no major impact, but in downstream projects that have polars as a dependency, polars itself will be rebuilt every time, which may increase the build time by several minutes. It would be great if this issue could be resolved. Thanks all.

Swatinem commented 3 months ago

I think this should be something that’s fixable in the action, hopefully :-) Now I just have to find some time to actually do that :-D