pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Inconsistent / wrong result on big enough data #11251

Open korommatyi opened 12 months ago

korommatyi commented 12 months ago

Reproducible example

I couldn't reproduce this issue on small dataframes, so I'm sharing my large ones: one has 300 million rows, the other 22 million.
Base features: https://turbineai-my.sharepoint.com/:u:/g/personal/matyas_korom_turbine_ai/EStV-EcDvt9LmSkIU5I4q48BJWZjtjUsIFyMMHGFpZ0M2w
Complications: https://turbineai-my.sharepoint.com/:u:/g/personal/matyas_korom_turbine_ai/ETZRzJlktdpHmMVD_jdEZ9ABdt1pl6VzeEe5rwjeynDLRQ
Please set the paths in the code snippet to wherever you download the files.

Also, the result is non-deterministic: most of the time it is incorrect, but sometimes it is correct, so you may need to run the code on the big dataframes a few times. Unfortunately, I also suspect that this is machine-dependent and that the result is incorrect only if the join result is too big for Polars to keep everything in memory.

import polars as pl

def compute_complication_effect(
    base_features,
    complications,
):
    features = base_features.lazy().with_columns(
        pl.col("feature_1").alias("feature_1_base"),
        pl.col("feature_2").alias("feature_2_base"),
    )
    columns = features.columns

    features = features.join(
        complications,
        on=["experiment_id", "node_id"],
        how="outer",
    ).with_columns(
        pl.col("feature_1").fill_null(0.0),
        pl.col("feature_2").fill_null(1.0),
    )

    feature_1 = (
        pl.when(pl.col("change") == "set_constant")
        .then(pl.col("value"))
        .otherwise(pl.col("feature_1"))
    )
    feature_2 = (
        pl.when(pl.col("change") == "set_constant")
        .then(pl.col("value"))
        .when(pl.col("change") == "multiply_feature_2")
        .then(pl.col("feature_2") * pl.col("value"))
        .otherwise(pl.col("feature_2"))
    )
    features = features.with_columns(
        feature_1.alias("feature_1"),
        feature_2.alias("feature_2"),
    ).select(
        pl.col(*columns)
    )

    return (
        features.with_columns(
            pl.min_horizontal("feature_2", "feature_1").alias("feature_1"),
        )
        .select(
            pl.col("experiment_id", "node_id", "feature_1", "feature_2"),
            pl.col("feature_1_base").fill_null(0.0),
            pl.col("feature_2_base").fill_null(1.0),
        )
    ).collect()

print('results on big data')
base_features = pl.read_parquet('path_to_where_you_downloaded/base_features.parquet')
complications = pl.scan_parquet('path_to_where_you_downloaded/complications.parquet')
features = compute_complication_effect(base_features, complications)
print(base_features.filter((pl.col('experiment_id') == 0) & (pl.col('node_id') == 550)))
print(complications.filter((pl.col('experiment_id') == 0) & (pl.col('node_id') == 550)).collect())
print(features.filter((pl.col('experiment_id') == 0) & (pl.col('node_id') == 550)))
print('='*30)

print('results on small data')
base_features = pl.DataFrame({
    'experiment_id': [0, 0],
    'node_id': [0, 1],
    'feature_1': [0.0, 0.2],
    'feature_2': [0.2, 1.0],
})
complications = pl.LazyFrame({
    'experiment_id': [0],
    'node_id': [550],
    'value': [1.0],
    'change': ['set_constant'],
})
features = compute_complication_effect(base_features, complications)
print(base_features.filter((pl.col('experiment_id') == 0) & (pl.col('node_id') == 550)))
print(complications.filter((pl.col('experiment_id') == 0) & (pl.col('node_id') == 550)).collect())
print(features.filter((pl.col('experiment_id') == 0) & (pl.col('node_id') == 550)))

This prints (on a bad day) the following.

results on big data
shape: (0, 4)
┌───────────────┬─────────┬───────────┬───────────┐
│ experiment_id ┆ node_id ┆ feature_1 ┆ feature_2 │
│ ---           ┆ ---     ┆ ---       ┆ ---       │
│ u32           ┆ u32     ┆ f64       ┆ f64       │
╞═══════════════╪═════════╪═══════════╪═══════════╡
└───────────────┴─────────┴───────────┴───────────┘
shape: (1, 4)
┌───────────────┬───────┬─────────┬──────────────┐
│ experiment_id ┆ value ┆ node_id ┆ change       │
│ ---           ┆ ---   ┆ ---     ┆ ---          │
│ u32           ┆ f64   ┆ u32     ┆ str          │
╞═══════════════╪═══════╪═════════╪══════════════╡
│ 0             ┆ 1.0   ┆ 550     ┆ set_constant │
└───────────────┴───────┴─────────┴──────────────┘
shape: (1, 6)
┌───────────────┬─────────┬───────────┬───────────┬────────────────┬────────────────┐
│ experiment_id ┆ node_id ┆ feature_1 ┆ feature_2 ┆ feature_1_base ┆ feature_2_base │
│ ---           ┆ ---     ┆ ---       ┆ ---       ┆ ---            ┆ ---            │
│ u32           ┆ u32     ┆ f64       ┆ f64       ┆ f64            ┆ f64            │
╞═══════════════╪═════════╪═══════════╪═══════════╪════════════════╪════════════════╡
│ 0             ┆ 550     ┆ 1.0       ┆ 1.0       ┆ 1.0            ┆ 1.0            │
└───────────────┴─────────┴───────────┴───────────┴────────────────┴────────────────┘
==============================
results on small data
shape: (0, 4)
┌───────────────┬─────────┬───────────┬───────────┐
│ experiment_id ┆ node_id ┆ feature_1 ┆ feature_2 │
│ ---           ┆ ---     ┆ ---       ┆ ---       │
│ i64           ┆ i64     ┆ f64       ┆ f64       │
╞═══════════════╪═════════╪═══════════╪═══════════╡
└───────────────┴─────────┴───────────┴───────────┘
shape: (1, 4)
┌───────────────┬─────────┬───────┬──────────────┐
│ experiment_id ┆ node_id ┆ value ┆ change       │
│ ---           ┆ ---     ┆ ---   ┆ ---          │
│ i64           ┆ i64     ┆ f64   ┆ str          │
╞═══════════════╪═════════╪═══════╪══════════════╡
│ 0             ┆ 550     ┆ 1.0   ┆ set_constant │
└───────────────┴─────────┴───────┴──────────────┘
shape: (1, 6)
┌───────────────┬─────────┬───────────┬───────────┬────────────────┬────────────────┐
│ experiment_id ┆ node_id ┆ feature_1 ┆ feature_2 ┆ feature_1_base ┆ feature_2_base │
│ ---           ┆ ---     ┆ ---       ┆ ---       ┆ ---            ┆ ---            │
│ i64           ┆ i64     ┆ f64       ┆ f64       ┆ f64            ┆ f64            │
╞═══════════════╪═════════╪═══════════╪═══════════╪════════════════╪════════════════╡
│ 0             ┆ 550     ┆ 1.0       ┆ 1.0       ┆ 0.0            ┆ 1.0            │
└───────────────┴─────────┴───────────┴───────────┴────────────────┴────────────────┘

Log output

join parallel: true
OUTER join dataframes finished
dataframe filtered
parquet file must be read, statistics not sufficient for predicate. (this line is repeated 166 times)
dataframe filtered
join parallel: true
OUTER join dataframes finished
dataframe filtered
dataframe filtered
dataframe filtered

Issue description

First of all, sorry for the complicated example. The problem only shows up on a large dataset and with a reasonably complicated calculation, so I couldn't reduce the code any further.

The snippet above produces non-deterministic, sometimes wrong results.

Explanation of the output: The row with experiment_id=0 and node_id=550 is not in base_features, but it's in complications with change='set_constant' and value=1. Therefore the feature_1_base should be 0, and feature_2_base should be 1 after the outer join and the fill_null operations.
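To make the expected values concrete, here is a minimal pure-Python sketch of the same logic for a single join key, using plain dicts instead of Polars. The helper name `expected_row` is hypothetical, and the sketch assumes that a key missing from `base_features` simply yields nulls (`None`) for the base columns after the outer join.

```python
def expected_row(base, complication):
    """Model one joined row; `base` or `complication` is None when that side has no match."""
    # Columns from base_features are null when only the right side matched.
    f1 = base["feature_1"] if base else None
    f2 = base["feature_2"] if base else None
    f1_base, f2_base = f1, f2  # the *_base aliases are created before the join

    # fill_null applied right after the join
    f1 = 0.0 if f1 is None else f1
    f2 = 1.0 if f2 is None else f2

    change = complication["change"] if complication else None
    value = complication["value"] if complication else None
    if change == "set_constant":
        f1, f2 = value, value
    elif change == "multiply_feature_2":
        f2 = f2 * value

    return {
        "feature_1": min(f2, f1),  # min_horizontal("feature_2", "feature_1")
        "feature_2": f2,
        "feature_1_base": 0.0 if f1_base is None else f1_base,
        "feature_2_base": 1.0 if f2_base is None else f2_base,
    }


# The row discussed above: absent from base_features, present in complications.
row = expected_row(None, {"change": "set_constant", "value": 1.0})
print(row)
```

For this row the model gives `feature_1_base == 0.0` and `feature_2_base == 1.0`, which matches the small-data output and not the big-data one.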

The small test case shows that this is indeed the case. However, on the big data, it produces different results for different runs. The one that I've shared has a feature_1_base value of 1, which is clearly wrong.

Expected behavior

feature_1_base is deterministically 0 for the selected row.

Installed versions

```
--------Version info---------
Polars: 0.19.3
Index type: UInt32
Platform: macOS-13.5.2-arm64-arm-64bit
Python: 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:41:52) [Clang 15.0.7 ]

----Optional dependencies----
adbc_driver_sqlite:
cloudpickle:
connectorx:
deltalake:
fsspec:
gevent:
matplotlib: 3.7.1
numpy: 1.22.4
pandas: 2.0.3
pyarrow: 12.0.1
pydantic:
sqlalchemy: 2.0.18
xlsx2csv:
xlsxwriter:
```

ritchie46 commented 12 months ago

Can you show your log output? Especially in this huge-file case, it will help us a lot.

korommatyi commented 12 months ago

Sorry, it took about 40 runs to reproduce this time, but I finally managed. I've updated the description. By the way, I didn't see any difference in the logs between the successful and failing runs.
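Since the failure only shows up on some runs, a small harness that repeats the computation and tallies the distinct outcomes can save manual re-running. The sketch below is only an illustration: `tally_outcomes` and `fake_run` are hypothetical names, and `fake_run` is a stand-in that returns a wrong value on every 40th call. In practice it would be replaced by a function that runs the snippet and returns the `feature_1_base` value of the selected row.

```python
from collections import Counter


def tally_outcomes(run_once, attempts):
    """Call `run_once` repeatedly and count each distinct (hashable) result."""
    return Counter(run_once() for _ in range(attempts))


# Stand-in for the real computation so the harness has something to detect.
calls = {"n": 0}


def fake_run():
    calls["n"] += 1
    return 1.0 if calls["n"] % 40 == 0 else 0.0


outcomes = tally_outcomes(fake_run, 80)
print(outcomes)  # more than one key means the result is not deterministic
```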

avimallu commented 12 months ago

One quick thing you could check while the devs get to this is whether the parquet file itself is read correctly every time. Perhaps compare against a pyarrow-based reader?

korommatyi commented 12 months ago

Tested with

import pyarrow.parquet as pq
import polars as pl
from polars import testing

for i in range(20):
    print(i)
    arr = pl.from_arrow(pq.read_table('./base_features.parquet'))
    norm = pl.read_parquet('./base_features.parquet')
    testing.assert_frame_equal(arr, norm)

    arr = pl.from_arrow(pq.read_table('./complications.parquet'))
    norm = pl.read_parquet('./complications.parquet')
    testing.assert_frame_equal(arr, norm)

Is this what you had in mind? This didn't raise any errors.

I don't think it's a parquet issue. When I compute the same thing in smallish batches (taking slices of the dataframe and then concatenating the result), the result is deterministic and correct.
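The comment above doesn't say how the slices were taken, so the following is only a sketch of one way batching can stay correct for an outer join: partition both sides by the join key, join each partition independently, then merge. It uses plain dicts with unique keys per side in place of dataframes, and `full_outer_join` and `join_in_batches` are hypothetical helpers, not Polars functions.

```python
from collections import defaultdict


def full_outer_join(left, right):
    """Full outer join of two {key: payload} dicts; a missing side becomes None."""
    return {k: (left.get(k), right.get(k)) for k in left.keys() | right.keys()}


def join_in_batches(left, right, n_batches):
    """Partition both sides by key, join each partition, and merge the results."""
    left_parts, right_parts = defaultdict(dict), defaultdict(dict)
    for k, v in left.items():
        left_parts[hash(k) % n_batches][k] = v
    for k, v in right.items():
        right_parts[hash(k) % n_batches][k] = v

    merged = {}
    for i in range(n_batches):
        # Matching keys always land in the same partition, so no row is lost
        # or duplicated, including rows that exist only on the right side.
        merged.update(full_outer_join(left_parts[i], right_parts[i]))
    return merged


# Keys are (experiment_id, node_id), as in the small example from the description.
left = {(0, 0): 0.0, (0, 1): 0.2}
right = {(0, 550): "set_constant"}
batched = join_in_batches(left, right, 4)
print(batched == full_outer_join(left, right))
```

Slicing only the left side by row position would not be enough for an outer join, because each right-only row would then appear once per slice.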

avimallu commented 12 months ago

Yes, I wanted to eliminate the possibility that Polars was not reading large datasets accurately.

ritchie46 commented 12 months ago

@korommatyi how many rows do you expect after the outer join? Outer joins produce very large results, maybe we overflow. Can you try polars-u64-idx?

korommatyi commented 12 months ago

I expect 300_587_478 entries if we ignore null values, or 300_822_742 if we don't ignore on either side. This should fit in a u32, but I will repro the problem with polars-u64-idx.
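As a quick sanity check of the "fits in a u32" claim, the expected row count can be compared against the largest value a 32-bit unsigned index can hold. This only covers the final row count quoted above; it says nothing about intermediate sizes inside the join.

```python
U32_MAX = 2**32 - 1  # 4_294_967_295, the largest value a u32 index can hold

expected_rows = 300_822_742  # row count quoted above, nulls included

fits = expected_rows <= U32_MAX
headroom = U32_MAX - expected_rows
print(fits)                              # True
print(f"{expected_rows / U32_MAX:.1%}")  # 7.0% of the u32 range
```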

ritchie46 commented 12 months ago

Could someone who has got some time compile polars with debug assertions and see if we get an overflow somewhere?

korommatyi commented 12 months ago

I can't install polars-u64-idx. :(

Backend subprocess exited when trying to invoke build_wheel

  Running `maturin pep517 build-wheel -i /var/folders/vq/bb88l10d21sb6lk70njxxkjr0000gn/T/tmp1aea6wnf/.venv/bin/python --compatibility off`
      Updating crates.io index
  🍹 Building a mixed python/rust project
  🔗 Found pyo3 bindings with abi3 support for Python ≥ 3.8
  🐍 Not using a specific python interpreter
  💻 Using `MACOSX_DEPLOYMENT_TARGET=11.0` for aarch64-apple-darwin by default
     Compiling libc v0.2.147
     Compiling autocfg v1.1.0
     Compiling proc-macro2 v1.0.66
     Compiling unicode-ident v1.0.11
     Compiling cfg-if v1.0.0
     Compiling version_check v0.9.4
     Compiling pkg-config v0.3.27
     Compiling scopeguard v1.2.0
     Compiling serde v1.0.188
     Compiling libm v0.2.7
     Compiling crossbeam-utils v0.8.16
     Compiling syn v1.0.109
     Compiling once_cell v1.18.0
     Compiling futures-core v0.3.28
     Compiling semver v1.0.18
     Compiling ahash v0.8.3
     Compiling num-traits v0.2.16
     Compiling memoffset v0.9.0
     Compiling crossbeam-epoch v0.9.15
     Compiling static_assertions v1.1.0
     Compiling rayon-core v1.11.0
     Compiling jobserver v0.1.26
     Compiling getrandom v0.2.10
     Compiling quote v1.0.33
     Compiling num_cpus v1.16.0
     Compiling crossbeam-channel v0.5.8
     Compiling cc v1.0.83
     Compiling siphasher v0.3.11
     Compiling memchr v2.6.1
     Compiling syn v2.0.29
     Compiling phf_shared v0.11.2
     Compiling lexical-util v0.8.5
     Compiling slab v0.4.9
     Compiling either v1.9.0
     Compiling futures-task v0.3.28
     Compiling regex-syntax v0.7.5
     Compiling crossbeam-deque v0.8.3
     Compiling futures-channel v0.3.28
     Compiling rand_core v0.6.4
     Compiling rand v0.8.5
     Compiling pin-project-lite v0.2.13
     Compiling futures-util v0.3.28
     Compiling allocator-api2 v0.2.16
     Compiling target-features v0.1.4
     Compiling futures-sink v0.3.28
     Compiling crc32fast v1.3.2
     Compiling rayon v1.7.0
     Compiling phf_generator v0.11.2
     Compiling cmake v0.1.50
     Compiling pin-utils v0.1.0
     Compiling itoa v1.0.9
     Compiling serde_json v1.0.105
     Compiling equivalent v1.0.1
     Compiling futures-io v0.3.28
     Compiling ryu v1.0.15
     Compiling core-foundation-sys v0.8.4
     Compiling zstd-sys v2.0.8+zstd.1.5.5
     Compiling lz4-sys v1.9.4
     Compiling regex-automata v0.3.7
     Compiling phf_codegen v0.11.2
     Compiling lexical-write-integer v0.8.5
     Compiling libz-ng-sys v1.1.12
     Compiling lexical-parse-integer v0.8.6
     Compiling phf v0.11.2
     Compiling zstd-safe v6.0.6
     Compiling alloc-no-stdlib v2.0.4
     Compiling async-trait v0.1.73
     Compiling snap v1.1.0
     Compiling alloc-stdlib v0.2.2
     Compiling lexical-parse-float v0.8.5
     Compiling lexical-write-float v0.8.5
     Compiling iana-time-zone v0.1.57
     Compiling fallible-streaming-iterator v0.1.9
     Compiling rle-decode-fast v1.0.3
     Compiling adler v1.0.2
     Compiling libflate_lz77 v1.2.0
     Compiling lexical-core v0.8.5
     Compiling brotli-decompressor v2.3.4
     Compiling miniz_oxide v0.7.1
     Compiling aho-corasick v1.0.5
     Compiling array-init-cursor v0.2.0
     Compiling crc-catalog v1.1.1
     Compiling regex v1.9.4
     Compiling adler32 v1.2.0
     Compiling libflate v1.4.0
     Compiling crc v2.1.0
     Compiling parse-zoneinfo v0.3.0
     Compiling hashbrown v0.14.0
  error[E0554]: `#![feature]` may not be used on the stable release channel
    --> /Users/matyaskorom/.cargo/registry/src/index.crates.io-6f17d22bba15001f/hashbrown-0.14.0/src/lib.rs:15:5
     |
  15 | /     feature(
  16 | |         test,
  17 | |         core_intrinsics,
  18 | |         dropck_eyepatch,
  ...  |
  24 | |         strict_provenance
  25 | |     )
     | |_____^

  error[E0554]: `#![feature]` may not be used on the stable release channel
    --> /Users/matyaskorom/.cargo/registry/src/index.crates.io-6f17d22bba15001f/hashbrown-0.14.0/src/lib.rs:16:9
     |
  16 |         test,
     |         ^^^^

  error[E0554]: `#![feature]` may not be used on the stable release channel
    --> /Users/matyaskorom/.cargo/registry/src/index.crates.io-6f17d22bba15001f/hashbrown-0.14.0/src/lib.rs:17:9
     |
  17 |         core_intrinsics,
     |         ^^^^^^^^^^^^^^^

  error[E0554]: `#![feature]` may not be used on the stable release channel
    --> /Users/matyaskorom/.cargo/registry/src/index.crates.io-6f17d22bba15001f/hashbrown-0.14.0/src/lib.rs:20:9
     |
  20 |         extend_one,
     |         ^^^^^^^^^^

  error[E0554]: `#![feature]` may not be used on the stable release channel
    --> /Users/matyaskorom/.cargo/registry/src/index.crates.io-6f17d22bba15001f/hashbrown-0.14.0/src/lib.rs:21:9
     |
  21 |         allocator_api,
     |         ^^^^^^^^^^^^^

  error[E0554]: `#![feature]` may not be used on the stable release channel
    --> /Users/matyaskorom/.cargo/registry/src/index.crates.io-6f17d22bba15001f/hashbrown-0.14.0/src/lib.rs:22:9
     |
  22 |         slice_ptr_get,
     |         ^^^^^^^^^^^^^

  error[E0554]: `#![feature]` may not be used on the stable release channel
    --> /Users/matyaskorom/.cargo/registry/src/index.crates.io-6f17d22bba15001f/hashbrown-0.14.0/src/lib.rs:23:9
     |
  23 |         maybe_uninit_array_assume_init,
     |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

     Compiling chrono-tz-build v0.2.0
  For more information about this error, try `rustc --explain E0554`.
  error: could not compile `hashbrown` (lib) due to 7 previous errors
  warning: build failed, waiting for other jobs to finish...
  💥 maturin failed
    Caused by: Failed to build a native library through cargo
    Caused by: Cargo build finished with "exit status: 101": `MACOSX_DEPLOYMENT_TARGET="11.0" PYO3_ENVIRONMENT_SIGNATURE="cpython-3.10-64bit" PYO3_PYTHON="/var/folders/vq/bb88l10d21sb6lk70njxxkjr0000gn/T/tmp1aea6wnf/.venv/bin/python" PYTHON_SYS_EXECUTABLE="/var/folders/vq/bb88l10d21sb6lk70njxxkjr0000gn/T/tmp1aea6wnf/.venv/bin/python" "cargo" "rustc" "--message-format" "json-render-diagnostics" "--manifest-path" "/private/var/folders/vq/bb88l10d21sb6lk70njxxkjr0000gn/T/tmpw5z78xwz/polars_u64_idx-0.19.3/Cargo.toml" "--release" "--lib" "--" "-C" "link-arg=-undefined" "-C" "link-arg=dynamic_lookup" "-C" "link-args=-Wl,-install_name,@rpath/polars.abi3.so"`
  Error: command ['maturin', 'pep517', 'build-wheel', '-i', '/var/folders/vq/bb88l10d21sb6lk70njxxkjr0000gn/T/tmp1aea6wnf/.venv/bin/python', '--compatibility', 'off'] returned non-zero exit status 1
korommatyi commented 11 months ago

I tried to figure out whether it's a multithreading issue or a data-size issue by setting POLARS_MAX_THREADS to 1, but I still got inconsistent results.
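For anyone repeating this experiment, here is a minimal sketch of setting the variable from Python. It assumes the variable has to be in the environment before Polars initializes its thread pool, so it is set before the import. The name of the accessor used to confirm the pool size has changed between releases, so the sketch tries both spellings and skips the check if neither exists.

```python
import os

# Set this before the first `import polars`, so it is in place when the
# thread pool is created.
os.environ["POLARS_MAX_THREADS"] = "1"

try:
    import polars as pl
except ImportError:
    pl = None  # Polars is not installed; nothing to verify

if pl is not None:
    # The accessor was renamed across releases, so try both spellings.
    size_fn = getattr(pl, "thread_pool_size", None) or getattr(pl, "threadpool_size", None)
    if size_fn is not None:
        print("Polars thread pool size:", size_fn())
```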

korommatyi commented 11 months ago

@avimallu, @ritchie46 do you know if somebody is actively working on this issue? We need to decide soon whether to put the Polars version into production or rewrite it in pandas. For this decision, it would help us to know whether we can expect a fix soon. Thanks!

avimallu commented 11 months ago

@korommatyi, I'm not a part of the Polars dev team; I just help out. 😅 That said, there are labels that the team maintains on issues that specify whether or not it was accepted as a bug. I think the bug needs to be reproduced in a build like Ritchie mentioned earlier:

Could someone who has got some time compile polars with debug assertions and see if we get an overflow somewhere?

That will probably help in speeding things up.

korommatyi commented 11 months ago

I tried to test with the u64 build on Linux, but it turns out I can't reproduce the issue on Linux with the normal u32 build either. I made 220 attempts and never saw the wrong answer. It seems this issue is Mac-specific (ARM Mac?).

korommatyi commented 10 months ago

I tried to run this with debug assertions but couldn't even run make tests from py-polars following the contribution guide. It fails with the following error:

================================================== test session starts ==================================================
platform darwin -- Python 3.11.4, pytest-7.4.0, pluggy-1.3.0
rootdir: /Users/matyaskorom/personal/polars/py-polars
configfile: pyproject.toml
plugins: hypothesis-6.87.1, cov-4.1.0, xdist-3.3.1
12 workers [3853 items]
................................................................................................................. [  2%]
...................................................................................................Fatal Python error: Aborted

Thread 0x000000016e21b000 (most recent call first):
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 474 in read
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 507 in from_io
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 1049 in _thread_receiver
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 296 in run
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 361 in _perform_spawn

Current thread 0x00000001dac21300 (most recent call first):
  File "/Users/matyaskorom/personal/polars/py-polars/polars/lazyframe/frame.py", line 1787 in collect
  File "/Users/matyaskorom/personal/polars/py-polars/polars/utils/deprecation.py", line 100 in wrapper
  File "/Users/matyaskorom/personal/polars/py-polars/polars/dataframe/frame.py", line 7766 in select
  File "/Users/matyaskorom/personal/polars/py-polars/polars/series/utils.py", line 100 in wrapper
  File "/Users/matyaskorom/personal/polars/py-polars/tests/unit/test_errors.py", line 110 in test_panic_error
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/python.py", line 194 in pytest_pyfunc_call
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/python.py", line 1788 in runtest
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 169 in pytest_runtest_call
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 262 in <lambda>
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 341 in from_call
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 261 in call_runtest_hook
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 222 in call_and_report
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 133 in runtestprotocol
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 114 in pytest_runtest_protocol
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/xdist/remote.py", line 174 in run_one_test
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/xdist/remote.py", line 157 in pytest_runtestloop
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/main.py", line 324 in _main
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/main.py", line 270 in wrap_session
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/main.py", line 317 in pytest_cmdline_main
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/xdist/remote.py", line 355 in <module>
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 1157 in executetask
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 296 in run
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 361 in _perform_spawn
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 343 in integrate_as_primary_thread
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 1142 in serve
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 1640 in serve
  File "<string>", line 8 in <module>
  File "<string>", line 1 in <module>
.
Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, matplotlib._c_internal_utils, PIL._imaging, matplotlib._path, kiwisolver._cext, greenlet._greenlet, zope.interface._zope_interface_coptimizations, gevent.libev.corecext, gevent._gevent_c_greenlet_primitives, gevent._gevent_c_hub_local, gevent._gevent_c_waiter, gevent._gevent_c_hub_primitives, gevent._gevent_c_ident, gevent._gevent_cgreenlet, pyarrow.lib, pyarrow._hdfsio, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pandas._libs.ops, pyarrow._compute, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.indexing, pandas._libs.index, pandas._libs.internals, pandas._libs.join, pandas._libs.writers, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, sqlalchemy.cyextension.collections, sqlalchemy.cyextension.immutabledict, sqlalchemy.cyextension.processors, sqlalchemy.cyextension.resultproxy, sqlalchemy.cyextension.util, pyarrow._fs, pyarrow._hdfs, pyarrow._gcsfs, pyarrow._s3fs, pyarrow._acero, 
pyarrow._csv, pyarrow._json, pyarrow._dataset, pyarrow._dataset_orc, pyarrow._parquet, pyarrow._dataset_parquet, yaml._yaml, charset_normalizer.md, markupsafe._speedups (total: 87)
....[gw5] node down: Not properly terminated
F
replacing crashed worker gw5
collecting: 12/13 workers..................................................................................................................................................................................................................................................................................................................................................collecting: 12/13 workers..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................s.....................................................................................................................................................................................................................................................................................................................................................Fatal Python error: Aborted

Thread 0x000000016e8bf000 (most recent call first):
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 474 in read
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 507 in from_io
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 1049 in _thread_receiver
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 296 in run
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 361 in _perform_spawn

Current thread 0x00000001dac21300 (most recent call first):
  File "/Users/matyaskorom/personal/polars/py-polars/polars/functions/range/date_range.py", line 219 in date_range
  File "/Users/matyaskorom/personal/polars/py-polars/polars/utils/deprecation.py", line 100 in wrapper
  File "/Users/matyaskorom/personal/polars/py-polars/polars/utils/deprecation.py", line 100 in wrapper
  File "/Users/matyaskorom/personal/polars/py-polars/tests/unit/functions/range/test_date_range.py", line 24 in test_date_range_invalid_time_unit
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/python.py", line 194 in pytest_pyfunc_call
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/python.py", line 1788 in runtest
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 169 in pytest_runtest_call
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 262 in <lambda>
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 341 in from_call
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 261 in call_runtest_hook
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 222 in call_and_report
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 133 in runtestprotocol
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 114 in pytest_runtest_protocol
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/xdist/remote.py", line 174 in run_one_test
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/xdist/remote.py", line 157 in pytest_runtestloop
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/main.py", line 324 in _main
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/main.py", line 270 in wrap_session
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/main.py", line 317 in pytest_cmdline_main
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/xdist/remote.py", line 355 in <module>
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 1157 in executetask
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 296 in run
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 361 in _perform_spawn
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 343 in integrate_as_primary_thread
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 1142 in serve
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 1640 in serve
  File "<string>", line 8 in <module>
  File "<string>", line 1 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, matplotlib._c_internal_utils, PIL._imaging, matplotlib._path, kiwisolver._cext, greenlet._greenlet, zope.interface._zope_interface_coptimizations, gevent.libev.corecext, gevent._gevent_c_greenlet_primitives, gevent._gevent_c_hub_local, gevent._gevent_c_waiter, gevent._gevent_c_hub_primitives, gevent._gevent_c_ident, gevent._gevent_cgreenlet, pyarrow.lib, pyarrow._hdfsio, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pandas._libs.ops, pyarrow._compute, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.indexing, pandas._libs.index, pandas._libs.internals, pandas._libs.join, pandas._libs.writers, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, sqlalchemy.cyextension.collections, sqlalchemy.cyextension.immutabledict, sqlalchemy.cyextension.processors, sqlalchemy.cyextension.resultproxy, sqlalchemy.cyextension.util, pyarrow._fs, pyarrow._hdfs, pyarrow._gcsfs, pyarrow._s3fs, pyarrow._acero, 
pyarrow._csv, pyarrow._json, pyarrow._dataset, pyarrow._dataset_orc, pyarrow._parquet, pyarrow._dataset_parquet, yaml._yaml, charset_normalizer.md, markupsafe._speedups (total: 87)
...........Fatal Python error: Aborted

Thread 0x000000017034f000 (most recent call first):
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 474 in read
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 507 in from_io
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 1049 in _thread_receiver
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 296 in run
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 361 in _perform_spawn

Current thread 0x00000001dac21300 (most recent call first):
  File "/Users/matyaskorom/personal/polars/py-polars/polars/functions/range/datetime_range.py", line 189 in datetime_range
  File "/Users/matyaskorom/personal/polars/py-polars/tests/unit/functions/range/test_datetime_range.py", line 99 in test_datetime_range_invalid_time_unit
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/python.py", line 194 in pytest_pyfunc_call
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/python.py", line 1788 in runtest
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 169 in pytest_runtest_call
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 262 in <lambda>
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 341 in from_call
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 261 in call_runtest_hook
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 222 in call_and_report
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 133 in runtestprotocol
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 114 in pytest_runtest_protocol
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/xdist/remote.py", line 174 in run_one_test
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/xdist/remote.py", line 157 in pytest_runtestloop
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/main.py", line 324 in _main
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/main.py", line 270 in wrap_session
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/main.py", line 317 in pytest_cmdline_main
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/xdist/remote.py", line 355 in <module>
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 1157 in executetask
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 296 in run
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 361 in _perform_spawn
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 343 in integrate_as_primary_thread
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 1142 in serve
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 1640 in serve
  File "<string>", line 8 in <module>
  File "<string>", line 1 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, matplotlib._c_internal_utils, PIL._imaging, matplotlib._path, kiwisolver._cext, greenlet._greenlet, zope.interface._zope_interface_coptimizations, gevent.libev.corecext, gevent._gevent_c_greenlet_primitives, gevent._gevent_c_hub_local, gevent._gevent_c_waiter, gevent._gevent_c_hub_primitives, gevent._gevent_c_ident, gevent._gevent_cgreenlet, pyarrow.lib, pyarrow._hdfsio, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pandas._libs.ops, pyarrow._compute, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.indexing, pandas._libs.index, pandas._libs.internals, pandas._libs.join, pandas._libs.writers, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, sqlalchemy.cyextension.collections, sqlalchemy.cyextension.immutabledict, sqlalchemy.cyextension.processors, sqlalchemy.cyextension.resultproxy, sqlalchemy.cyextension.util, pyarrow._fs, pyarrow._hdfs, pyarrow._gcsfs, pyarrow._s3fs, pyarrow._acero, 
pyarrow._csv, pyarrow._json, pyarrow._dataset, pyarrow._dataset_orc, pyarrow._parquet, pyarrow._dataset_parquet, yaml._yaml, charset_normalizer.md, markupsafe._speedups (total: 87)
............[gw7] node down: Not properly terminated

replacing crashed worker gw7
collecting: 12/14 workers..[gw10] node down: Not properly terminated

replacing crashed worker gw10
collecting: 12/15 workers................................................................................................collecting: 12/15 workers...........................................................................................................x..x.................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................s.....s.............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................x.........collecting: 13/15 workersINTERNALERROR> Traceback (most recent 
call last):..........................
INTERNALERROR>   File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/main.py", line 270, in wrap_session
INTERNALERROR>     session.exitstatus = doit(config, session) or 0
INTERNALERROR>                          ^^^^^^^^^^^^^^^^^^^^^
INTERNALERROR>   File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/main.py", line 324, in _main
INTERNALERROR>     config.hook.pytest_runtestloop(session=session)
INTERNALERROR>   File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 493, in __call__
INTERNALERROR>     return self._hookexec(self.name, self._hookimpls, kwargs, firstresult)
INTERNALERROR>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
INTERNALERROR>   File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 115, in _hookexec
INTERNALERROR>     return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
INTERNALERROR>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
INTERNALERROR>   File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 152, in _multicall
INTERNALERROR>     return outcome.get_result()
INTERNALERROR>            ^^^^^^^^^^^^^^^^^^^^
INTERNALERROR>   File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_result.py", line 114, in get_result
INTERNALERROR>     raise exc.with_traceback(exc.__traceback__)
INTERNALERROR>   File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 77, in _multicall
INTERNALERROR>     res = hook_impl.function(*args)
INTERNALERROR>           ^^^^^^^^^^^^^^^^^^^^^^^^^
INTERNALERROR>   File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/xdist/dsession.py", line 122, in pytest_runtestloop
INTERNALERROR>     self.loop_once()
INTERNALERROR>   File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/xdist/dsession.py", line 145, in loop_once
INTERNALERROR>     call(**kwargs)
INTERNALERROR>   File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/xdist/dsession.py", line 270, in worker_collectionfinish
INTERNALERROR>     self.sched.schedule()
INTERNALERROR>   File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/xdist/scheduler/loadscope.py", line 339, in schedule
INTERNALERROR>     self._reschedule(node)
INTERNALERROR>   File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/xdist/scheduler/loadscope.py", line 321, in _reschedule
INTERNALERROR>     self._assign_work_unit(node)
INTERNALERROR>   File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/xdist/scheduler/loadscope.py", line 259, in _assign_work_unit
INTERNALERROR>     worker_collection = self.registered_collections[node]
INTERNALERROR>                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
INTERNALERROR> KeyError: <WorkerController gw13>
Fatal Python error: Aborted

Thread 0x000000016ff63000 (most recent call first):
  File "/Users/matyaskorom/.pyenv/versions/3.11.4/lib/python3.11/threading.py", line 324 in wait
  File "/Users/matyaskorom/.pyenv/versions/3.11.4/lib/python3.11/threading.py", line 622 in wait
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 409 in waitall
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 1114 in _terminate_execution
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 1065 in _thread_receiver
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 296 in run
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 361 in _perform_spawn

Current thread 0x00000001dac21300 (most recent call first):
  File "/Users/matyaskorom/personal/polars/py-polars/polars/lazyframe/frame.py", line 1787 in collect
  File "/Users/matyaskorom/personal/polars/py-polars/polars/utils/deprecation.py", line 100 in wrapper
  File "/Users/matyaskorom/personal/polars/py-polars/polars/dataframe/frame.py", line 7766 in select
  File "/Users/matyaskorom/personal/polars/py-polars/polars/series/utils.py", line 100 in wrapper
  File "/Users/matyaskorom/personal/polars/py-polars/tests/unit/test_errors.py", line 110 in test_panic_error
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/python.py", line 194 in pytest_pyfunc_call
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/python.py", line 1788 in runtest
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 169 in pytest_runtest_call
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 262 in <lambda>
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 341 in from_call
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 261 in call_runtest_hook
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 222 in call_and_report
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 133 in runtestprotocol
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 114 in pytest_runtest_protocol
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/xdist/remote.py", line 174 in run_one_test
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/xdist/remote.py", line 157 in pytest_runtestloop
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/main.py", line 324 in _main
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/main.py", line 270 in wrap_session
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/_pytest/main.py", line 317 in pytest_cmdline_main
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/xdist/remote.py", line 355 in <module>
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 1157 in executetask
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 296 in run
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 361 in _perform_spawn
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 343 in integrate_as_primary_thread
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 1142 in serve
  File "/Users/matyaskorom/personal/polars/.venv/lib/python3.11/site-packages/execnet/gateway_base.py", line 1640 in serve
  File "<string>", line 8 in <module>
  File "<string>", line 1 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, matplotlib._c_internal_utils, PIL._imaging, matplotlib._path, kiwisolver._cext, greenlet._greenlet, zope.interface._zope_interface_coptimizations, gevent.libev.corecext, gevent._gevent_c_greenlet_primitives, gevent._gevent_c_hub_local, gevent._gevent_c_waiter, gevent._gevent_c_hub_primitives, gevent._gevent_c_ident, gevent._gevent_cgreenlet, pyarrow.lib, pyarrow._hdfsio, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pandas._libs.ops, pyarrow._compute, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.indexing, pandas._libs.index, pandas._libs.internals, pandas._libs.join, pandas._libs.writers, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, sqlalchemy.cyextension.collections, sqlalchemy.cyextension.immutabledict, sqlalchemy.cyextension.processors, sqlalchemy.cyextension.resultproxy, sqlalchemy.cyextension.util, pyarrow._fs, pyarrow._hdfs, pyarrow._gcsfs, pyarrow._s3fs, pyarrow._acero, 
pyarrow._csv, pyarrow._json, pyarrow._dataset, pyarrow._dataset_orc, pyarrow._parquet, pyarrow._dataset_parquet, yaml._yaml, charset_normalizer.md, markupsafe._speedups (total: 87)

================================= 1 failed, 3828 passed, 4 skipped, 3 xfailed in 15.93s =================================
make: *** [test] Error 3

It seems that test_panic_error (tests/unit/test_errors.py, line 110) is the test that fails.

Does anybody have any ideas?

trueb2 commented 9 months ago

This happens to me as well on macOS; however, CI passes. There are 3 tests that expect a panic but crash the test runner instead.

Disabling these three tests lets make test succeed:

test_panic_error
test_date_range_invalid_time_unit
test_datetime_range_invalid_time_unit
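One way to disable the three tests locally without editing the test files is to deselect them by name with pytest's -k option. The sketch below only builds the extra arguments and applies them on macOS alone. The helper names (CRASHING_TESTS, deselect_expression, pytest_args) are made up for illustration and are not part of the polars test suite. Note that -k matches by substring, so any other test whose name contains one of these strings would be deselected too.

```python
import shlex
import sys

# Test names reported in this thread as crashing the runner on macOS.
CRASHING_TESTS = [
    "test_panic_error",
    "test_date_range_invalid_time_unit",
    "test_datetime_range_invalid_time_unit",
]


def deselect_expression(names):
    """Build a pytest -k expression that excludes the given test names."""
    return "not (" + " or ".join(names) + ")"


def pytest_args(platform=sys.platform):
    """Return extra pytest arguments; deselect only on macOS ("darwin")."""
    if platform != "darwin":
        return []
    return ["-k", deselect_expression(CRASHING_TESTS)]


if __name__ == "__main__":
    # Print shell-quoted arguments that can be appended to a pytest command.
    print(shlex.join(pytest_args()))
```

The printed arguments can be appended to a manual pytest invocation. How to pass them through make test depends on the Makefile, which is not shown in this thread.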