pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Test suite fails with `pyo3_runtime.PanicException: misaligned pointer dereference` #7952

Closed jakob-keller closed 1 year ago

jakob-keller commented 1 year ago

Issue description

ef2ab8af98eb154dc2f924105d787ad0328a5944 introduced changes that cause my local development environment to fail the test suite with 4 errors.

I believe I set up the environment as described in CONTRIBUTING.md. It looks like this:

macOS Ventura 13.3 on Apple M2 Pro

CPython 3.10.10

rustup --version
rustup 1.25.2 (17db695f1 2023-02-01)
info: This is the version for the rustup toolchain manager, not the rustc compiler.
info: The currently active `rustc` version is `rustc 1.70.0-nightly (0599b6b93 2023-04-01)`

Reproducible example

Output of `make test` ``` py-polars % make test 🍹 Building a mixed python/rust project πŸ”— Found pyo3 bindings with abi3 support for Python β‰₯ 3.7 🐍 Not using a specific python interpreter Ignoring backports.zoneinfo: markers 'python_version < "3.9" and extra == "timezone"' don't match your environment Ignoring tzdata: markers 'platform_system == "Windows" and extra == "timezone"' don't match your environment Ignoring connectorx: markers 'extra == "connectorx"' don't match your environment Ignoring fsspec: markers 'extra == "fsspec"' don't match your environment Ignoring numpy: markers 'extra == "numpy"' don't match your environment Ignoring xlsx2csv: markers 'extra == "xlsx2csv"' don't match your environment Ignoring xlsxwriter: markers 'extra == "xlsxwriter"' don't match your environment Ignoring pyarrow: markers 'extra == "pandas"' don't match your environment Ignoring pandas: markers 'extra == "pandas"' don't match your environment Ignoring deltalake: markers 'extra == "deltalake"' don't match your environment Ignoring sqlalchemy: markers 'extra == "sqlalchemy"' don't match your environment Ignoring pandas: markers 'extra == "sqlalchemy"' don't match your environment Ignoring polars: markers 'extra == "all"' don't match your environment Ignoring pyarrow: markers 'extra == "pyarrow"' don't match your environment Ignoring matplotlib: markers 'extra == "matplotlib"' don't match your environment Requirement already satisfied: typing_extensions>=4.0.1 in ./.venv/lib/python3.10/site-packages (4.5.0) πŸ’» Using `MACOSX_DEPLOYMENT_TARGET=11.0` for aarch64-apple-darwin by default Finished dev [unoptimized + debuginfo] target(s) in 0.40s πŸ“¦ Built wheel for abi3 Python β‰₯ 3.7 to /var/folders/7b/kqrfbrqj563g93dc65hn31kc0000gn/T/.tmpu1sXFK/polars-0.16.17-cp37-abi3-macosx_11_0_arm64.whl πŸ›  Installed polars-0.16.17 .venv/bin/pytest -n auto --dist worksteal 
=============================================================================================================================================== test session starts =============================================================================================================================================== platform darwin -- Python 3.10.10, pytest-7.2.0, pluggy-1.0.0 rootdir: /Users/xxx/PycharmProjects/polars/py-polars, configfile: pyproject.toml plugins: hypothesis-6.70.1, xdist-3.2.0, cov-4.0.0 gw0 [2042] / gw1 [2042] / gw2 [2042] / gw3 [2042] / gw4 [2042] / gw5 [2042] / gw6 [2042] / gw7 [2042] / gw8 [2042] / gw9 [2042] ........................................................................................................................................................................................................................................................................................................... [ 14%] ...................................................................................................................F....................................................................................................................................................................................... [ 29%] ........................................................................................................................................................................................................................................................................................................... [ 43%] .............................F......................................................s...................................................................................................................................................................................................................... 
[ 58%] ............................................................................................................................................................................................................................................................................................................ [ 73%] ..................................................................................................................................................................F............................................................................................................F........................... [ 87%] ....................................................................................................................................................................................................................................................... [100%] ==================================================================================================================================================== FAILURES ===================================================================================================================================================== ______________________________________________________________________________________________________________________________________ test_init_dataclasses_and_namedtuple _______________________________________________________________________________________________________________________________________ [gw0] darwin -- Python 3.10.10 /Users/xxx/PycharmProjects/polars/py-polars/.venv/bin/python monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x132503d00> def test_init_dataclasses_and_namedtuple(monkeypatch: Any) -> None: from dataclasses import dataclass from typing import NamedTuple monkeypatch.setenv("POLARS_ACTIVATE_DECIMAL", "1") from polars.utils._construction import dataclass_type_hints @dataclass class TradeDC: timestamp: datetime ticker: str price: Decimal size: int | None = None 
class TradeNT(NamedTuple): timestamp: datetime ticker: str price: Decimal size: int | None = None raw_data = [ (datetime(2022, 9, 8, 14, 30, 45), "AAPL", Decimal("157.5"), 125), (datetime(2022, 9, 9, 10, 15, 12), "FLSY", Decimal("10.0"), 1500), (datetime(2022, 9, 7, 15, 30), "MU", Decimal("55.5"), 400), ] for TradeClass in (TradeDC, TradeNT): trades = [TradeClass(*values) for values in raw_data] for DF in (pl.DataFrame, pl.from_records): df = DF(data=trades) # type: ignore[operator] assert df.schema == { "timestamp": pl.Datetime("us"), "ticker": pl.Utf8, "price": pl.Decimal(None, 1), "size": pl.Int64, } > assert df.rows() == raw_data tests/unit/test_constructors.py:154: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ polars/utils/decorators.py:136: in wrapper return function(*args, **kwargs) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = shape: (3, 4) β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β” β”‚ timestamp ┆ ticker ┆ price ┆ size ... 
┆ 1500 β”‚ β”‚ 2022-09-07 15:30:00 ┆ MU ┆ 55.5 ┆ 400 β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”˜, named = False @deprecate_nonkeyword_arguments() def rows(self, named: bool = False) -> list[tuple[Any, ...]] | list[dict[str, Any]]: """ Returns all data in the DataFrame as a list of rows of python-native values. Parameters ---------- named Return dictionaries instead of tuples. The dictionaries are a mapping of column name to row value. This is more expensive than returning a regular tuple, but allows for accessing values by column name. Notes ----- If you have ``ns``-precision temporal values you should be aware that python natively only supports up to ``us``-precision; if this matters you should export to a different format. Warnings -------- Row-iteration is not optimal as the underlying data is stored in columnar form; where possible, prefer export via one of the dedicated export/output methods. Returns ------- A list of tuples (default) or dictionaries of row values. Examples -------- >>> df = pl.DataFrame( ... { ... "a": [1, 3, 5], ... "b": [2, 4, 6], ... } ... ) >>> df.rows() [(1, 2), (3, 4), (5, 6)] >>> df.rows(named=True) [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}, {'a': 5, 'b': 6}] See Also -------- iter_rows : Row iterator over frame data (does not materialise all rows). 
""" if named: # Load these into the local namespace for a minor performance boost dict_, zip_, columns = dict, zip, self.columns return [dict_(zip_(columns, row)) for row in self._df.row_tuples()] else: > return self._df.row_tuples() E pyo3_runtime.PanicException: misaligned pointer dereference: address must be a multiple of 0x10 but is 0x16cf18d38 polars/dataframe/frame.py:7839: PanicException ---------------------------------------------------------------------------------------------------------------------------------------------- Captured stderr call ----------------------------------------------------------------------------------------------------------------------------------------------- thread '' panicked at 'misaligned pointer dereference: address must be a multiple of 0x10 but is 0x16cf18d38', src/conversion.rs:191:9 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace _________________________________________________________________________________________________________________________________________________ test_from_arrow _________________________________________________________________________________________________________________________________________________ [gw0] darwin -- Python 3.10.10 /Users/xxx/PycharmProjects/polars/py-polars/.venv/bin/python monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x13252d150> def test_from_arrow(monkeypatch: Any) -> None: monkeypatch.setenv("POLARS_ACTIVATE_DECIMAL", "1") tbl = pa.table( { "a": pa.array([1, 2], pa.timestamp("s")), "b": pa.array([1, 2], pa.timestamp("ms")), "c": pa.array([1, 2], pa.timestamp("us")), "d": pa.array([1, 2], pa.timestamp("ns")), "e": pa.array([1, 2], pa.int32()), "decimal1": pa.array([1, 2], pa.decimal128(2, 1)), } ) expected_schema = { "a": pl.Datetime("ms"), "b": pl.Datetime("ms"), "c": pl.Datetime("us"), "d": pl.Datetime("ns"), "e": pl.Int32, "decimal1": pl.Decimal(2, 1), } expected_data = [ ( datetime(1970, 1, 1, 0, 0, 1), datetime(1970, 1, 
1, 0, 0, 0, 1000), datetime(1970, 1, 1, 0, 0, 0, 1), datetime(1970, 1, 1, 0, 0), 1, Decimal("1.0"), ), ( datetime(1970, 1, 1, 0, 0, 2), datetime(1970, 1, 1, 0, 0, 0, 2000), datetime(1970, 1, 1, 0, 0, 0, 2), datetime(1970, 1, 1, 0, 0), 2, Decimal("2.0"), ), ] df = cast(pl.DataFrame, pl.from_arrow(tbl)) assert df.schema == expected_schema > assert df.rows() == expected_data tests/unit/test_df.py:242: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ polars/utils/decorators.py:136: in wrapper return function(*args, **kwargs) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = shape: (2, 6) β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ a ...02 ┆ ┆ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜, named = False @deprecate_nonkeyword_arguments() def rows(self, named: bool = False) -> list[tuple[Any, ...]] | list[dict[str, Any]]: """ Returns all data in the DataFrame as a list of rows of python-native values. Parameters ---------- named Return dictionaries instead of tuples. 
The dictionaries are a mapping of column name to row value. This is more expensive than returning a regular tuple, but allows for accessing values by column name. Notes ----- If you have ``ns``-precision temporal values you should be aware that python natively only supports up to ``us``-precision; if this matters you should export to a different format. Warnings -------- Row-iteration is not optimal as the underlying data is stored in columnar form; where possible, prefer export via one of the dedicated export/output methods. Returns ------- A list of tuples (default) or dictionaries of row values. Examples -------- >>> df = pl.DataFrame( ... { ... "a": [1, 3, 5], ... "b": [2, 4, 6], ... } ... ) >>> df.rows() [(1, 2), (3, 4), (5, 6)] >>> df.rows(named=True) [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}, {'a': 5, 'b': 6}] See Also -------- iter_rows : Row iterator over frame data (does not materialise all rows). """ if named: # Load these into the local namespace for a minor performance boost dict_, zip_, columns = dict, zip, self.columns return [dict_(zip_(columns, row)) for row in self._df.row_tuples()] else: > return self._df.row_tuples() E pyo3_runtime.PanicException: misaligned pointer dereference: address must be a multiple of 0x10 but is 0x16cf18d38 polars/dataframe/frame.py:7839: PanicException ---------------------------------------------------------------------------------------------------------------------------------------------- Captured stderr call ----------------------------------------------------------------------------------------------------------------------------------------------- thread '' panicked at 'misaligned pointer dereference: address must be a multiple of 0x10 but is 0x16cf18d38', src/conversion.rs:191:9 _______________________________________________________________________________________________________________________________________ test_series_from_pydecimal_and_ints 
_______________________________________________________________________________________________________________________________________ [gw4] darwin -- Python 3.10.10 /Users/xxx/PycharmProjects/polars/py-polars/.venv/bin/python def test_series_from_pydecimal_and_ints() -> None: # TODO: check what happens if there are strings, floats arrow scalars in the list for data in permutations_int_dec_none(): s = pl.Series("name", data) assert s.dtype == pl.Decimal(None, 7) # inferred scale = 7, precision = None assert s.name == "name" assert s.null_count() == 1 for i, d in enumerate(data): > assert s[i] == d tests/unit/datatypes/test_decimal.py:33: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = shape: (5,) Series: 'name' [decimal[7]] [ -0.01 1.2345678 500 -1 null ], item = 0 def __getitem__( self, item: ( int | Series | range | slice | np.ndarray[Any, Any] | list[int] | list[bool] ), ) -> Any: if isinstance(item, Series) and item.dtype in { UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64, }: # Unsigned or signed Series (ordered from fastest to slowest). # - pl.UInt32 (polars) or pl.UInt64 (polars_u64_idx) Series indexes. # - Other unsigned Series indexes are converted to pl.UInt32 (polars) # or pl.UInt64 (polars_u64_idx). # - Signed Series indexes are converted pl.UInt32 (polars) or # pl.UInt64 (polars_u64_idx) after negative indexes are converted # to absolute indexes. return self._from_pyseries( self._s.take_with_series(self._pos_idxs(item)._s) ) elif ( _check_for_numpy(item) and isinstance(item, np.ndarray) and item.dtype.kind in ("i", "u") ): if item.ndim != 1: raise ValueError("Only a 1D-Numpy array is supported as index.") # Unsigned or signed Numpy array (ordered from fastest to slowest). 
# - np.uint32 (polars) or np.uint64 (polars_u64_idx) numpy array # indexes. # - Other unsigned numpy array indexes are converted to pl.UInt32 # (polars) or pl.UInt64 (polars_u64_idx). # - Signed numpy array indexes are converted pl.UInt32 (polars) or # pl.UInt64 (polars_u64_idx) after negative indexes are converted # to absolute indexes. return self._from_pyseries( self._s.take_with_series(self._pos_idxs(item)._s) ) # Integer. elif isinstance(item, int): if item < 0: item = self.len() + item > return self._s.get_idx(item) E pyo3_runtime.PanicException: misaligned pointer dereference: address must be a multiple of 0x10 but is 0x16b429428 polars/series/series.py:858: PanicException ---------------------------------------------------------------------------------------------------------------------------------------------- Captured stderr call ----------------------------------------------------------------------------------------------------------------------------------------------- thread '' panicked at 'misaligned pointer dereference: address must be a multiple of 0x10 but is 0x16b429428', src/conversion.rs:191:9 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace _______________________________________________________________________________________________________________________________________ test_frame_from_pydecimal_and_ints ________________________________________________________________________________________________________________________________________ [gw4] darwin -- Python 3.10.10 /Users/xxx/PycharmProjects/polars/py-polars/.venv/bin/python monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x137ab9b40> def test_frame_from_pydecimal_and_ints(monkeypatch: Any) -> None: monkeypatch.setenv("POLARS_ACTIVATE_DECIMAL", "1") class X(NamedTuple): a: int | D | None @dataclass class Y: a: int | D | None for data in permutations_int_dec_none(): row_data = [(d,) for d in data] for cls in (X, Y): for ctor in (pl.DataFrame, 
pl.from_records): df = ctor(data=list(map(cls, data))) # type: ignore[operator] assert df.schema == { "a": pl.Decimal(None, 7), } > assert df.rows() == row_data tests/unit/datatypes/test_decimal.py:54: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ polars/utils/decorators.py:136: in wrapper return function(*args, **kwargs) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = shape: (5, 1) β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ a β”‚ β”‚ --- β”‚ β”‚ decimal[7] β”‚ β•žβ•β•β•β•β•β•β•β•β•β•β•β•β•‘ β”‚ -0.01 β”‚ β”‚ 1.2345678 β”‚ β”‚ 500 β”‚ β”‚ -1 β”‚ β”‚ null β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜, named = False @deprecate_nonkeyword_arguments() def rows(self, named: bool = False) -> list[tuple[Any, ...]] | list[dict[str, Any]]: """ Returns all data in the DataFrame as a list of rows of python-native values. Parameters ---------- named Return dictionaries instead of tuples. The dictionaries are a mapping of column name to row value. This is more expensive than returning a regular tuple, but allows for accessing values by column name. Notes ----- If you have ``ns``-precision temporal values you should be aware that python natively only supports up to ``us``-precision; if this matters you should export to a different format. Warnings -------- Row-iteration is not optimal as the underlying data is stored in columnar form; where possible, prefer export via one of the dedicated export/output methods. 
Returns ------- A list of tuples (default) or dictionaries of row values. Examples -------- >>> df = pl.DataFrame( ... { ... "a": [1, 3, 5], ... "b": [2, 4, 6], ... } ... ) >>> df.rows() [(1, 2), (3, 4), (5, 6)] >>> df.rows(named=True) [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}, {'a': 5, 'b': 6}] See Also -------- iter_rows : Row iterator over frame data (does not materialise all rows). """ if named: # Load these into the local namespace for a minor performance boost dict_, zip_, columns = dict, zip, self.columns return [dict_(zip_(columns, row)) for row in self._df.row_tuples()] else: > return self._df.row_tuples() E pyo3_runtime.PanicException: misaligned pointer dereference: address must be a multiple of 0x10 but is 0x16b428d38 polars/dataframe/frame.py:7839: PanicException ---------------------------------------------------------------------------------------------------------------------------------------------- Captured stderr call ----------------------------------------------------------------------------------------------------------------------------------------------- thread '' panicked at 'misaligned pointer dereference: address must be a multiple of 0x10 but is 0x16b428d38', src/conversion.rs:191:9 ============================================================================================================================================= short test summary info ============================================================================================================================================= FAILED tests/unit/test_constructors.py::test_init_dataclasses_and_namedtuple - pyo3_runtime.PanicException: misaligned pointer dereference: address must be a multiple of 0x10 but is 0x16cf18d38 FAILED tests/unit/test_df.py::test_from_arrow - pyo3_runtime.PanicException: misaligned pointer dereference: address must be a multiple of 0x10 but is 0x16cf18d38 FAILED tests/unit/datatypes/test_decimal.py::test_series_from_pydecimal_and_ints - 
pyo3_runtime.PanicException: misaligned pointer dereference: address must be a multiple of 0x10 but is 0x16b429428 FAILED tests/unit/datatypes/test_decimal.py::test_frame_from_pydecimal_and_ints - pyo3_runtime.PanicException: misaligned pointer dereference: address must be a multiple of 0x10 but is 0x16b428d38 ==================================================================================================================================== 4 failed, 2037 passed, 1 skipped in 9.14s ==================================================================================================================================== make: *** [test] Error 1 ```

Expected behavior

make test succeeds

Installed versions

Replace this line with a list of feature gates

stinodego commented 1 year ago

I can reproduce this on my M1 Mac. I might have some time tomorrow to look into this. We could xfail those tests for now for Mac.

We really need M1 GitHub runners... apparently, that's on their roadmap.
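For illustration, a conditional xfail of the kind suggested here could be gated on a platform check. This is only a sketch under assumptions, not the change made in the repository: `is_apple_silicon` is a hypothetical helper, and the commented usage relies on pytest's `xfail(condition, reason=...)` marker form.

```python
import platform
from typing import Optional


def is_apple_silicon(system: Optional[str] = None, machine: Optional[str] = None) -> bool:
    """Return True for macOS on ARM (hypothetical helper, not part of polars).

    The arguments default to the values reported by the running interpreter;
    they can be overridden to make the check easy to test.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system == "Darwin" and machine == "arm64"


# In a test module, the condition could be passed to pytest's xfail marker:
#
#     import pytest
#
#     @pytest.mark.xfail(
#         is_apple_silicon(),
#         reason="misaligned pointer dereference in Decimal conversion",
#     )
#     def test_series_from_pydecimal_and_ints() -> None:
#         ...
```

Keeping the condition in one helper would let all four failing tests share the same gate, so it can be removed in a single place once the underlying panic is fixed.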

ritchie46 commented 1 year ago

In the meantime I will see if I can fix the culprits.

jakob-keller commented 1 year ago

> In the meantime I will see if I can fix the culprits.

Let me know if you need any additional context or want me to test something.

ritchie46 commented 1 year ago

@jakob-keller could you do a run with RUST_BACKTRACE=1 and post the backtrace here?
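One way to produce such a run from Python is sketched below. It assumes `pytest` is on the `PATH` of the active environment and that selecting a single test with `-k` is enough; the helper names are hypothetical.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional


def backtrace_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy an environment and enable Rust backtraces in the copy."""
    env = dict(os.environ if base is None else base)
    env["RUST_BACKTRACE"] = "1"
    return env


def pytest_command(test_name: str) -> List[str]:
    """Build a pytest invocation that selects tests matching test_name."""
    return ["pytest", "-k", test_name]


def run_with_backtrace(test_name: str) -> int:
    """Run the selected test with RUST_BACKTRACE=1 and return the exit code."""
    completed = subprocess.run(
        pytest_command(test_name), env=backtrace_env(), check=False
    )
    return completed.returncode
```

Calling `run_with_backtrace("test_series_from_pydecimal_and_ints")` from the `py-polars` directory would then print the Rust stack trace alongside the usual pytest failure report.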

stinodego commented 1 year ago

Here's the full thing for one of the tests:

(.venv) stijn@Hephaestos:~/Documents/code/polars/py-polars$ pytest -k test_series_from_pydecimal_and_ints
========================================================================= test session starts =========================================================================
platform darwin -- Python 3.11.0, pytest-7.2.0, pluggy-1.0.0
rootdir: /Users/stijn/Documents/code/polars/py-polars, configfile: pyproject.toml
plugins: hypothesis-6.70.1, xdist-3.2.0, cov-4.0.0
collected 2133 items / 2132 deselected / 1 selected                                                                                                                   

tests/unit/datatypes/test_decimal.py F                                                                                                                          [100%]

============================================================================== FAILURES ===============================================================================
_________________________________________________________________ test_series_from_pydecimal_and_ints _________________________________________________________________

    def test_series_from_pydecimal_and_ints() -> None:
        # TODO: check what happens if there are strings, floats arrow scalars in the list
        for data in permutations_int_dec_none():
            s = pl.Series("name", data)
            assert s.dtype == pl.Decimal(None, 7)  # inferred scale = 7, precision = None
            assert s.name == "name"
            assert s.null_count() == 1
            for i, d in enumerate(data):
>               assert s[i] == d

tests/unit/datatypes/test_decimal.py:33: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = shape: (5,)
Series: 'name' [decimal[7]]
[
        -0.01
        1.2345678
        500
        -1
        null
], item = 0

    def __getitem__(
        self,
        item: (
            int | Series | range | slice | np.ndarray[Any, Any] | list[int] | list[bool]
        ),
    ) -> Any:
        if isinstance(item, Series) and item.dtype in {
            UInt8,
            UInt16,
            UInt32,
            UInt64,
            Int8,
            Int16,
            Int32,
            Int64,
        }:
            # Unsigned or signed Series (ordered from fastest to slowest).
            #   - pl.UInt32 (polars) or pl.UInt64 (polars_u64_idx) Series indexes.
            #   - Other unsigned Series indexes are converted to pl.UInt32 (polars)
            #     or pl.UInt64 (polars_u64_idx).
            #   - Signed Series indexes are converted pl.UInt32 (polars) or
            #     pl.UInt64 (polars_u64_idx) after negative indexes are converted
            #     to absolute indexes.
            return self._from_pyseries(
                self._s.take_with_series(self._pos_idxs(item)._s)
            )

        elif (
            _check_for_numpy(item)
            and isinstance(item, np.ndarray)
            and item.dtype.kind in ("i", "u")
        ):
            if item.ndim != 1:
                raise ValueError("Only a 1D-Numpy array is supported as index.")

            # Unsigned or signed Numpy array (ordered from fastest to slowest).
            #   - np.uint32 (polars) or np.uint64 (polars_u64_idx) numpy array
            #     indexes.
            #   - Other unsigned numpy array indexes are converted to pl.UInt32
            #     (polars) or pl.UInt64 (polars_u64_idx).
            #   - Signed numpy array indexes are converted pl.UInt32 (polars) or
            #     pl.UInt64 (polars_u64_idx) after negative indexes are converted
            #     to absolute indexes.
            return self._from_pyseries(
                self._s.take_with_series(self._pos_idxs(item)._s)
            )

        # Integer.
        elif isinstance(item, int):
            if item < 0:
                item = self.len() + item
>           return self._s.get_idx(item)
E           pyo3_runtime.PanicException: misaligned pointer dereference: address must be a multiple of 0x10 but is 0x16ce57b38

polars/series/series.py:850: PanicException
------------------------------------------------------------------------ Captured stderr call -------------------------------------------------------------------------
thread '<unnamed>' panicked at 'misaligned pointer dereference: address must be a multiple of 0x10 but is 0x16ce57b38', src/conversion.rs:191:9
stack backtrace:
   0: rust_begin_unwind
             at /rustc/0599b6b931816ab46ab79072189075f543931cbd/library/std/src/panicking.rs:577:5
   1: core::panicking::panic_fmt
             at /rustc/0599b6b931816ab46ab79072189075f543931cbd/library/core/src/panicking.rs:67:14
   2: core::panicking::panic_misaligned_pointer_dereference
             at /rustc/0599b6b931816ab46ab79072189075f543931cbd/library/core/src/panicking.rs:174:5
   3: polars::conversion::decimal_to_digits
             at ./src/conversion.rs:191:9
   4: <polars::conversion::Wrap<polars_core::datatypes::any_value::AnyValue> as pyo3::conversion::IntoPy<pyo3::instance::Py<pyo3::types::any::PyAny>>>::into_py
             at ./src/conversion.rs:266:32
   5: polars::series::PySeries::get_idx
             at ./src/series.rs:430:16
   6: polars::series::_::<impl polars::series::PySeries>::__pymethod_get_idx__
             at ./src/series.rs:221:1
   7: pyo3::impl_::trampoline::cfunction_with_keywords::{{closure}}
             at /Users/stijn/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.18.2/src/impl_/trampoline.rs:41:35
   8: pyo3::impl_::trampoline::trampoline_inner::{{closure}}
             at /Users/stijn/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.18.2/src/impl_/trampoline.rs:204:54
   9: std::panicking::try::do_call
             at /rustc/0599b6b931816ab46ab79072189075f543931cbd/library/std/src/panicking.rs:485:40
  10: ___rust_try
  11: std::panicking::try
             at /rustc/0599b6b931816ab46ab79072189075f543931cbd/library/std/src/panicking.rs:449:19
  12: std::panic::catch_unwind
             at /rustc/0599b6b931816ab46ab79072189075f543931cbd/library/std/src/panic.rs:140:14
  13: pyo3::impl_::trampoline::trampoline_inner
             at /Users/stijn/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.18.2/src/impl_/trampoline.rs:204:9
  14: pyo3::impl_::trampoline::cfunction_with_keywords
             at /Users/stijn/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.18.2/src/impl_/trampoline.rs:41:13
  15: polars::series::_::_::__init::__INVENTORY::trampoline
             at ./src/series.rs:221:1
  16: _method_vectorcall_VARARGS_KEYWORDS
  17: _PyObject_Vectorcall
  18: __PyEval_EvalFrameDefault
  19: __PyEval_Vector
  20: _slot_mp_subscript
  21: __PyEval_EvalFrameDefault
  22: __PyEval_Vector
  23: __PyEval_EvalFrameDefault
  24: __PyEval_Vector
  25: __PyEval_EvalFrameDefault
  26: __PyEval_Vector
  27: __PyObject_FastCallDictTstate
  28: __PyObject_Call_Prepend
  29: _slot_tp_call
  30: __PyObject_MakeTpCall
  31: __PyEval_EvalFrameDefault
  32: __PyEval_Vector
  33: __PyEval_EvalFrameDefault
  34: __PyEval_Vector
  35: __PyObject_FastCallDictTstate
  36: __PyObject_Call_Prepend
  37: _slot_tp_call
  38: __PyObject_Call
  39: __PyEval_EvalFrameDefault
  40: __PyEval_Vector
  41: __PyEval_EvalFrameDefault
  42: __PyEval_Vector
  43: __PyEval_EvalFrameDefault
  44: __PyEval_Vector
  45: __PyObject_FastCallDictTstate
  46: __PyObject_Call_Prepend
  47: _slot_tp_call
  48: __PyObject_MakeTpCall
  49: __PyEval_EvalFrameDefault
  50: __PyEval_Vector
  51: __PyEval_EvalFrameDefault
  52: __PyEval_Vector
  53: __PyObject_FastCallDictTstate
  54: __PyObject_Call_Prepend
  55: _slot_tp_call
  56: __PyObject_MakeTpCall
  57: __PyEval_EvalFrameDefault
  58: __PyEval_Vector
  59: __PyEval_EvalFrameDefault
  60: __PyEval_Vector
  61: __PyObject_FastCallDictTstate
  62: __PyObject_Call_Prepend
  63: _slot_tp_call
  64: __PyObject_MakeTpCall
  65: __PyEval_EvalFrameDefault
  66: _PyEval_EvalCode
  67: __PyRun_SimpleFileObject
  68: __PyRun_AnyFileObject
  69: _Py_RunMain
  70: _pymain_main
  71: _Py_BytesMain
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
======================================================================= short test summary info =======================================================================
FAILED tests/unit/datatypes/test_decimal.py::test_series_from_pydecimal_and_ints - pyo3_runtime.PanicException: misaligned pointer dereference: address must be a multiple of 0x10 but is 0x16ce57b38

ritchie46 commented 1 year ago

Thanks @stinodego. Can work with that. Is it only related to decimals?

stinodego commented 1 year ago

The four tests that fail are indeed all related to Decimals somehow.
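The panic messages can be checked with a little arithmetic. The sketch below takes the four addresses reported in the logs above and computes how far each one is from the 0x10 boundary that the panic demands; the names are illustrative only.

```python
# Alignment demanded by the panic message ("must be a multiple of 0x10").
REQUIRED_ALIGNMENT = 0x10

# Addresses copied from the panic messages in this issue.
REPORTED_ADDRESSES = [0x16CF18D38, 0x16B429428, 0x16B428D38, 0x16CE57B38]


def misalignment(address: int, alignment: int = REQUIRED_ALIGNMENT) -> int:
    """Return how many bytes the address lies past the previous aligned boundary."""
    return address % alignment


for address in REPORTED_ADDRESSES:
    offset = misalignment(address)
    eight_byte_aligned = misalignment(address, 8) == 0
    print(f"{address:#x}: {offset} bytes past a 16-byte boundary, "
          f"8-byte aligned: {eight_byte_aligned}")
```

Every reported address is 8 bytes past a 16-byte boundary while still being 8-byte aligned. That would be consistent with a 16-byte-aligned value being read through a pointer that only guarantees 8-byte alignment inside `decimal_to_digits`, although that reading is an assumption and is not confirmed in this thread.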