man-group / ArcticDB

ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.
http://arcticdb.io

Column order in resampled DataFrame is different than the order in the aggregation dict input #1996

Open vasil-pashov opened 1 week ago

Describe the bug

When pandas resamples with an aggregation dictionary, the columns of the result are in the same order as the keys of that dictionary. ArcticDB returns the same columns but in a different order.

Steps/Code to Reproduce

import pandas as pd
import numpy as np
import arcticdb as adb

COLUMN_DTYPE = ["float", "int", "uint"]
ALL_AGGREGATIONS = ["sum", "mean", "min", "max", "first", "last", "count"]

ac = adb.Arctic("lmdb://test")
# get_library requires a library name; "test" is an arbitrary choice for this repro
lib = ac.get_library("test", create_if_missing=True)

# One day of minute-frequency data in a single column
index = pd.date_range(pd.Timestamp("2024-01-01"), pd.Timestamp("2024-01-02"), freq="1min")
df = pd.DataFrame({"col_float": range(len(index))}, index=index)
lib.write("sym", df)

# Named aggregations: output column name -> (input column, aggregation operator)
agg = {f"{name}_{op}": (name, op) for name in list(df.columns) for op in ALL_AGGREGATIONS}
q = adb.QueryBuilder()
q = q.resample('1h').agg(agg)

# Compare the key order of the dict with the column order of the result
print(agg.keys())
print(lib.read("sym", query_builder=q).data)

Expected Results

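Produce the same output as pandas: the columns of the resampled DataFrame should be in the same order as the keys of the aggregation dict.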
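
For reference, a minimal sketch of the pandas side of the comparison. The report does not show the pandas call, so this assumes that named aggregation via `resample(...).agg(**agg)` is the intended counterpart of the dict passed to `QueryBuilder.agg`. It uses the same data and the same `agg` dict as the repro and needs no ArcticDB install.

```python
import pandas as pd

ALL_AGGREGATIONS = ["sum", "mean", "min", "max", "first", "last", "count"]

# Same data as in the repro: one day of minute-frequency rows
index = pd.date_range("2024-01-01", "2024-01-02", freq="1min")
df = pd.DataFrame({"col_float": range(len(index))}, index=index)

# Same aggregation dict: output column name -> (input column, operator)
agg = {f"{name}_{op}": (name, op) for name in df.columns for op in ALL_AGGREGATIONS}

# Assumption: named aggregation is the pandas equivalent of QueryBuilder.agg(agg)
expected = df.resample("1h").agg(**agg)

# pandas keeps the key order of the aggregation dict
print(list(expected.columns))
assert list(expected.columns) == list(agg)
```

As a stopgap, indexing the ArcticDB result with `list(agg)` (for example `result[list(agg)]`, where `result` stands for the DataFrame returned by `lib.read(...).data`) should restore the dict order on the caller's side.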

OS, Python Version and ArcticDB Version

Python: 3.11.9 (tags/v3.11.9:de54cf5, Apr 2 2024, 10:12:12) [MSC v.1938 64 bit (AMD64)]
OS: Windows-10-10.0.26100-SP0
ArcticDB: dev

Backend storage used

No response