ngsankha / absynthe

synthesis guided by abstract interpretation
BSD 3-Clause "New" or "Revised" License

pandas columns domain #7

Closed ngsankha closed 2 years ago

ngsankha commented 2 years ago

The columns domain currently in the code base is disabled. A proper pandas rows/columns domain needs to track the following information:

Implementing this correctly should let us derive most constant strings automatically; they are currently supplied upfront.
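To illustrate the idea, here is a minimal sketch of a columns domain in Python. It is not the Absynthe implementation: the `Columns` class, the `BOTTOM` marker, and the helper functions are hypothetical names chosen for this example, and the transformers ignore index level names. It assumes the abstract value is the exact set of column labels (or top), and shows how string constants could then be read off the input's abstract value instead of being supplied upfront.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Columns:
    """Abstract value for a DataFrame: its exact set of column labels.

    labels=None stands for top (nothing is known about the columns).
    """
    labels: Optional[FrozenSet[str]] = None

    def is_top(self) -> bool:
        return self.labels is None

    def leq(self, other: "Columns") -> bool:
        """Flat-lattice order: everything is below top, otherwise equality."""
        return other.is_top() or self.labels == other.labels


# Hypothetical marker: the concrete call would fail, so the candidate is pruned.
BOTTOM = None


def abs_sort_values(df: Columns, by: str):
    """Abstract sort_values(by=[by]): columns unchanged, `by` must exist."""
    if df.is_top():
        return df
    return df if by in df.labels else BOTTOM


def abs_drop_column(df: Columns, col: str):
    """Abstract drop(columns=[col]): removes `col`, which must exist."""
    if df.is_top():
        return df
    if col not in df.labels:
        return BOTTOM
    return Columns(df.labels - {col})


def candidate_constants(df: Columns) -> List[str]:
    """String constants worth trying as arguments: the known column labels."""
    return [] if df.is_top() else sorted(df.labels)
```

With an input abstracted as `Columns(frozenset({"ID", "value"}))`, `candidate_constants` yields `"ID"` and `"value"`, so a candidate such as `arg0.sort_values(by=["ID"])` can be built without a user-supplied constant, while `by="SEGM1"` evaluates to `BOTTOM` and is discarded.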

ngsankha commented 2 years ago

Constants are handled as of commit f9611a4a72877e34e3214cead2fdcf75ff7bbfd7.

Tracking of the row/column index is not yet enabled. Enabling it might help eliminate more programs through abstract interpretation.
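As a rough sketch of how an index domain could prune candidates, the Python snippet below abstracts a DataFrame by its tuple of row labels and evaluates `loc` on that abstraction alone. All names (`abs_loc`, `INFEASIBLE`, `feasible_loc_candidates`) are hypothetical, the row labels are assumed to be unique, and the real domain may track different information.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

# Abstract row index: the exact tuple of row labels, or None for top.
Index = Optional[Tuple[int, ...]]

# Hypothetical marker for "this candidate cannot succeed".
INFEASIBLE = "infeasible"


def abs_loc(index: Index, labels: Sequence[int]):
    """Abstract `df.loc[labels]`: the result index is exactly `labels`.

    If the input index is known and a label is missing from it, the
    concrete call would raise, so the candidate is infeasible.
    """
    if index is not None and any(label not in index for label in labels):
        return INFEASIBLE
    return tuple(labels)


def feasible_loc_candidates(input_index: Tuple[int, ...],
                            goal_index: Tuple[int, ...],
                            size: int) -> List[List[int]]:
    """Enumerate `loc` label lists of a given size and keep only those
    whose abstract result matches the index of the expected output."""
    kept = []
    for labels in combinations(input_index, size):
        if abs_loc(input_index, labels) == goal_index:
            kept.append(list(labels))
    return kept
```

For an input with row labels `(0, 1, 2, 3, 4)` and an expected output with row labels `(0, 2, 4)`, only one of the ten three-label candidates survives, namely `[0, 2, 4]`, and none of the rejected ones has to be executed against pandas.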

ngsankha commented 2 years ago

Intermediate benchmarking results after enabling constants, running the tool on each benchmark with a 20-minute timeout (the same as in the AutoPandas paper). We solve 17/29 benchmarks, versus 17/26 in the original paper. I expect the numbers to improve once the rows/columns domain is enabled.
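For reference, a per-benchmark run with a wall-clock budget could be scripted roughly as follows. This is a sketch, not the script that produced these results: the command passed in as `cmd` is a placeholder for however the tool is actually invoked, and the sketch treats a timeout and a non-zero exit the same way, since the log does not say which of the two an `ERROR!` entry stands for.

```python
import subprocess
import time

TIMEOUT_S = 20 * 60  # 20-minute budget per benchmark


def run_benchmark(cmd, timeout_s=TIMEOUT_S):
    """Run one benchmark command.

    Returns (program, elapsed_seconds); program is None when the command
    times out, exits with a non-zero status, or prints nothing.
    """
    start = time.perf_counter()
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_s)
        program = proc.stdout.strip() if proc.returncode == 0 else None
    except subprocess.TimeoutExpired:
        program = None
    elapsed = time.perf_counter() - start
    return program or None, elapsed


def report(name, program, elapsed):
    """Format one entry: name, underline, then program and time or ERROR!."""
    lines = [name, "-" * len(name)]
    if program is None:
        lines.append("ERROR!")
    else:
        lines += [program, "", str(elapsed)]
    return "\n".join(lines)
```

Calling `print(report(name, *run_benchmark(cmd)))` for each benchmark would produce entries in the same shape as the log in this comment.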

SO_49581206_depth3
------------------
ERROR!

SO_13576164_depth3
------------------
ERROR!

SO_14023037_depth3
------------------
ERROR!

SO_23321300_depth3
------------------
ERROR!

SO_13807758_depth2
------------------
arg0.dropna().reset_index(drop=True)

48.4200041539998

SO_49567723_depth2
------------------
ERROR!

SO_11811392_depth3
------------------
arg0.T.reset_index().values

4.165910928999438

SO_10982266_depth3
------------------
ERROR!

SO_18172851_depth1
------------------
arg0.loc[arg1]

1.152512497999851

SO_49987108_depth2
------------------
ERROR!

SO_49583055_depth1
------------------
arg0.sort_values(by=["ID"])

4.137355733999357

SO_49583055_depth1
------------------
arg0.sort_values(by=["ID"])

4.092509506999704

SO_49572546_depth1
------------------
arg1.combine_first(arg0)

2.0150949630005925

SO_39656670_depth3
------------------
ERROR!

SO_11881165_depth1
------------------
arg0.loc[[0, 2, 4]]

1.057938928999647

SO_11941492_depth1
------------------
ERROR!

SO_21982987_depth3
------------------
ERROR!

SO_53762029_depth3
------------------
arg0.pivot_table(index=["doc_created_month", "doc_created_year", "speciality"]).cumsum()

302.4615383049986

SO_13261691_depth2
------------------
arg0.stack().unstack()

50.20894425500046

SO_12065885_depth3
------------------
arg0.loc[[2, 4, 6]]

1.0424643039987131

SO_13261175_depth1
------------------
arg0.pivot_table(values="value", index="name", columns=["type", "date"])

276.5202395289998

SO_13659881_depth2
------------------
arg0.groupby(["ip", "useragent"]).size()

1.0170146630007366

SO_14085517_depth1
------------------
arg0.sort_values(by=["SEGM1"])

195.83883650999996

SO_34365578_depth2
------------------
ERROR!

SO_11418192_depth2
------------------
arg0.query(arg2)

0.9639060649988096

SO_49592930_depth1
------------------
arg0.combine_first(arg1)

0.9801300960007211

SO_13793321_depth1
------------------
arg0.merge(arg1, on=10)

5.554725879999751

SO_13647222_depth1
------------------
ERROR!

SO_12860421_depth1
------------------
arg0.pivot_table(values="Z", index="Y", columns="Z", aggfunc=pd.Series.nunique)

683.4204104500004

ngsankha commented 2 years ago

This is implemented now!