mindsdb / dfsql

SQL interface to Pandas
GNU General Public License v3.0
51 stars 9 forks source link

Table alias and groupby produces wrong output #14

Closed btseytlin closed 1 year ago

btseytlin commented 3 years ago

This test fails:

def test_complex_groupby(self, googleplay_csv, data_source_googleplay):
        sql = """SELECT sub.category, avg(reviews) AS avg_reviews
                    FROM (
                        SELECT category, CAST(reviews AS float) AS reviews
                         FROM (
                            SELECT category, reviews
                            FROM googleplaystore
                            LIMIT 100
                         )
                    ) AS sub
                    GROUP BY sub.category
                    HAVING avg_reviews > 0.4
                    LIMIT 10"""

        df = pd.read_csv(googleplay_csv)
        inner = df[['Category', 'Reviews']].iloc[:100]
        out_df = inner.groupby(['Category']).agg({'Reviews': 'mean'}).reset_index()
        out_df.columns = ['sub.category', 'avg_reviews']

        query_result = data_source_googleplay.query(sql)
        assert out_df.shape == query_result.shape
        assert (out_df.dropna().values == query_result.dropna().values).all()
tomhuds commented 1 year ago

@StpMax can we archive this project? or close issue?