microsoft / mssql-django

The Microsoft Django backend for SQL Server provides a connectivity layer for Django on SQL Server or Azure SQL DB.
Other
338 stars 112 forks source link

Group by case not supported #278

Open delcroip opened 1 year ago

delcroip commented 1 year ago

There are some features which are not supported yet. Please check the Limitations first to see if your bug is listed.

Software versions

Problem description and steps to reproduce I need a complex query that stratify the data using case statement in annotate then use a annotate with aggregate to have summary, the querry is automatically generated based on a "cube" definition, here a similar situation

Insuree.all().annonate(case(when( 'DOB_range'=['2003-01-01','2023-01-01', ], then Value('0-20')), default=Value('20+')))\ .annonate(case (when( 'sex'='M', then Values('Men')), default=Value('Women'))) .annonate('cnt'=Count('id'))

the execution will fail because the group_by will have a case statement in it and SQL server don't support that

Expected behavior and actual behavior

because SQL server don't support such case I would expect this module to generate a sub-query with the case statement wrapped in the main query with the group_by

Error message/stack trace Exception has occurred: ProgrammingError ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Column 'tblInsuree.DOB' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. (8120) (SQLExecDirectW)") pyodbc.ProgrammingError: ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Column 'tblInsuree.DOB' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. (8120) (SQLExecDirectW)")

The above exception was the direct cause of the following exception:

File "/home/delcpa/Development/openimis-be_py/src/openimis-be-dhis2-etl/dhis2_etl/builders/adx.py", line 48, in create_adx_data_value for item in queryset: File "/home/delcpa/Development/openimis-be_py/src/openimis-be-dhis2-etl/dhis2_etl/builders/adx.py", line 125, in _build_group_data_values values = self.data_value_mapper(data_value).create_adx_data_value(org_unit_obj, period) File "/home/delcpa/Development/openimis-be_py/src/openimis-be-dhis2-etl/dhis2_etl/builders/adx.py", line 118, in create_adx_group data_values=self._build_group_data_values(period, org_unit_obj), File "/home/delcpa/Development/openimis-be_py/src/openimis-be-dhis2-etl/dhis2_etl/builders/adx.py", line 148, in _build_adx_groups groups.append(group_mapper.create_adx_group(period, org_unit_obj, org_unit)) File "/home/delcpa/Development/openimis-be_py/src/openimis-be-dhis2-etl/dhis2_etl/builders/adx.py", line 139, in create_adx_cube groups=self._build_adx_groups(period, org_units) File "/home/delcpa/Development/openimis-be_py/src/openimis-be-dhis2-etl/dhis2_etl/services/adx_service.py", line 46, in _build_cube return ADXBuilder(mapping_cube).create_adx_cube(self.period, org_units) File "/home/delcpa/Development/openimis-be_py/src/openimis-be-dhis2-etl/dhis2_etl/services/adx_service.py", line 38, in build_enrolment_cube return self._build_cube(get_enrollment_cube(self.period), org_units) File "/home/delcpa/Development/openimis-be_py/src/openimis-be-dhis2-etl/dhis2_etl/scheduled_tasks/init.py", line 39, in adx_monthly_sync adx_client.post_cube(service.build_enrolment_cube()) File "/home/delcpa/Development/openimis-be_py/src/openimis-be-dhis2-etl/dhis2_etl/management/commands/pushadx.py", line 53, in sync_adx adx_monthly_sync(date) File "/home/delcpa/Development/openimis-be_py/src/openimis-be-dhis2-etl/dhis2_etl/management/commands/pushadx.py", line 48, in handle self.sync_adx(date, scope) File "/home/delcpa/Development/openimis-be_py/openIMIS/manage.py", line 15, in execute_from_command_line(sys.argv) django.db.utils.ProgrammingError: ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Column 'tblInsuree.DOB' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. (8120) (SQLExecDirectW)")

Any other details that can be helpful another solution would be to write explicit NestedQuery in django but it is not supported

mShan0 commented 1 year ago

Hi @delcroip, does this still happen on the latest version of mssql-django?

delcroip commented 1 year ago

Hi,

it does not crash but it does not work either

here the actual query printed with str(queryset.query):

SELECT 
CASE WHEN [tblInsuree].[DOB] BETWEEN 2016-12-02 AND 2022-12-01 THEN 0_5 WHEN [tblInsuree].[DOB] BETWEEN 2009-12-02 AND 2016-12-01 THEN 6_12 WHEN [tblInsuree].[DOB] BETWEEN 2003-12-02 AND 2009-12-01 THEN 13_18 WHEN [tblInsuree].[DOB] BETWEEN 1996-12-02 AND 2003-12-01 THEN 19_25 WHEN [tblInsuree].[DOB] BETWEEN 1986-12-02 AND 1996-12-01 THEN 26_35 WHEN [tblInsuree].[DOB] BETWEEN 1966-12-02 AND 1986-12-01 THEN 36_55 WHEN [tblInsuree].[DOB] BETWEEN 1946-12-02 AND 1966-12-01 THEN 56_75 WHEN [tblInsuree].[DOB] < 1946-12-01 THEN 76P ELSE NULL END AS [cat_ageGroup], 
CASE WHEN [tblInsuree].[Gender] = M THEN M WHEN [tblInsuree].[Gender] = F THEN F ELSE NULL END AS [cat_sex], 
COUNT_BIG([tblInsuree].[InsureeID]) AS [adx_value] FROM [tblInsuree] INNER JOIN [tblFamilies] ON ([tblInsuree].[FamilyID] = [tblFamilies].[FamilyID]) INNER JOIN [tblLocations] ON ([tblFamilies].[LocationId] = [tblLocations].[LocationId]) 
WHERE ([tblLocations].[ParentLocationId] = 21 AND [tblInsuree].[ValidityTo] IS NULL) 
GROUP BY 
[tblInsuree].[InsureeID], 
[tblInsuree].[DOB], 
[tblInsuree].[Gender], 
CASE WHEN [tblInsuree].[DOB] BETWEEN 2016-12-02 AND 2022-12-01 THEN 0_5 WHEN [tblInsuree].[DOB] BETWEEN 2009-12-02 AND 2016-12-01 THEN 6_12 WHEN [tblInsuree].[DOB] BETWEEN 2003-12-02 AND 2009-12-01 THEN 13_18 WHEN [tblInsuree].[DOB] BETWEEN 1996-12-02 AND 2003-12-01 THEN 19_25 WHEN [tblInsuree].[DOB] BETWEEN 1986-12-02 AND 1996-12-01 THEN 26_35 WHEN [tblInsuree].[DOB] BETWEEN 1966-12-02 AND 1986-12-01 THEN 36_55 WHEN [tblInsuree].[DOB] BETWEEN 1946-12-02 AND 1966-12-01 THEN 56_75 WHEN [tblInsuree].[DOB] < 1946-12-01 THEN 76P ELSE NULL END, 
CASE WHEN [tblInsuree].[Gender] = M THEN M WHEN [tblInsuree].[Gender] = F THEN F ELSE NULL END

the issue is that you have a group by [tblInsuree].[InsureeID] killing all the grouping value ...

you should have a nested query instead


SELECT adx_value, cat_ageGroup, cat_sex FROM (
SELECT 
CASE WHEN [tblInsuree].[DOB] BETWEEN 2016-12-02 AND 2022-12-01 THEN 0_5 WHEN [tblInsuree].[DOB] BETWEEN 2009-12-02 AND 2016-12-01 THEN 6_12 WHEN [tblInsuree].[DOB] BETWEEN 2003-12-02 AND 2009-12-01 THEN 13_18 WHEN [tblInsuree].[DOB] BETWEEN 1996-12-02 AND 2003-12-01 THEN 19_25 WHEN [tblInsuree].[DOB] BETWEEN 1986-12-02 AND 1996-12-01 THEN 26_35 WHEN [tblInsuree].[DOB] BETWEEN 1966-12-02 AND 1986-12-01 THEN 36_55 WHEN [tblInsuree].[DOB] BETWEEN 1946-12-02 AND 1966-12-01 THEN 56_75 WHEN [tblInsuree].[DOB] < 1946-12-01 THEN 76P ELSE NULL END AS [cat_ageGroup], 
CASE WHEN [tblInsuree].[Gender] = M THEN M WHEN [tblInsuree].[Gender] = F THEN F ELSE NULL END AS [cat_sex], 
COUNT_BIG([tblInsuree].[InsureeID]) AS [adx_value] FROM [tblInsuree] INNER JOIN [tblFamilies] ON ([tblInsuree].[FamilyID] = [tblFamilies].[FamilyID]) INNER JOIN [tblLocations] ON ([tblFamilies].[LocationId] = [tblLocations].[LocationId]) 
WHERE ([tblLocations].[ParentLocationId] = 21 AND [tblInsuree].[ValidityTo] IS NULL) 
)
GROUP BY cat_ageGroup, cat_sex

that being said, I manage to get my result using windows+distinct (sorry for the overcomplex example, but I need something dynamic) I replace that:

for a in annotation: # contains the case annotation
                queryset = queryset.annotate(**a)
queryset = queryset.annotate('adx_value'= Count(id)).values('adx_value',*[get_sql_name(c.category_name) for c in self.categories])

by

def get_case(category_options):
    whens = [When(co.filter, then=Value(f"{co.code}")) for co in category_options]
    return Case(
                *whens
           )

def get_annotation_window(key, categories, agg_fct):
    return {
        key:  Window(
            expression= agg_fct,
            partition_by = [get_case( c.category_options)  for c in categories]
           )
    }

for a in annotation: # contains the case annotation
                queryset = queryset.annotate(**a)
queryset = queryset.annotate(**get_annotation_window('adx_value' ,self.categories,Count(id))).values('adx_value',*[get_sql_name(c.category_name) for c in self.categories]).distinct()
mShan0 commented 1 year ago

Thanks for the extra info. Will be looking further into this.

federicoemartinez commented 8 months ago

Hi there, I have hit the same issue. Has there been any progress? Any idea of what to do in order to create a PR to support this?

federicoemartinez commented 8 months ago

This seem related to #316. @mShan0 do you think that replacing values could be a solution? maybe make it configurable?