microsoft / Analysis-Services

Git repo for Analysis Services samples and community projects
MIT License

BPARules: "Add data category for columns" does not work properly for Latitude and Longitude #226

Open morelgeorge opened 10 months ago

morelgeorge commented 10 months ago

Hello, in my view the rule "Add data category for columns" does not work properly for Latitude and Longitude columns. I believe the issue is related to how the parentheses in its definition are grouped.

I suggest updating the rule definition as follows:

(string.IsNullOrWhitespace(DataCategory) or DataCategory.ToLower() == "uncategorized")
and
(
    (
        (Name.ToLower().Contains("country") or Name.ToLower().Contains("continent") or Name.ToLower().Contains("city"))
        and DataType == "String"
    )
    or
    (
        (Name.ToLower() == "latitude" or Name.ToLower() == "longitude")
        and (DataType == DataType.Decimal or DataType == DataType.Double)
    )
)
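
For anyone who wants to check the grouping outside Tabular Editor, the expression above can be mirrored as a plain Python predicate. This is only a sketch under stated assumptions: the `Column` dataclass and the `needs_data_category` function are hypothetical stand-ins, not part of the Best Practice Analyzer, and data types are modeled as plain strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Column:
    # Hypothetical stand-in for a model column; not a Tabular Editor type.
    name: str
    data_type: str  # modeled as a string, e.g. "String", "Decimal", "Double"
    data_category: Optional[str] = None


def needs_data_category(col: Column) -> bool:
    """Mirror of the proposed rule: True means the column would be flagged."""
    category = (col.data_category or "").strip().lower()
    uncategorized = category in ("", "uncategorized")

    name = col.name.lower()
    geo_text = (
        any(key in name for key in ("country", "continent", "city"))
        and col.data_type == "String"
    )
    geo_number = (
        name in ("latitude", "longitude")
        and col.data_type in ("Decimal", "Double")
    )

    # The outer grouping matters: the category check applies to both branches.
    return uncategorized and (geo_text or geo_number)
```

With this sketch, a `Latitude` column of type `Double` is flagged only while its data category is blank or "Uncategorized"; once a category is set, it passes.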

I tested this on my test dataset and it works as expected, but I would appreciate any feedback from your side.

Regards, Jiri

m-kovalsky commented 10 months ago

As written, this portion of the rule catches columns named 'latitude' or 'longitude' that are of decimal or double data type and do not have a data category. It appears you have added a clause that checks whether the data category is 'uncategorized', but this value would be blank for an uncategorized column. No one would intentionally mark a column as 'uncategorized'. Power BI Desktop shows a column as 'uncategorized' if the data category has not been set, but the actual .bim file does not contain 'uncategorized', so looking for that value would not be helpful.

morelgeorge commented 10 months ago

Hello, thank you for your response. You are right that no one should intentionally mark a column as 'uncategorized'. However, I am considering the scenario where someone has changed the data category by mistake. Power BI Desktop writes this value into the .bim model when you set the Uncategorized data category manually; by default, the value is not written at all. Consider the following example.

I have a column named State. In Power BI Desktop I change its data category to a wrong value, e.g. City, and then change it back to Uncategorized. Tabular Editor now shows the data category as "Uncategorized" in the .bim model. I know this scenario is rare, but since this is a rule, I believe it should cover as many cases as possible, and this was only a small change to the rule definition. As you mentioned, though, this behavior is a Power BI feature.

However, this change to the rule is not essential, and it is unrelated to the longitude and latitude issue.

The original rule does not work correctly for the longitude and latitude columns. What I changed is how the parentheses group the conditions. I have attached a zip file with a sample .bim model that contains a single table with a Latitude and a Longitude column. Both columns have a data category set. Nevertheless, the original rule still reports these columns as violations, while my updated rule does not.
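
The effect of the grouping can be reproduced in a few lines of Python, where `and` binds more tightly than `or`, as it does in C#-style expressions. This is an illustration under an assumption: the ungrouped form below is one plausible reading of how the original rule ends up being evaluated, not a copy of its definition, and the function and parameter names are made up for the example.

```python
def flagged_ungrouped(category_blank: bool, is_geo_text: bool, is_lat_long: bool) -> bool:
    # Without outer parentheses this parses as:
    # (category_blank and is_geo_text) or is_lat_long
    return category_blank and is_geo_text or is_lat_long


def flagged_grouped(category_blank: bool, is_geo_text: bool, is_lat_long: bool) -> bool:
    # The category check now guards both branches.
    return category_blank and (is_geo_text or is_lat_long)


# A Latitude column that already has a data category set:
print(flagged_ungrouped(False, False, True))  # True: flagged despite the category
print(flagged_grouped(False, False, True))    # False: passes, as expected
```

In the ungrouped form, the latitude/longitude branch fires regardless of the data category, which matches the behavior seen with the sample model.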

langitude_longitude_example.zip