microsoft / Analysis-Services

Git repo for Analysis Services samples and community projects
MIT License
607 stars 416 forks source link

BPA Rules: Identify Reference Same Measure Many Times Instead Using Variables #237

Closed rafaelxkr closed 1 year ago

rafaelxkr commented 1 year ago

I want a script to identify DAX querys that has call same measure many time instead using variables, is very commum find in SWITCH e IF fuctions

@m-kovalsky could you help me please?

I create 2 options to use in BPA, but they work only script C#

1° Option:

using System.Text.RegularExpressions;

string Measure2 = "SWITCH(TRUE(), [Medida1] > 50, 10,[Medida2] > 10, 5,[Medida3] > 0, 1, [Medida4] )";

var pattern2 = @"\[(\w+)\]";
var    Result3 = Regex.Matches(Measure2, pattern2).Cast<Match>().SelectMany(v => v.Groups[1].Value.Trim().Split(new[] {" "}, StringSplitOptions.RemoveEmptyEntries)).GroupBy(Value => Value).Any(group => group.Skip(1).Any());
Result3.Output();

I added in BPA (Error ) image

Regex.Matches(Expression, "\[(\w+)\]").Cast<Match>().SelectMany(v => v.Groups[1].Value.Trim().Split(new[] {" "}, StringSplitOptions.RemoveEmptyEntries)).GroupBy(Value => Value).Any(group => group.Skip(1).Any())

2° Option:

using System.Text.RegularExpressions;

string Measure2 = "SWITCH(TRUE(), [Medida1] > 50, 10,[Medida2] > 10, 5,[Medida3] > 0, 1, [Medida4] )";

var pattern2 = @"[^[\]]+(?=[[]|$)";
var    Result3 = Regex.Split(Measure2, pattern).Where(x => x.StartsWith("[") && x.EndsWith("]")).Distinct().Count() !=Regex.Split(Measure2, pattern).Where(x => x.StartsWith("[") && x.EndsWith("]")).Count();
Result3.Output();

I added in BPA (Error ) image

Regex.Split(Expression, "[^[\]]+(?=[[]|$)").Where(x => x.StartsWith("[") && x.EndsWith("]")).Distinct().Count() !=Regex.Split(Expression, "[^[\]]+(?=[[]|$)").Where(x => x.StartsWith("[") && x.EndsWith("]")).Count()
marcosqlbi commented 1 year ago

You should pay attention to such a rule - the presence of multiple references to the same measures could be good or bad, depending on the syntax. If you have different SWITCH branches, moving the measure reference to a variable could break the engine optimization. Read more here: https://docs.sqlbi.com/dax-internals/optimization-notes/switch-optimization We had to implement a full DAX parser to identify these issues in www.daxoptimizer.com

rafaelxkr commented 1 year ago

@marcosqlbi I don't think I explained properly what I'm trying to look for.

I have other rule for idenfy iterators and recomend creat a column in Database in this case SUMX(Table, [Column1] * [Column2]) or SUMX(Table, [Column1] + [Column2])

I want to find multiple references, this code only identify measures that can to be possibles issues. I know that false positives can appear, but the idea is to make the person reevaluate these types of measures.

The goal is to find duplicate words that are in square brackets.

REGEX 1° Option = https://regex101.com/r/7txeLa/1 REGEX 2° Option = https://regex101.com/r/ZdVW57/1

Example 1: ( multiples [teste] )

Example 2: ( multiples [rn] )

Example 3: ( multiples [teste] )

marcosqlbi commented 1 year ago

I understand your point. I'm just saying that a regular expression cannot work well, you either create false positives or false negatives. If the goal is to increase the efficiency of the development and of the code, you might actually create a counterproductive rule (if there are too many false positives, people will end up ignoring the rule).

otykier commented 1 year ago

@rafaelxkr BPA rule expressions use Dynamic LINQ syntax which is slightly different from C#. Most importantly, the Lambda is implicit, so instead of:

.Any(x => x.StartsWith("[") && x.EndsWith("]"))

you would have to type

.Any(StartsWith("[") and EndsWith("]"))

Also, in Dynamic LINQ, strings are verbatim by default, so you don't need to escape special characters (write double quotes twice "" to represent a double quote in the string ").

So if all you need is a rule that detects whether the same name appears in square brackets two or more times, you could use the following expression for the BPA rule:

RegEx.IsMatch(Expression, "(?i)\[([^\]]+)\].*?\[\1\]")

If you want the rule to only flag expressions where the same name appears in square brackets three or more times, use this:

RegEx.IsMatch(Expression, "(?i)\[([^\]]+)\].*?\[\1\].*?\[\1\]")

etc...

But as Marco wrote, this is bound to give you all sorts of false positives, since you can't rely on RegEx to determine if each measure reference is used in the same evaluation context (which would be a prerequisite for optimizing the expression to use a variable instead). For that, you need a full DAX parser...

m-kovalsky commented 1 year ago

I attempted to create such a rule based on RegEx some time ago but came to the conclusion that it is not really feasible using RegEx. There are too many exceptions and variations to encapsulate using this method. The best way to achieve this type of rule is through a true DAX parser which is what Daniel & Marco did with Tabular Editor 3 and the newly released DAX Optimizer.

rafaelxkr commented 1 year ago

@m-kovalsky @otykier @marcosqlbi

I started with this first version I used the suggestion @otykier and modified to find only unqualified measure references and excluded first argument of CALCULATE that I need to refine with time.

I had good result because the most datasets have simple measures, thanks for considerations !!!

RegEx.IsMatch(Regex.Replace(Regex.Replace(Expression, "\r\n?|\n", " "),"CALCULATE\s*\(\s*\[(?:(?!\,).)*,", ""), "(?i)[^A-Za-z0-9']\[([^\]]+)\].*?[^A-Za-z0-9']\[\1\]")
rafaelxkr commented 1 year ago

I close de issue many thanks !!!