yehoshuadimarsky / python-ssas

A proof of concept to integrate Python and Microsoft Analysis Services
MIT License
77 stars 33 forks source link

Parsing issues when DateTime column is blank #11

Open samusaran opened 3 years ago

samusaran commented 3 years ago

We've got the following error on some queries:

AttributeError: 'float' object has no attribute 'ToString'

at the following line:

ser = df.loc[:, dtt].map(lambda x: x.ToString('s'))

The error seems to happen when we have a DateTime column in the resultset that have some BLANK values. In those cases the variable seems to be float instead of .NET DateTime.

yehoshuadimarsky commented 3 years ago

oooh this is a famous problem in pandas, that NaNs are really floats (actually here it's probably a NaT). I recall it causing lots of issues there. Guess I can make a fix here but tied up at the moment, please submit a pull request if you can fix it. The solution will be something like replacing NaTs with something that could be converted by the .Net ToString method.

samusaran commented 3 years ago

Ok I'll try something (i'm really not very proficient with python, but I work a lot with .NET and SSAS).

yehoshuadimarsky commented 3 years ago

Please fix this issue @yehoshuadimarsky.

Feel free to submit a pull request. I open sourced this for others to use, but I make no guarantee about supporting it. I will do my best to fix issues as they come up, subject to time and other constraints.

samusaran commented 3 years ago

Sadly this workaround would create some slowdowns, since IFs are processed by the formula engine. In some cases it may not be acceptable.

Sent from Ninehttp://www.9folders.com/


Da: João Flávio Santos @.***> Inviato: domenica 9 maggio 2021 06:02 A: yehoshuadimarsky/python-ssas Cc: Alessandro Losi; Author Oggetto: Re: [yehoshuadimarsky/python-ssas] Parsing issues when DateTime column is blank (#11)

I have used this method to work around this issue:


dax_string= ''' EVALUATE SELECTCOLUMNS( table, "NAME_COLUMN", IF(table[column_data] = BLANK(), "null", table[column_data) ) '''


After that, it's just replacing the "null" with the Pandas. 👍

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/yehoshuadimarsky/python-ssas/issues/11#issuecomment-835665272, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AACG7N7TTHT5JGZXOQ4SF23TMYCORANCNFSM4XWUHQZA.

samusaran commented 3 years ago

I'm not saying it won't work, it would just be slower. Imagine setting every measure to "null" instead of leaving it blank. It would tank the performance on more complex models, since for every measure you would force the formula engine to take over.

Also, if done carelessly, the "IF(ISBLANK(measure), "something else)" would absolutely destroy performance since row elimination would not be enabled anymore.

It's a fine workaround if used correctly, but not a final solution, nor a good way to work in DAX.

Sent from Ninehttp://www.9folders.com/


Da: João Flávio Santos @.***> Inviato: domenica 9 maggio 2021 08:20 A: yehoshuadimarsky/python-ssas Cc: Alessandro Losi; Mention Oggetto: Re: [yehoshuadimarsky/python-ssas] Parsing issues when DateTime column is blank (#11)

Here's working great with a thousand lines. :D

Just added a few filters in the function and stayed Show! My goal is to use this base in RPA projects and integration. Because in Power BI Service cubes I already have cross/unified business rules.

dax_string= ''' EVALUATE var filteredTable = FILTER(table;table[filter_one]="yes") return

SELECTCOLUMNS( filteredTable, "NAME_COLUMN", IF(table[column_data] = BLANK(), "null", table[column_data) ) '''

https://stackoverflow.com/questions/57084921/combine-powerbi-dax-filter-and-selectcolumn

Obrigado pelo retorno @samusaranhttps://github.com/samusaran!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/yehoshuadimarsky/python-ssas/issues/11#issuecomment-835706784, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AACG7NZ2GJT74CT2Q5JNHQ3TMYSRTANCNFSM4XWUHQZA.

kaesb commented 2 years ago

Dumb workaround:

from System import DateTime
def nan_fix(x): 
    if isinstance(x, float):
        if np.isnan(x):
            x = DateTime(1678,1,1,0,0,0)
    return x
# ser = df.loc[:, dtt].map(lambda x: x.ToString('s'))
ser = df.loc[:, dtt].map(lambda x: nan_fix(x).ToString('s'))

Then handle empty dates however. Wont work if the dataset has actual 1678-01-01 dates...