yehoshuadimarsky / python-ssas

A proof of concept to integrate Python and Microsoft Analysis Services
MIT License
77 stars 33 forks source link

Running ssas_api.py and getting connected to AAS issue #12

Closed nisharncslabs closed 3 years ago

nisharncslabs commented 3 years ago

Hi, I followed the steps in the README.md file and imported the required dll extension (and edited the required version info in the python file path snippet provided as well).

However, once I go to run the a version of the Quickstart I run into an error: NameError: name 'DataTable' is not defined.

Here is the python file path snippet with the correct version and path that I have on my machine:

base = "C:/Program Files/PackageManagement/NuGet/Packages/Microsoft.AnalysisServices"
_version = "19.20.1.0"  # at time of this writing
AMO_PATH = f"{base}.retail.amd64.{_version}/lib/net45/Microsoft.AnalysisServices.Tabular.dll"
ADOMD_PATH = f"{base}.AdomdClient.retail.amd64.{_version}/lib/net45/Microsoft.AnalysisServices.AdomdClient.dll"

My question is:

yehoshuadimarsky commented 3 years ago

Can you post your complete code example that fails?

nisharncslabs commented 3 years ago

Sure, so I have 2 files in my editor (ssas_api.py copy exactly same as in repo)

  1. Quick start script with code to be run to get dataframe back from DAX query ping to AAS server.
  2. ssas_api.py copy file containing functions which can connect to AAS.

These are both shown down below.

My main question is: I am not too sure what additional things I need to do along with having these files in order to run/get the DAX query data successfully from a particular AAS server.

File 1: QuickStart

import ssas_api
import clr

'''
# Python Snippet showing where the required dll files are located

base = "C:/Program Files/PackageManagement/NuGet/Packages/Microsoft.AnalysisServices"
_version = "19.20.1.0"  # at time of this writing
AMO_PATH = f"{base}.retail.amd64.{_version}/lib/net45/Microsoft.AnalysisServices.Tabular.dll"
ADOMD_PATH = f"{base}.AdomdClient.retail.amd64.{_version}/lib/net45/Microsoft.AnalysisServices.AdomdClient.dll"
'''

#_load_assemblies(amo_path=amo_path, adomd_path=adomd_path)

conn = ssas_api.set_conn_string(
    server='<aas_servername>',
    db_name='<db_name>',
    username='<email>',
    password='<password>'
)

dax_string = '''  
MY DAX QUERY
'''

df = ssas_api.get_DAX(connection_string=conn, dax_string=dax_string)

File 2: ssas_api.py script for AAS connection

# -*- coding: utf-8 -*-
"""
Created on Wed Sep 20 16:59:43 2017

@author: Yehoshua
"""

import pandas as pd
import numpy as np
from functools import wraps
from pathlib import Path
import logging
import warnings

logger = logging.getLogger(__name__)

try:
    import clr  # name for pythonnet
except ImportError:
    msg = """
    Could not import 'clr', install the 'pythonnet' library. 
    For conda, `conda install -c pythonnet pythonnet`
    """
    raise ImportError(msg)

def _load_assemblies(amo_path=None, adomd_path=None):
    """
    Loads required assemblies, called after function definition.
    Might need to install SSAS client libraries:
    https://docs.microsoft.com/en-us/azure/analysis-services/analysis-services-data-providers

    Parameters
    ----------
    amo_path : str, default None
        The full path to the DLL file of the assembly for AMO.
        Should end with '**Microsoft.AnalysisServices.Tabular.dll**'
        Example: C:/my/path/to/Microsoft.AnalysisServices.Tabular.dll
        If None, will use the default location on Windows.
    adomd_path : str, default None
        The full path to the DLL file of the assembly for ADOMD.
        Should end with '**Microsoft.AnalysisServices.AdomdClient.dll**'
        Example: C:/my/path/to/Microsoft.AnalysisServices.AdomdClient.dll
        If None, will use the default location on Windows.
    """
    # Full path of .dll files
    root = Path(r"C:\Windows\Microsoft.NET\assembly\GAC_MSIL")
    # get latest version of libraries if multiple libraries are installed (max func)
    if amo_path is None:
        amo_path = str(
            max((root / "Microsoft.AnalysisServices.Tabular").iterdir())
            / "Microsoft.AnalysisServices.Tabular.dll"
        )
    if adomd_path is None:
        adomd_path = str(
            max((root / "Microsoft.AnalysisServices.AdomdClient").iterdir())
            / "Microsoft.AnalysisServices.AdomdClient.dll"
        )

    # load .Net assemblies
    logger.info("Loading .Net assemblies...")
    clr.AddReference("System")
    clr.AddReference("System.Data")
    clr.AddReference(amo_path)
    clr.AddReference(adomd_path)

    # Only after loaded .Net assemblies
    global System, DataTable, AMO, ADOMD

    **import System
    from System.Data import DataTable
    import Microsoft.AnalysisServices.Tabular as AMO
    import Microsoft.AnalysisServices.AdomdClient as ADOMD**

    logger.info("Successfully loaded these .Net assemblies: ")
    for a in clr.ListAssemblies(True):
        logger.info(a.split(",")[ 0 ])

def _assert_dotnet_loaded(func):
    """
    Wrapper to make sure that required .NET assemblies have been loaded and imported.
    Can pass the keyword arguments 'amo_path' and 'adomd_path' to any annotated function,
    it will use them in the `_load_assemblies` function.

    Example:
        .. code-block:: python

            import ssas_api
            conn = ssas_api.set_conn_string(
                's', 'd', 'u', 'p',
                amo_path='C:/path/number/one',
                adomd_path='C:/path/number/two'
            )
    """

    @wraps(func)
    def wrapper(*args, **kwargs):
        amo_path = kwargs.pop("amo_path", None)
        adomd_path = kwargs.pop("adomd_path", None)
        try:
            type(DataTable)
        except NameError:
            # .NET assemblies not loaded/imported
            logger.warning(".Net assemblies not loaded and imported, doing so now...")
            _load_assemblies(amo_path=amo_path, adomd_path=adomd_path)
        return func(*args, **kwargs)

    return wrapper

@_assert_dotnet_loaded
def set_conn_string(server, db_name, username, password):
    """
    Sets connection string to SSAS database,
    in this case designed for Azure Analysis Services
    """
    conn_string = (
        "Provider=MSOLAP;Data Source={};Initial Catalog={};User ID={};"
        "Password={};Persist Security Info=True;Impersonation Level=Impersonate".format(
            server, db_name, username, password
        )
    )
    return conn_string

@_assert_dotnet_loaded
def get_DAX(connection_string, dax_string):
    """
    Executes DAX query and returns the results as a pandas DataFrame

    Parameters
    ---------------
    connection_string : string
        Valid SSAS connection string, use the set_conn_string() method to set
    dax_string : string
        Valid DAX query, beginning with EVALUATE or VAR or DEFINE

    Returns
    ----------------
    pandas DataFrame with the results
    """
    table = _get_DAX(connection_string, dax_string)
    df = _parse_DAX_result(table)
    return df

def _get_DAX(connection_string, dax_string) -> "DataTable":
    dataadapter = ADOMD.AdomdDataAdapter(dax_string, connection_string)
    table = DataTable()
    logger.info("Getting DAX query...")
    dataadapter.Fill(table)
    logger.info("DAX query successfully retrieved")
    return table

def _parse_DAX_result(table: "DataTable") -> pd.DataFrame:
    cols = [ c for c in table.Columns.List ]
    rows = [ ]
    # much better performance to just access data by position instead of name
    # and then add column names afterwards
    for r in range(table.Rows.Count):
        row = [ table.Rows[ r ][ c ] for c in cols ]
        rows.append(row)

    df = pd.DataFrame.from_records(rows, columns=[ c.ColumnName for c in cols ])

    # replace System.DBNull with None
    # df.replace({System.DBNull: np.NaN}) doesn't work for some reason
    df = df.applymap(lambda x: np.NaN if isinstance(x, System.DBNull) else x)

    # convert datetimes
    dt_types = [ c.ColumnName for c in cols if c.DataType.FullName == "System.DateTime" ]
    if dt_types:
        for dtt in dt_types:
            # if all nulls, then pd.to_datetime will fail
            if not df.loc[ :, dtt ].isna().all():
                # https://docs.microsoft.com/en-us/dotnet/standard/base-types/standard-date-and-time-format-strings#Sortable
                ser = df.loc[ :, dtt ].map(lambda x: x.ToString('s'))
                df.loc[ :, dtt ] = pd.to_datetime(ser)

    # convert other types
    types_map = {"System.Int64": int, "System.Double": float, "System.String": str}
    col_types = {c.ColumnName: types_map.get(c.DataType.FullName, "object") for c in cols}

    # handle NaNs (which are floats, as of pandas v.0.25.3) in int columns
    col_types_ints = {k for k, v in col_types.items() if v == int}
    ser = df.isna().any(axis=0)
    col_types.update({k: float for k in set(ser[ ser ].index).intersection(col_types_ints)})

    # convert
    df = df.astype(col_types)

    return df

@_assert_dotnet_loaded
def process_database(connection_string, refresh_type, db_name):
    process_model(
        connection_string=connection_string,
        item_type="model",
        refresh_type=refresh_type,
        db_name=db_name,
    )

@_assert_dotnet_loaded
def process_table(connection_string, table_name, refresh_type, db_name):
    process_model(
        connection_string=connection_string,
        item_type="table",
        item=table_name,
        refresh_type=refresh_type,
        db_name=db_name,
    )

@_assert_dotnet_loaded
def process_model(connection_string, db_name, refresh_type="full", item_type="model", item=None):
    """
    Processes SSAS data model to get new data from underlying source.

    Parameters
    -------------
    connection_string : string
        Valid SSAS connection string, use the set_conn_string() method to set
    db_name : string
        The data model on the SSAS server to process
    refresh_type : string, default `full`
        Type of refresh to process. Currently only supports `full`.
    item_type : string, choice of {'model','table'}, default 'model'
    item : string, optional.
        Then name of the item. Only needed when item_type is 'table', to specify the table name
    """
    assert item_type.lower() in ("table", "model"), f"Invalid item type: {item_type}"
    if item_type.lower() == "table" and not item:
        raise ValueError("If item_type is table, must supply an item (a table name) to process")

    # connect to the AS instance from Python
    AMOServer = AMO.Server()
    logger.info("Connecting to database...")
    AMOServer.Connect(connection_string)

    # Dict of refresh types
    refresh_dict = {"full": AMO.RefreshType.Full}

    # process
    db = AMOServer.Databases[ db_name ]

    if item_type.lower() == "table":
        table = db.Model.Tables.Find(item)
        table.RequestRefresh(refresh_dict[ refresh_type ])
    else:
        db.Model.RequestRefresh(refresh_dict[ refresh_type ])

    op_result = db.Model.SaveChanges()
    if op_result.Impact.IsEmpty:
        logger.info("No objects affected by the refresh")

    logger.info("Disconnecting from Database...")
    # Disconnect
    AMOServer.Disconnect()
yehoshuadimarsky commented 3 years ago

No you do not need to install System or any of the .Net imports, those are .Net libraries that we are loading through Python using Pythonnet. This should work as written. Is it working?

nisharncslabs commented 3 years ago

No it doesn't seem to be working. The errors I find when I try running my Quickstart file (as stated above) are:

  1. NameError: name 'DataTable' is not defined: Which I think is caused by the for line of code in the ssas_api.py script not being detected.
  2. During handling of the above exception, another exception occurred: FileNotFoundError: [WinError 3] The system cannot find the path specified: 'C:\\Windows\\Microsoft.NET\\assembly\\GAC_MSIL\\Microsoft.AnalysisServices.Tabular' : Which I think is the result of the .dll files not being in the location required/ assumed by your ssas_api.py script? As a result might need to change the .dll file location in the script using the python snippet provided? I might be wrong about this?

I just want to ensure that if we take the Quickstart & ssas_api.py (as is) into the same directory and run the Quickstart it should work as is? I am not entirely sure if I need to change anything or am doing something wrong. I just want to be able send a DAX query against AAS and receive a response and store the result in a dataframe. Not sure if you are able to provide further guidance on how to do this?

nisharncslabs commented 3 years ago

I did also follow the steps in the README.md "Getting The Required .Net Libraries" regarding the .net libraries and confirmed that the following python snippet applies in my case:

base = "C:/Program Files/PackageManagement/NuGet/Packages/Microsoft.AnalysisServices"
_version = "19.20.1.0"  # at time of this writing
AMO_PATH = f"{base}.retail.amd64.{_version}/lib/net45/Microsoft.AnalysisServices.Tabular.dll"
ADOMD_PATH = f"{base}.AdomdClient.retail.amd64.{_version}/lib/net45/Microsoft.AnalysisServices.AdomdClient.dll"

Do I need to copy this info into the ssas_api.py file in anyway for it to run?

yehoshuadimarsky commented 3 years ago

yes that is the issue. your .Net DLLs are not being found. All of the functions optionally take params to specify where they are (passing to the underlying _load_assemblies decorator function), if no params are passed then it looks in the default locations, see the code in the snippet below. https://github.com/yehoshuadimarsky/python-ssas/blob/fc189377d6121b91fe262e44b68b5351cb6c7bf6/ssas_api.py#L47-L59 So if it's not finding it in the default location, you should pass the location as params, such as this

df = ssas_api.get_DAX(
    connection_string=conn, 
    dax_string=dax_string, 
    amo_path='C:/path/number/one',
    adomd_path='C:/path/number/two'
)

Example in the code/docstring: https://github.com/yehoshuadimarsky/python-ssas/blob/fc189377d6121b91fe262e44b68b5351cb6c7bf6/ssas_api.py#L83-L96

nisharncslabs commented 3 years ago

Ok, that makes sense however even when I used the file paths as specififed ie AMO_PATH & ADOMD_PATH in the fashion that you are suggesting (inside the get_DAX function) it stills throws the error. I have attached a screenshot of the error and the code chunk I think its referring too.

2021-05-05 (3)_LI 2021-05-05 (5)_LI

yehoshuadimarsky commented 3 years ago

That doesn't make sense.

nisharncslabs commented 3 years ago

The error that I am still getting is the same as what I described above.

dominik-hayd commented 3 years ago

Hey @nisharncslabs, I know it's been a while since you created this issue and I'm not the owner of this project. But since i encountered the same issues I want to share how I solved them and my understanding on whats the problem in your case.

First of all: The lines 70-73 should be highlighted by your IDE. It reads these imports and looks for matching python packages, which are obviously not available. The imports are loaded through pythonnet from the .DLL files. But this is just a warning and can be safely ignored, because we know why they occur.

Now to your error message. I think python is a bit misleading in this case. It reports two exceptions:

  1. The NameError: DataTable not defined. The exception is thrown in line 102. @yehoshuadimarsky accesses the type of DataTable there to check whether the .Net dlls are already loaded. So at the first execution this check has to throw an exception to trigger the dll loading, which is done in the except block (line 107).
  2. The FileNotFound error. This comes from pythonnet trying to load one of the dll files. This is the real problem.

Because the second exception is thrown in the except block python shows you both exception.

Now to solve the problem: (I can only tell you how I solved it 🙈)

I copied both .dll files (Tabular and AdomdClient) to the root directory of my project. In my first call to python_ssas I passed absolute paths to both files. conn = powerbi.set_conn_string('workspace', 'datset', 'user', 'password', amo_path='C:\\Users\\...\\Microsoft.AnalysisServices.Tabular.dll', adomd_path='C:\\Users\\...\\Microsoft.AnalysisServices.AdomdClient.dll')

For me this solves the error. If this isn't working for you play around with importing the dlls for a bit: Create a new python file to test just the import without an extra stuff.

import clr
clr.AddReference("System")
clr.AddReference("System.Data")
clr.AddReference(<path to your tabular dll>)
clr.AddReference(<path to your adomdclient dll>)

import System
from System.Data import DataTable
import Microsoft.AnalysisServices.Tabular as AMO
import Microsoft.AnalysisServices.AdomdClient as ADOMD

Adjust the paths until you get this script running and then supply the same paths to your first call to one of the python_ssas functions.

Hope this helps 😃