microsoft / semantic-link-labs

Early access to new features for Microsoft Fabric's Semantic Link.
MIT License
178 stars · 37 forks

BUG run_model_bpa_bulk #109

Closed matiirueda closed 2 months ago

matiirueda commented 2 months ago

Describe the bug

I was reviewing the results of running the loop across all workspaces, and for most of the datasets it couldn't save data. It returns this error: "Model BPA failed for the 'Linaje' semantic model within the 'CIO' workspace. Single positional indexer is out-of-bounds." Most of the datasets across the workspaces give me that error. Do you know what might be causing it?

This is my code, for example, for the 'CIO' workspace:

    import sempy_labs as labs
    from pyspark.sql import SparkSession
    from sempy_labs import migration, report, directlake
    from sempy_labs import lakehouse as lake
    from sempy_labs.tom import connect_semantic_model

    labs.run_model_bpa_bulk(extended=True, language='es-ES', workspace='CIO')


m-kovalsky commented 2 months ago

Does it work if you set ‘extended=False’?

If so, try running the tom.set_vertipaq_annotations() function against one of the models which didn’t work in the original bulk run.
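For reference, a minimal sketch of that step, assuming the usual connect_semantic_model context-manager pattern with readonly=False so the annotations can be written back (the model and workspace names are just the ones mentioned in this thread):

    # Sketch: set the VertiPaq annotations on one model that failed in the bulk run.
    # Assumes connect_semantic_model is opened with readonly=False so changes can be saved;
    # 'Linaje' / 'CIO' are the names from this thread.
    from sempy_labs.tom import connect_semantic_model

    with connect_semantic_model(dataset='Linaje', workspace='CIO', readonly=False) as tom:
        tom.set_vertipaq_annotations()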



matiirueda commented 2 months ago

Hi Michael, it worked by removing the parameter. Thanks.

On the other hand, an hour later I got this error: LivyHttpRequestFailure: Failed to send request to Livy. Please try again. If this issue still exists, please contact support.

Does this mean that the function calling the API has a limit per hour? If so, how long should I wait before calling it again?

m-kovalsky commented 2 months ago

I made a fix going into the next release which should resolve the extended=True issue.

Hmm, you may want to split the workspaces into groups across multiple sessions. I will also look into parallelizing this to gain efficiency.
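Not from the repo, just a hedged sketch of the "split into groups" idea; the workspace names and chunk size below are placeholders, and each chunk would normally run in its own notebook session:

    # Sketch: divide the workspace list into chunks so that each notebook session
    # only handles a slice of the catalog. Names and chunk size are illustrative.
    import sempy_labs as labs

    workspaces = ['CIO', 'Workspace A', 'Workspace B']  # full list would come from your catalog
    chunk_size = 10
    chunks = [workspaces[i:i + chunk_size] for i in range(0, len(workspaces), chunk_size)]

    # Each chunk would run in a separate notebook/session; one chunk shown inline here.
    for ws in chunks[0]:
        labs.run_model_bpa_bulk(language='es-ES', workspace=ws)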



matiirueda commented 2 months ago

Great, thanks for the improvements.

I have 70 workspaces and a total of 2500 semantic models. I created a loop to process each workspace, but it only manages to run about 14 workspaces. Do you think I should run several notebooks in parallel? Or is the Livy error due to too many API calls? If I add a 1-hour sleep and then continue with the remaining items in my catalog, would that solve the problem?
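For what it's worth, a rough sketch of the sleep-between-batches idea (the batch size and the one-hour pause are guesses, not values recommended by the library):

    # Sketch: process the workspaces in batches and pause between batches.
    # Batch size and the one-hour pause are illustrative guesses.
    import time
    import sempy_labs as labs

    workspaces = ['Workspace A', 'Workspace B']  # in practice, the ~70 workspaces from the catalog
    batch_size = 14
    for start in range(0, len(workspaces), batch_size):
        for ws in workspaces[start:start + batch_size]:
            try:
                labs.run_model_bpa_bulk(language='es-ES', workspace=ws)
            except Exception:
                continue
        time.sleep(3600)  # wait an hour before the next batch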

m-kovalsky commented 2 months ago

I’m not familiar with this error. Running several notebooks in parallel may be a more stable solution. Always good to test various options.



matiirueda commented 2 months ago

Hi Michael, I think the Livy error is because the notebook gets filled with the output of the execution and runs out of memory. Is there a way to prevent the result from being displayed on the notebook screen and only save it to Delta?

The error message reads: "The output is too large and hits Livy outputs limit. Please try to reduce your output or write it to a file."

Here's my code:

    import pandas as pd
    import sempy_labs as labs
    from sempy_labs import migration, report, directlake
    from sempy_labs import lakehouse as lake
    from sempy_labs.tom import connect_semantic_model

    all_datasets_df = spark.sql(
        "SELECT DISTINCT Workspace_Name FROM VF_Auditoria_Modelos.all_items "
        "WHERE Type = 'SemanticModel'"
    ).toPandas()

    for index, row in all_datasets_df.iterrows():
        workspace_name = row['Workspace_Name']
        try:
            labs.run_model_bpa_bulk(
                language='es-ES',
                workspace=workspace_name
            )
        except Exception:
            continue

I also tried this second version (code 2):

    import pandas as pd
    import sempy_labs as labs
    from sempy_labs import migration, report, directlake
    from sempy_labs import lakehouse as lake
    from sempy_labs.tom import connect_semantic_model

    all_datasets_df = spark.sql(
        "SELECT DISTINCT Workspace_Name FROM VF_Auditoria_Modelos.all_items "
        "WHERE Type = 'SemanticModel' ORDER BY Workspace_Name DESC"
    ).toPandas()
    results_df = pd.DataFrame(columns=['Workspace_Name', 'Result'])

    for index, row in all_datasets_df.iterrows():
        workspace_name = row['Workspace_Name']
        try:
            result = labs.run_model_bpa_bulk(
                language='es-ES',
                workspace=workspace_name
            )
            # DataFrame.append was removed in pandas 2.x; pd.concat does the same job
            results_df = pd.concat(
                [results_df, pd.DataFrame([{'Workspace_Name': workspace_name, 'Result': result}])],
                ignore_index=True
            )
        except Exception as e:
            results_df = pd.concat(
                [results_df, pd.DataFrame([{'Workspace_Name': workspace_name, 'Result': f"Error: {e}"}])],
                ignore_index=True
            )

    # results_df is a pandas DataFrame; convert to a Spark DataFrame before writing to Delta
    spark.createDataFrame(results_df).write.format("delta").save("/path/to/your/delta/table")

But it still shows the result on the screen.


matiirueda commented 2 months ago

Error in pipeline: Notebook execution failed at Notebook service with http status code - '200', please check the Run logs on Notebook, additional details - 'Error name - LivyHttpRequestFailure, Error value - Something went wrong while processing your request. Please try again later. HTTP status code: 500. Trace ID: 09944079-c5b5-469b-b2b7-5bda8cf7c98a.'

(Screenshots: the pipeline error, and the error shown when opening the notebook and viewing the output.)

I found a solution to the Livy error. When iterating through each workspace, the output accumulates in the notebook's memory until it runs out, causing the Livy error.

What I did was add clear_output(wait=True) to my code to clear the output after each loop iteration.
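A sketch of that change, assuming the standard IPython clear_output helper; all_datasets_df is the DataFrame built in the earlier snippet:

    # Sketch: clear the cell output after each workspace so rendered results
    # don't accumulate and trip the Livy output limit.
    from IPython.display import clear_output
    import sempy_labs as labs

    # all_datasets_df comes from the spark.sql query shown earlier
    for index, row in all_datasets_df.iterrows():
        workspace_name = row['Workspace_Name']
        try:
            labs.run_model_bpa_bulk(language='es-ES', workspace=workspace_name)
        except Exception:
            continue
        clear_output(wait=True)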

However, I still have some workspaces that are very large, and since the output is only cleared after each loop iteration, those don't manage to run all of their datasets.

It would be great if, in the next release, when running the BPA across workspaces you could choose which datasets to run instead of all of them. That way I could run only the ones that were left out due to memory issues. A parameter to hide the output on the screen would also be useful, so the notebook's memory doesn't fill up.

m-kovalsky commented 2 months ago

This is fixed in 0.7.3.