Open jingwora opened 1 year ago
Hey @jingwora :wave:! Thank you so much for reporting the issue/feature request :rotating_light:. Someone from SynapseML Team will be looking to triage this issue soon. We appreciate your patience.
Hi @jingwora, I can confirm that I am able to repro this issue.
The issue comes from Cognitive Service itself: I can reproduce it without using SynapseML.
key = ''  # Cognitive Service key
endpoint = ""  # Cognitive Service endpoint, e.g. https://{yourworkspacename}.cognitiveservices.azure.com/

from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

# Authenticate the client using your key and endpoint
def authenticate_client():
    ta_credential = AzureKeyCredential(key)
    text_analytics_client = TextAnalyticsClient(
        endpoint=endpoint,
        credential=ta_credential)
    return text_analytics_client

client = authenticate_client()

# Example method for detecting the language of text
def language_detection_example(client):
    try:
        documents = ["日本国(にほんこく、にっぽんこく、英: Japan)、または日本(にほん、にっぽん)は、東アジアに位置する民主制国家 [1]。首都は東京都[注 2][2][3]。"]
        response = client.detect_language(documents=documents, country_hint='us')[0]
        print("Language: ", response.primary_language.name)
    except Exception as err:
        print("Encountered exception. {}".format(err))

language_detection_example(client)
I have opened a ticket with the Cognitive Service Language Detection team and will keep you updated.
Hi @jingwora, the Cognitive Service team has acknowledged the issue and will update the model again soon to address these regressions. For now, you can switch to an older version of the model.
We found an issue with setting the model version in SynapseML, and @serena-ruan helped fix it in https://github.com/microsoft/SynapseML/pull/1756
For now, you can use the latest build that includes the fix, com.microsoft.azure:synapseml_2.12:0.10.2-14-b205cc47-SNAPSHOT, and set the model version as follows:
import synapse.ml
from synapse.ml.cognitive import *
from pyspark.sql.functions import col

# Set key
key = ''  # API key
location = 'japaneast'  # Location

language = (LanguageDetector()
    .setSubscriptionKey(key)
    .setLocation(location)
    .setModelVersion("2021-11-20")  # previous version
    .setTextCol("text")
    .setOutputCol("language")
    .setErrorCol("error"))

# Test Text Analytics
test_data = spark.createDataFrame([
    (1, 'Japan'),
    (2, '日本国'),
    (3, 'にほんこく'),
    (4, '日本国(にほんこく、にっぽんこく'),
    (5, '日本国(にほんこく、にっぽんこく、英: Japan)、または日本(にほん、にっぽん)は、東アジアに位置する民主制国家 [1]。首都は東京都[注 2][2][3]。'),
], ["id", "text"])
# display(test_data)

test_data2 = language.transform(test_data)
display(test_data2)
If you want to test directly with Cognitive Service:
key = ''  # Cognitive Service key
endpoint = ""  # Cognitive Service endpoint, e.g. https://{yourworkspacename}.cognitiveservices.azure.com/

from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

# Authenticate the client using your key and endpoint
def authenticate_client():
    ta_credential = AzureKeyCredential(key)
    text_analytics_client = TextAnalyticsClient(
        endpoint=endpoint,
        credential=ta_credential)
    return text_analytics_client

client = authenticate_client()

# Example method for detecting the language of text
def language_detection_example(client):
    try:
        documents = ["日本国(にほんこく、にっぽんこく、英: Japan)、または日本(にほん、にっぽん)は、東アジアに位置する民主制国家 [1]。首都は東京都[注 2][2][3]。"]
        # Pin the previous model version as a workaround
        response = client.detect_language(documents=documents, country_hint='us', model_version='2021-11-20')[0]
        print("Language: ", response.primary_language.name)
    except Exception as err:
        print("Encountered exception. {}".format(err))

language_detection_example(client)
I'll let you know once the Cognitive Service team updates the model.
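For anyone who would rather not pin the old model for every request, one option is to call the default model first and retry with the pinned version only when the result comes back as unknown. The following is a minimal sketch of that idea, not part of SynapseML or the Azure SDK: `detect_with_fallback` and the `detect` callable are hypothetical names, and the check for "unknown" in the returned name is an assumption about how an undetected language is reported.

```python
from typing import Callable, Optional

# Hypothetical signature: (text, model_version) -> detected language name.
# In practice this would wrap client.detect_language(...) from the snippet above.
DetectFn = Callable[[str, Optional[str]], str]

PINNED_VERSION = "2021-11-20"  # the older model version mentioned in this thread


def detect_with_fallback(detect: DetectFn, text: str,
                         pinned_version: str = PINNED_VERSION) -> str:
    """Try the service default model first; retry with a pinned version on unknown."""
    name = detect(text, None)  # None means "let the service pick the model"
    # Assumption: an undetected language has "unknown" in its reported name.
    if "unknown" in name.lower():
        name = detect(text, pinned_version)
    return name
```

With the client from the snippet above, `detect` could be a small wrapper that calls `client.detect_language(documents=[text], country_hint='us', model_version=model_version)[0]` and returns `primary_language.name`. The trade-off is a second request for every text the default model cannot detect.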
Hi @JessicaXYWang
Thank you for your fast response. Is there any solution that does not require switching to the snapshot build, com.microsoft.azure:synapseml_2.12:0.10.2-14-b205cc47-SNAPSHOT?
Hi @jingwora, the issue will be fixed automatically once the Cognitive Service team releases a new version of the language detection model.
But if you want to manually set a previous model version to work around the issue now, earlier builds won't work; you need the snapshot build.
Hi @JessicaXYWang, thank you for your clarification.
Thanks @JessicaXYWang for doing this repro. Did the cog service yield any errors? If so, they should show up in the error column of the transformer. @jingwora, do you see anything in the error column? If not, that should be fixed on our side.
@mhamilton723 Thanks for your help! There is no error in the error column. The language column shows unknown. There are a couple of experiments
Hi @mhamilton723, the cog service does not yield errors. According to the Cognitive Service Language Detection documentation, the response for a language that cannot be detected is unknown.
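Since an undetected language is a normal response rather than an error, the two cases have to be told apart when inspecting the output. Below is a rough sketch of that distinction in plain Python. `classify_result` is a hypothetical helper, its two arguments stand in for the language name and the error value of one output row, and matching on "unknown" in the name is an assumption about the response format.

```python
from typing import Optional


def classify_result(language_name: Optional[str], error: Optional[str]) -> str:
    """Label one detection result as a service error, undetected, or detected."""
    if error:
        # The service (or the transformer) reported a failure for this row.
        return "service_error"
    # Assumption: an undetected language has "unknown" in its reported name.
    if language_name is None or "unknown" in language_name.lower():
        return "undetected"
    return "detected"


print(classify_result("Japanese", None))
print(classify_result("(Unknown)", None))
print(classify_result(None, "401 Unauthorized"))
```

In this thread the rows fall into the second case: the error column is empty and the language is reported as unknown.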
SynapseML version
com.microsoft.azure:synapseml_2.12:0.9.5
System information
Describe the problem
In my reproduction code, LanguageDetector detects the language only for very short text. Longer text returns unknown.
Test results:
日本国 -> Japanese
にほんこく -> Japanese
日本国(にほんこく、にっぽんこく -> Unknown
日本国(にほんこく、にっぽんこく、英: Japan)、または日本(にほん、にっぽん)は、東アジアに位置する民主制国家 [1]。首都は東京都[注 2][2][3 -> Unknown
I used this code before and it could detect the language of a long paragraph. (The environment versions have not changed at all.)
This bug started occurring a few days ago.
Could you check what happened? And how can I solve this issue?
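A check like the one described above can be kept as a small regression list, so it is easy to rerun after a model update. This is only a sketch: `find_regressions` is a hypothetical helper, and `fake_detect` is a stand-in that imitates the reported symptom rather than calling the real service. The expected values assume each sample should be detected as Japanese.

```python
from typing import Callable, List, Tuple

# Sample inputs from the test results above, each expected to be Japanese.
CASES: List[Tuple[str, str]] = [
    ("日本国", "Japanese"),
    ("にほんこく", "Japanese"),
    ("日本国(にほんこく、にっぽんこく", "Japanese"),
]


def find_regressions(detect: Callable[[str], str],
                     cases: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (text, expected, actual) for every case the detector gets wrong."""
    failures = []
    for text, expected in cases:
        actual = detect(text)
        if actual != expected:
            failures.append((text, expected, actual))
    return failures


def fake_detect(text: str) -> str:
    # Hypothetical stand-in that imitates the reported symptom:
    # short inputs are detected, longer ones come back as unknown.
    return "Japanese" if len(text) <= 5 else "(Unknown)"


for text, expected, actual in find_regressions(fake_detect, CASES):
    print(f"{text!r}: expected {expected}, got {actual}")
```

Replacing `fake_detect` with a function that calls the real detector would show which inputs still come back as unknown.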
Code to reproduce issue
Other info / logs
No response
What component(s) does this bug affect?
area/cognitive: Cognitive project
area/core: Core project
area/deep-learning: DeepLearning project
area/lightgbm: Lightgbm project
area/opencv: Opencv project
area/vw: VW project
area/website: Website
area/build: Project build system
area/notebooks: Samples under notebooks folder
area/docker: Docker usage
area/models: models related issue
What language(s) does this bug affect?
language/scala: Scala source code
language/python: Pyspark APIs
language/r: R APIs
language/csharp: .NET APIs
language/new: Proposals for new client languages
What integration(s) does this bug affect?
integrations/synapse: Azure Synapse integrations
integrations/azureml: Azure ML integrations
integrations/databricks: Databricks integrations