Open akshat-suwalka-dream11 opened 1 year ago
Hey @akshat-suwalka-dream11 :wave:! Thank you so much for reporting the issue/feature request :rotating_light:. Someone from SynapseML Team will be looking to triage this issue soon. We appreciate your patience.
@mhamilton723
I'll investigate.
@akshat-suwalka-dream11 , can you modify the plot_dependence_for_numeric
function to this and see if it works:
import matplotlib.pyplot as plt

def plot_dependence_for_numeric(df, col, col_int=True, figsize=(20, 5)):
    # Collect the single dependence value for each bucket column.
    dict_values = {}
    col_names = list(df.columns)
    for col_name in col_names:
        dict_values[col_name] = df[col_name][0].toArray()[0]

    # Sort buckets numerically; int(float(...)) handles keys such as "104000.0".
    marklist = sorted(
        dict_values.items(), key=lambda x: int(float(x[0])) if col_int else x[0]
    )
    sortdict = dict(marklist)

    fig = plt.figure(figsize=figsize)
    plt.plot(list(sortdict.keys()), list(sortdict.values()))
    plt.xlabel(col, size=13)
    plt.ylabel("Dependence")
    plt.ylim(0.0)
    plt.show()
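Presumably the only change from the helper you already have is the sort key: int(float(x[0])) instead of int(x[0]). The bucket keys ICETransformer emits are strings such as "104000.0", which int() alone cannot parse:

    int("104000.0")         # ValueError: invalid literal for int() with base 10: '104000.0'
    int(float("104000.0"))  # 104000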
@memoryz Thank you for the reply. It solves the plotting problem, but now I see that for every column, every bucket returns the same constant value, e.g. 0.34720682012802506 in the output above. One could argue the feature simply isn't important and that's why the value is constant, but I see the same value for every feature, which is a problem.
@akshat-suwalka-dream11 can you attach a screenshot of what you're seeing? I'm not sure if I understand what the problem is.
@memoryz
Every single column has this type of data.
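A quick way to tell whether the flat curve comes from the model itself rather than from ICETransformer is to score a few hand-perturbed copies of the same rows; if "probability" never moves, the model is insensitive to the feature and a flat PDP is expected. An untested sketch, reusing model_object_1, features_1, and pd1_amount_join from the repro code below:

    from pyspark.sql import functions as F

    base = features_1.filter(features_1.days_inactive == 0).limit(100)
    for v in [0.0, 100000.0, 400000.0]:
        # Overwrite the raw feature and rescore; ICETransformer perturbs rows the same way.
        scored = model_object_1.transform(base.withColumn("pd1_amount_join", F.lit(v)))
        scored.select("probability").show(3, truncate=False)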
@memoryz
SynapseML version
Version: 0.11.0
System information
Databricks Runtime 10.4 LTS ML (includes Apache Spark 3.2.1, Scala 2.12), com.microsoft.azure:synapseml_2.12:0.10.1, PySpark on Databricks
Describe the problem
In my RandomForestClassification model, which is a PySpark model, all the features are numerical.
The output:
Code to reproduce issue
pdp_1 = ICETransformer(
    model=model_object_1,
    targetCol="probability",
    kind="average",
    targetClasses=[1],
    numericFeatures=[
        {"name": "pd1_amount_join", "numSplits": 50, "rangeMin": 0.0, "rangeMax": 400000.0}
    ],
    # convert -290 to -1
)

output_pdp_1 = pdp_1.transform(features_1.filter(features_1.days_inactive == 0))
display(output_pdp_1)
Below is the code that raises the error:

df_userid_1 = get_pandas_df_from_column(output_pdp_1, "pd1_amount_join_dependence")
plot_dependence_for_numeric(df_userid_1, "pd1_amount_join")
Other info / logs
1st display result -> {"264000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "0.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "400000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "80000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "336000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "56000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "32000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "384000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "24000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "152000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "72000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "248000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "160000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "176000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "200000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "296000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "368000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "376000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "168000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "64000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "184000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "240000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "88000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "360000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "320000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "256000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "352000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "136000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "8000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "312000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "16000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "192000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "216000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "232000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "272000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "104000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "392000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "224000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "128000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "288000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "344000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "208000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "40000.0": {"vectorType": "dense", "length": 1, "values": 
[0.34720682012802506]}, "96000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "280000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "112000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "48000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "144000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "304000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "328000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}, "120000.0": {"vectorType": "dense", "length": 1, "values": [0.34720682012802506]}}
2nd error ->
/databricks/spark/python/pyspark/sql/pandas/conversion.py:92: UserWarning: toPandas attempted Arrow optimization because 'spark.sql.execution.arrow.pyspark.enabled' is set to true; however, failed by the reason below: Unable to convert the field 104000.0. If this column is not necessary, you may consider dropping it or converting to primitive type before the conversion. Direct cause: Unsupported type in conversion to Arrow: VectorUDT. Attempting non-optimization as 'spark.sql.execution.arrow.pyspark.fallback.enabled' is set to true. warnings.warn(msg)

ValueError: invalid literal for int() with base 10: '104000.0'
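Note that the Arrow message is only a warning: the non-Arrow fallback succeeds, and the ValueError comes from the original sort key (int("104000.0")), which the modified helper above avoids. If the warning itself is a nuisance, one untested option is to flatten the map<string, vector> dependence column to plain doubles before calling toPandas(); transform_values requires Spark 3.1+, which the 10.4 LTS runtime (Spark 3.2.1) provides:

    from pyspark.sql import functions as F
    from pyspark.ml.functions import vector_to_array

    # Hypothetical cleanup: map each single-element dense vector to its double
    # value so Arrow can convert the column directly.
    flat = output_pdp_1.withColumn(
        "pd1_amount_join_dependence",
        F.transform_values(
            F.col("pd1_amount_join_dependence"),
            lambda k, v: vector_to_array(v).getItem(0),
        ),
    )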
What component(s) does this bug affect?
- area/cognitive: Cognitive project
- area/core: Core project
- area/deep-learning: DeepLearning project
- area/lightgbm: Lightgbm project
- area/opencv: Opencv project
- area/vw: VW project
- area/website: Website
- area/build: Project build system
- area/notebooks: Samples under notebooks folder
- area/docker: Docker usage
- area/models: models related issue

What language(s) does this bug affect?
- language/scala: Scala source code
- language/python: Pyspark APIs
- language/r: R APIs
- language/csharp: .NET APIs
- language/new: Proposals for new client languages

What integration(s) does this bug affect?
- integrations/synapse: Azure Synapse integrations
- integrations/azureml: Azure ML integrations
- integrations/databricks: Databricks integrations