openml / OpenML

Open Machine Learning
https://openml.org

Flow upload on test server broken #1214

Open tpham93 opened 5 months ago

Description

Hi guys,

Uploading flows to the test server appears to be failing: the first upload attempt creates an incomplete flow without any parameters, so publishing the same flow again fails due to a parameter mismatch. I am unsure whether this belongs in this repository or in the openml-python repository, but from what I can see it looks like a server-side error. I can reproduce it on two of my PCs, both in a clean Python environment with an up-to-date openml-python installation and in an older environment with an older openml-python version that used to work.

Steps/Code to Reproduce

Here is a minimal example. The first error message in the next block only appears on the first upload of a given flow, so you might have to try a different classifier to see it.


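import openml
from sklearn.tree import DecisionTreeClassifier

# Switch openml-python to the test server (test.openml.org).
openml.config.start_using_configuration_for_example()

# Convert the scikit-learn model into an OpenML flow and upload it.
extension = openml.extensions.sklearn.SklearnExtension()
flow = extension.model_to_flow(DecisionTreeClassifier())
flow.publish()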

Expected Results

The flow should be successfully uploaded to the test server, and the local flow should be assigned the server-side flow ID.

Actual Results

The first execution results in the following log:

WARNING:root:Received uncompressed content from OpenML for https://test.openml.org/api/v1/xml/flow/.
WARNING:root:Received uncompressed content from OpenML for https://test.openml.org/api/v1/xml/flow/.
---------------------------------------------------------------------------
ExpatError                                Traceback (most recent call last)
Cell In [16], line 3
      1 extension = openml.extensions.sklearn.SklearnExtension()
      2 flow = extension.model_to_flow(DecisionTreeClassifier())
----> 3 flow.publish()

File ~/anaconda3/envs/wsp/lib/python3.8/site-packages/openml/flows/flow.py:445, in OpenMLFlow.publish(self, raise_error_if_exists)
    441 if self.flow_id:
    442     raise openml.exceptions.PyOpenMLError(
    443         "Flow does not exist on the server, " "but 'flow.flow_id' is not None.",
    444     )
--> 445 super().publish()
    446 assert self.flow_id is not None  # for mypy
    447 flow_id = self.flow_id

File ~/anaconda3/envs/wsp/lib/python3.8/site-packages/openml/base.py:140, in OpenMLBase.publish(self)
    134 call = f"{_get_rest_api_type_alias(self)}/"
    135 response_text = openml._api_calls._perform_api_call(
    136     call,
    137     "post",
    138     file_elements=file_elements,
    139 )
--> 140 xml_response = xmltodict.parse(response_text)
    142 self._parse_publish_response(xml_response)
    143 return self

File ~/anaconda3/envs/wsp/lib/python3.8/site-packages/xmltodict.py:327, in parse(xml_input, encoding, expat, process_namespaces, namespace_separator, disable_entities, **kwargs)
    325     parser.ParseFile(xml_input)
    326 else:
--> 327     parser.Parse(xml_input, True)
    328 return handler.item

ExpatError: no element found: line 1, column 0
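
For context, this exact message is what the expat parser reports when it is handed an empty document, which suggests the server returned an empty response body for the upload. Below is a minimal sketch using only the standard library; `parse_error` is a hypothetical helper, and the non-empty XML string is a made-up placeholder, not the real OpenML response format.

```python
from xml.parsers.expat import ExpatError, ParserCreate


def parse_error(text):
    """Return the expat error message for `text`, or None if it parses."""
    parser = ParserCreate()
    try:
        # The second argument marks this as the final chunk of the document.
        parser.Parse(text, True)
    except ExpatError as exc:
        return str(exc)
    return None


# An empty body, as the server presumably sent (assumption).
empty_body_error = parse_error("")

# A placeholder well-formed document (hypothetical, not OpenML's schema).
valid_body_error = parse_error("<response><id>1</id></response>")

print("empty body:", empty_body_error)
print("valid body:", valid_body_error)
```

If the empty-body case reproduces the message from the traceback, the client-side parsing is behaving as expected and the problem lies in what the server sent back.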

The second execution results in the following log:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~/anaconda3/envs/wsp/lib/python3.8/site-packages/openml/flows/flow.py:459, in OpenMLFlow.publish(self, raise_error_if_exists)
    458 try:
--> 459     openml.flows.functions.assert_flows_equal(
    460         self,
    461         flow,
    462         flow.upload_date,
    463         ignore_parameter_values=True,
    464         ignore_custom_name_if_none=True,
    465     )
    466 except ValueError as e:

File ~/anaconda3/envs/wsp/lib/python3.8/site-packages/openml/flows/functions.py:551, in assert_flows_equal(flow1, flow2, ignore_parameter_values_on_older_children, ignore_parameter_values, ignore_custom_name_if_none, check_description)
    550     if len(symmetric_difference) > 0:
--> 551         raise ValueError(
    552             "Flow %s: parameter set of flow "
    553             "differs from the parameters stored "
    554             "on the server." % flow1.name,
    555         )
    557 if ignore_parameter_values_on_older_children:

ValueError: Flow sklearn.tree._classes.DecisionTreeClassifier: parameter set of flow differs from the parameters stored on the server.

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
Cell In [2], line 3
      1 extension = openml.extensions.sklearn.SklearnExtension()
      2 flow = extension.model_to_flow(DecisionTreeClassifier())
----> 3 flow.publish()

File ~/anaconda3/envs/wsp/lib/python3.8/site-packages/openml/flows/flow.py:468, in OpenMLFlow.publish(self, raise_error_if_exists)
    466 except ValueError as e:
    467     message = e.args[0]
--> 468     raise ValueError(
    469         "The flow on the server is inconsistent with the local flow. "
    470         f"The server flow ID is {flow_id}. Please check manually and remove "
    471         f"the flow if necessary! Error is:\n'{message}'",
    472     ) from e
    473 return self

ValueError: The flow on the server is inconsistent with the local flow. The server flow ID is 40757. Please check manually and remove the flow if necessary! Error is:
'Flow sklearn.tree._classes.DecisionTreeClassifier: parameter set of flow differs from the parameters stored on the server.'

If you try to inspect the parameters of the server-side flow (https://test.openml.org/api/v1/flow/40757), you only get an empty OrderedDict, while the local one is filled correctly. Below are the corresponding code snippets and their output.

code:

flow.parameters

log:

OrderedDict([('ccp_alpha', '0.0'),
             ('class_weight', 'null'),
             ('criterion', '"gini"'),
             ('max_depth', 'null'),
             ('max_features', 'null'),
             ('max_leaf_nodes', 'null'),
             ('min_impurity_decrease', '0.0'),
             ('min_samples_leaf', '1'),
             ('min_samples_split', '2'),
             ('min_weight_fraction_leaf', '0.0'),
             ('random_state', 'null'),
             ('splitter', '"best"')])

code:

flow_server = openml.flows.get_flow(40757)
flow_server.parameters

log:

OrderedDict()
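
The parameter mismatch behind the second error can be illustrated offline. The sketch below compares the parameter names from the two outputs above using a set symmetric difference, mirroring the `len(symmetric_difference) > 0` check visible in the traceback. It is an illustration under that assumption, not the actual implementation of `assert_flows_equal`; the variable names are chosen for this example.

```python
# Parameter names copied from the local flow output above.
local_parameters = {
    "ccp_alpha",
    "class_weight",
    "criterion",
    "max_depth",
    "max_features",
    "max_leaf_nodes",
    "min_impurity_decrease",
    "min_samples_leaf",
    "min_samples_split",
    "min_weight_fraction_leaf",
    "random_state",
    "splitter",
}

# The server-side flow 40757 reported an empty OrderedDict.
server_parameters = set()

# Names present in exactly one of the two sets.
symmetric_difference = local_parameters ^ server_parameters

# Names the local flow has but the server flow lacks.
missing_on_server = sorted(local_parameters - server_parameters)

flows_match = len(symmetric_difference) == 0

print("parameter sets match:", flows_match)
print("missing on server:", missing_on_server)
```

Because the server-side set is empty, every local parameter ends up in the difference, so any comparison of this kind fails no matter which parameter values are ignored.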