nateraw / modelcards

📝 Utility to create, edit, and publish model cards on the Hugging Face Hub. [**Now lives in huggingface_hub**]
MIT License

yaml error when saving the card #58

Closed merveenoyan closed 2 years ago

merveenoyan commented 2 years ago

I get:

yaml.constructor.ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:python/object/apply:numpy.core.multiarray.scalar'
  in "<unicode string>", line 16, column 14:
          value: !!python/object/apply:numpy.core ... 

This happens when I try to save the card. I don't know what I'm doing wrong; the tests do something similar and they pass, so maybe I'm hitting an edge case. I tried pyyaml 6.0 and 5.4, since both are allowed.

Here's code to reproduce the issue:

from modelcards import CardData
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.experimental import enable_halving_search_cv  # noqa
from sklearn.model_selection import HalvingGridSearchCV, train_test_split
from skops import card
X, y = load_breast_cancer(as_frame=True, return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
param_grid = {
    "max_leaf_nodes": [5, 10, 15],
    "max_depth": [2, 5, 10],
}
model = HalvingGridSearchCV(
    estimator=HistGradientBoostingClassifier(),
    param_grid=param_grid,
    random_state=42,
    n_jobs=-1,
).fit(X_train, y_train)
model.score(X_test, y_test)
limitations = "This model is not ready to be used in production."
model_description = (
    "This is a HistGradientBoostingClassifier model trained on breast cancer dataset."
    " It's trained with Halving Grid Search Cross Validation, with parameter grids on"
    " max_leaf_nodes and max_depth."
)
license = "mit"
eval_results = card.evaluate(
    model, X_test, y_test, "neg_mean_squared_error", "random_type", "dummy_dataset", "tabular-regression"
)
card_data = CardData(
    license=license,
    tags=["tabular-classification"],
    datasets="breast-cancer",
    eval_results=eval_results,
    model_name="my-cool-model",
)
permutation_importances = card.permutation_importances(model, X_test, y_test)
model_card = card.create_model_card(
    model,
    card_data=card_data,
    template_path="skops/skops/card/default_template.md",
    limitations=limitations,
    model_description=model_description,
    permutation_importances=permutation_importances,
)
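# local_repo is not defined in this excerpt; it is assumed to be the path to a local repo directory
model_card.save(f"{local_repo}/README.md")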

Below is the error I get:

python3 test.py
Traceback (most recent call last):
  File "test.py", line 100, in <module>
    model_description=model_description,
  File "/Users/mervenoyan/Desktop/skops/skops/skops/card/_model_card.py", line 57, in create_model_card
    **card_kwargs,
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/modelcards/cards.py", line 274, in from_template
    return cls(content)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/modelcards/cards.py", line 40, in __init__
    data_dict = yaml.safe_load(yaml_block)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/yaml/__init__.py", line 162, in safe_load
    return load(stream, SafeLoader)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/yaml/__init__.py", line 114, in load
    return loader.get_single_data()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/yaml/constructor.py", line 43, in get_single_data
    return self.construct_document(node)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/yaml/constructor.py", line 52, in construct_document
    for dummy in generator:
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/yaml/constructor.py", line 404, in construct_yaml_map
    value = self.construct_mapping(node)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/yaml/constructor.py", line 210, in construct_mapping
    return super().construct_mapping(node, deep=deep)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/yaml/constructor.py", line 135, in construct_mapping
    value = self.construct_object(value_node, deep=deep)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/yaml/constructor.py", line 92, in construct_object
    data = constructor(self, node)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/yaml/constructor.py", line 420, in construct_undefined
    node.start_mark)
yaml.constructor.ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:python/object/apply:numpy.core.multiarray.scalar'
  in "<unicode string>", line 16, column 14:
          value: !!python/object/apply:numpy.core ... 

I also tried saving without the arguments after CardData, in case they contained odd characters, but I still get the same error. You can try the feature_importance branch of my fork to get the necessary functions.

nateraw commented 2 years ago

Are you passing a NumPy array as a Jinja template variable? That should be a str; Jinja won't know what to do with it directly.

merveenoyan commented 2 years ago

@nateraw I also tried creating the card with only the model and a CardData object. In that case it creates a hyperparameter table (string) and a model plot (HTML), and the CardData holds a list of EvalResults objects and a bunch of strings. I'll debug more carefully tomorrow to see if anything else might be causing it, but to my knowledge there's nothing NumPy in there. I might be hitting an edge case or it might be my own mistake, I'll see.

merveenoyan commented 2 years ago

It was definitely my fault for not seeing that the metric was a NumPy float scalar rather than a plain Python float. Sorry!
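
For anyone who lands here with the same error, here is a rough sketch of the kind of fix that applies: convert NumPy values to built-in Python types before they reach the card metadata. `to_builtin` is a hypothetical helper, not part of modelcards or skops. It relies on the `tolist()` method that NumPy scalars and arrays expose, so the YAML dump only ever sees plain floats, ints, and lists.

```python
def to_builtin(value):
    """Recursively replace NumPy-style values with built-in Python types.

    Hypothetical helper for illustration only. Anything exposing a
    ``tolist()`` method (NumPy scalars and arrays do) is converted via
    that method; dicts, lists, and tuples are walked recursively.
    Dict keys are left untouched, and tuples come back as lists.
    """
    if isinstance(value, dict):
        return {key: to_builtin(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_builtin(item) for item in value]
    # Check for tolist() rather than isinstance(value, float):
    # np.float64 subclasses float, so an isinstance check would let it through.
    tolist = getattr(value, "tolist", None)
    if callable(tolist):
        return to_builtin(tolist())
    return value
```

With NumPy installed, `to_builtin(np.float64(0.95))` should give a plain `float`, which `yaml.safe_load` can read back. For a single metric value, `float(score)` does the same job.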