intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

Use spark backend got error:NotImplementedError: Saving the model to HDF5 format requires the model to be a Functional model or a Sequential model. It does not work for subclassed models, because such models are defined via the body of a Python method, which isn't safely serializable. Consider saving to the Tensorflow SavedModel format (by setting save_format="tf") or using `save_weights`. #5312

Open Alxe1 opened 1 year ago

Alxe1 commented 1 year ago

Using the spark backend, I got the following error: NotImplementedError: Saving the model to HDF5 format requires the model to be a Functional model or a Sequential model. It does not work for subclassed models, because such models are defined via the body of a Python method, which isn't safely serializable. Consider saving to the Tensorflow SavedModel format (by setting save_format="tf") or using save_weights.

est = Estimator.from_keras(model_creator=model_creator,
                               config=config,
                               backend="spark",
                               model_dir="hdfs://ip:port/ckpt")

SparkRunner saves the model as an h5 file, which causes this error; my model needs to be saved in the tf (SavedModel) format instead. How can I deal with it?

jason-dai commented 1 year ago

@sgwhat please take a look

sgwhat commented 1 year ago

@Alxe1 Hey, sorry for the late reply. Would you mind providing the code you use to build the model (model_creator) and to save it (est.save())?

Alxe1 commented 1 year ago

@sgwhat Code:

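import tensorflow as tf

# DeepCross is the user-defined subclassed tf.keras.Model (its definition is not shown here).
def model_creator(config):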
    deep_cross = DeepCross(user_num=config["uid_num"],
                           item_num=config["item_num"],
                           user_item_dim=16,
                           sparse_num=config["sparse_num"],
                           feature_embed_dim=16,
                           embed_norm=0.001,
                           dnn_hidden_units=[int(e) for e in [128, 64, 32]],
                           dnn_activation="relu",
                           dnn_dropout=0.2,
                           cross_num=4)
    loss = tf.keras.losses.BinaryCrossentropy()
    optimizer = tf.keras.optimizers.Adam()
    deep_cross.compile(optimizer=optimizer, loss=loss, metrics=[tf.keras.metrics.AUC()])
    return deep_cross

def train_test():
    from pyspark import SparkConf
    from pyspark.sql import SparkSession
    from bigdl.orca.learn.tf2 import Estimator
    from bigdl.orca import init_orca_context
    from bigdl.orca import OrcaContext

    sc = init_orca_context(cluster_mode='local', cores=16, memory="10g", num_nodes=3)
    conf = SparkConf().setAppName("test")
    conf.set("spark.sql.execution.arrow.enabled", True)
    conf.set("spark.sql.execution.arrow.fallback.enabled", True)

    spark = SparkSession.builder.config(conf=conf).enableHiveSupport().getOrCreate()

    MODEL_PATH = "/models/deepcross_model"

    data_transform = DataTransform(MODEL_PATH, spark)
    uid_num, vid_num, sparse_num, data_count, sdf = data_transform.process()

    # model_creator reads config["item_num"], so the key has to match.
    config = {"uid_num": int(uid_num), "item_num": int(vid_num), "sparse_num": int(sparse_num)}

    est = Estimator.from_keras(model_creator=model_creator,
                               config=config,
                               backend="spark",
                               model_dir="hdfs://ip:port/ckpt")

    train_data, test_data = sdf.randomSplit([0.8, 0.2], 100)

    stats = est.fit(train_data,
                    epochs=20,
                    batch_size=512,
                    feature_cols=["embed"],
                    label_cols=["label"],
                    steps_per_epoch=data_count // 512)
    print("stats: {}".format(stats))

    # res = est.predict(data=train_data.select("embed"), feature_cols=["embed"])
    # print(f"=====================res: {res}")
    # print(res.rdd.take(5))

    # est.save("/mytest/deepcross")

    # stats = est.evaluate(sdf,
    #                      feature_cols=["embedded_vector"],
    #                      label_cols=["label"])
    # print("stats: {}".format(stats))
sgwhat commented 1 year ago

Thanks! We will try to reproduce it.

Alxe1 commented 1 year ago

The DeepCross model is a user-defined subclassed model; in TF2, it cannot be saved as an h5 file.

sgwhat commented 1 year ago

Are there any arguments in __init__ of DeepCross?

Alxe1 commented 1 year ago

Yes, and I implemented the get_config method following the official doc.

sgwhat commented 1 year ago

Does get_config solve the problem?

Alxe1 commented 1 year ago

No, same error.
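
One way to read the "same error" result, going by the wording of the error message: get_config only records constructor arguments, while a subclassed model's architecture lives in the body of a Python method. The sketch below is a toy illustration in plain Python, not Keras code; build_from_config and ToySubclassed are hypothetical names invented for this example.

```python
import json


def build_from_config(config):
    """Rebuild a toy 'functional' model: a list of (scale, shift) steps."""
    steps = [(layer["scale"], layer["shift"]) for layer in config["layers"]]

    def forward(x):
        for scale, shift in steps:
            x = x * scale + shift
        return x

    return forward


class ToySubclassed:
    """Toy 'subclassed' model: the architecture is the body of forward()."""

    def __init__(self, scale):
        self.scale = scale

    def get_config(self):
        # Only the constructor argument is captured here.
        return {"scale": self.scale}

    def forward(self, x):
        # This branching logic exists only as Python code, not as data.
        return x * self.scale if x > 0 else -x


# A model described purely as data survives a JSON round trip.
config = {"layers": [{"scale": 2.0, "shift": 1.0}, {"scale": 0.5, "shift": 0.0}]}
restored = build_from_config(json.loads(json.dumps(config)))
print(restored(3.0))  # (3 * 2 + 1) * 0.5 = 3.5

# The subclassed toy's config says nothing about what forward() does.
subclassed = ToySubclassed(scale=2.0)
print(json.dumps(subclassed.get_config()))  # {"scale": 2.0}
```

In this toy, the config of ToySubclassed is enough to call the constructor again, but not to recover the forward logic without the class's source code.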

sgwhat commented 1 year ago

Well, I could save a model in h5 format after implementing the get_config method (but this may not be a good solution, since TensorFlow recommends saving a subclassed model in the SavedModel format). Also, may I know your TensorFlow and Keras versions?

Alxe1 commented 1 year ago

tensorflow=2.3.0

sgwhat commented 1 year ago

I see. I had only implemented a layer class, not a model class; that's why I could save it in h5 format. For a subclassed model, TensorFlow doesn't support saving as an h5 file, so it's better to use the SavedModel format instead. 😄
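
The error message also mentions save_weights as an alternative. Below is a minimal sketch of that route in plain TensorFlow, assuming a TensorFlow 2.x install. TinySubclassed is a hypothetical stand-in for a subclassed model such as DeepCross, and the ".weights.h5" file name is an assumption chosen so the path is treated as an HDF5 weights file. This only illustrates the TensorFlow side; it does not change how SparkRunner saves the model internally.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


class TinySubclassed(tf.keras.Model):
    """Hypothetical stand-in for a subclassed model such as DeepCross."""

    def __init__(self, hidden_units=8):
        super().__init__()
        self.hidden = tf.keras.layers.Dense(hidden_units, activation="relu")
        self.out = tf.keras.layers.Dense(1, activation="sigmoid")

    def call(self, inputs):
        return self.out(self.hidden(inputs))


def roundtrip_weights(features):
    """Save weights from one instance and load them into a fresh one."""
    source = TinySubclassed()
    # Calling the model once also creates its variables.
    expected = source(features).numpy()

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "ckpt.weights.h5")
        source.save_weights(path)

        restored = TinySubclassed()
        restored(features)  # create variables before loading HDF5 weights
        restored.load_weights(path)
        actual = restored(features).numpy()
    return expected, actual


features = np.random.rand(4, 3).astype("float32")
expected, actual = roundtrip_weights(features)
print(np.allclose(expected, actual))
```

Only the weights are stored this way, so the class definition has to be available wherever the weights are loaded, and the optimizer state is not part of the file.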

jason-dai commented 1 year ago

Is this a limitation of TensorFlow itself?

sgwhat commented 1 year ago

Yes, it is.