intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc
Apache License 2.0

Multi-inputs example fails #7797

Closed Ishitori closed 1 year ago

Ishitori commented 1 year ago

I am trying to write a simple multi-input example based on the example from the docs, using only the core features of BigDL. Here is the code:

from bigdl.dllib.nn.layer import *
from bigdl.dllib.nn.criterion import *
from bigdl.dllib.utils.common import *
from bigdl.dllib.nnframes.nn_classifier import *
from bigdl.dllib.feature.common import *
from bigdl.dllib.keras import layers as ZLayer
from bigdl.dllib.keras.models import Model as ZModel
from bigdl.dllib.keras.objectives import BinaryCrossEntropy as ZBinaryCrossEntropy
from bigdl.dllib.keras.optimizers import Adam

from pyspark.ml.linalg import Vectors
from pyspark.ml.feature import VectorAssembler

init_engine()

df = spark.createDataFrame(
    [(1, 35, 109.0, Vectors.dense([2.0, 5.0, 0.5, 0.5]), 1.0),
     (2, 58, 2998.0, Vectors.dense([4.0, 10.0, 0.5, 0.5]), 2.0),
     (3, 18, 123.0, Vectors.dense([3.0, 15.0, 0.5, 0.5]), 1.0)],
    ["user", "age", "income", "history", "label"])

assembler = VectorAssembler(
    inputCols=["user", "age"],
    outputCol="features")

df = assembler.transform(df).cache()

x1 = ZLayer.Input(shape=(1,))
x2 = ZLayer.Input(shape=(1,))

user_embedding = ZLayer.Embedding(5, 10)(x1)
flatten = ZLayer.Flatten()(user_embedding)

dense1 = ZLayer.Dense(10)(x2)

merged = ZLayer.merge([flatten, dense1], mode="concat")
zy = ZLayer.Dense(2)(merged)

zmodel = ZModel([x1, x2], zy)
criterion = ZBinaryCrossEntropy()

classifier = NNEstimator(zmodel, criterion, [[1], [1]]) \
    .setOptimMethod(Adam()) \
    .setLearningRate(0.1)\
    .setBatchSize(2) \
    .setMaxEpoch(10)

nnClassifierModel = classifier.fit(df)

Unfortunately, it fails with the following exception:

An error was encountered:
There're 1 inputs, but graph has 2 roots
Traceback (most recent call last):
  File "/mnt/yarn/usercache/livy/appcache/application_1663192032239_10962/container_1663192032239_10962_01_000001/pyspark.zip/pyspark/ml/base.py", line 161, in fit
    return self._fit(dataset)
  File "/mnt/yarn/usercache/livy/appcache/application_1663192032239_10962/container_1663192032239_10962_01_000001/pyspark.zip/pyspark/ml/wrapper.py", line 335, in _fit
    java_model = self._fit_java(dataset)
  File "/mnt/yarn/usercache/livy/appcache/application_1663192032239_10962/container_1663192032239_10962_01_000001/pyspark.zip/pyspark/ml/wrapper.py", line 332, in _fit_java
    return self._java_obj.fit(dataset._jdf)
  File "/mnt/yarn/usercache/livy/appcache/application_1663192032239_10962/container_1663192032239_10962_01_000001/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/mnt/yarn/usercache/livy/appcache/application_1663192032239_10962/container_1663192032239_10962_01_000001/pyspark.zip/pyspark/sql/utils.py", line 117, in deco
    raise converted from None
pyspark.sql.utils.IllegalArgumentException: There're 1 inputs, but graph has 2 roots

It seems to me that the input preprocessor fails: instead of generating 2 inputs (one for each root), it generates only one. Or am I missing something?

Thank you.

qiuxin2012 commented 1 year ago

There are three problems with your code:

  1. The conversion to mkldnn's IR graph is not correct. You can remove bigdl.engineType=mkldnn for now.
  2. The input to BinaryCrossEntropy is incorrect. Change zy = ZLayer.Dense(2)(merged) to zy = ZLayer.Dense(1)(merged).
  3. The training loss is NaN.

I will look into them.

qiuxin2012 commented 1 year ago

I have updated https://github.com/intel-analytics/BigDL/blob/main/docs/readthedocs/source/doc/DLlib/Overview/nnframes.md, and the related wiki page will be updated tomorrow.

Please note that BinaryCrossEntropy expects a label of 0 or 1, and the last layer should be Dense(1). SparseCategoricalCrossEntropy is for multi-class labels from 0 until n (that is, 0 to n-1), and the last layer should be Dense(n).
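
To make the label and output-size pairing concrete, here is a minimal pure-Python sketch of the two textbook loss formulas. It is not BigDL's implementation: the function names are hypothetical, and it assumes the model output has already been turned into probabilities (a sigmoid for the single Dense(1) output, a softmax for the Dense(n) outputs).

```python
import math

EPS = 1e-7  # clamp to avoid log(0); an arbitrary choice for this sketch


def binary_cross_entropy(prob, label):
    """Loss for one sample: prob is the single probability output, label is 0 or 1."""
    if label not in (0, 1):
        raise ValueError("binary cross entropy expects a label of 0 or 1")
    prob = min(max(prob, EPS), 1 - EPS)
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))


def sparse_categorical_cross_entropy(probs, label):
    """Loss for one sample: probs holds n class probabilities, label is in 0..n-1."""
    if not 0 <= label < len(probs):
        raise ValueError("label must be an integer from 0 to n-1")
    return -math.log(max(probs[label], EPS))


bce = binary_cross_entropy(0.5, 1)
scce = sparse_categorical_cross_entropy([0.25, 0.25, 0.5], 2)
```

With these definitions, a label of 2 is rejected by the binary loss but is a valid class index for a three-output sparse categorical loss.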

Ishitori commented 1 year ago
  1. Got it. Does that mean I cannot use MKLDnn with multi-inputs? Or can I still use it if I don't use Keras? Is there a way to use multi-inputs with MKLDnn at this point?
  2. Got it.
  3. Thanks!

Could you explain to me how feature_preprocessing works? As far as I understand, an array like [[1], [2]] means "Use the feature with index 1 as the first input, and the features with indices 2 and 3 as the second input" (so each number is the number of features per input, and the features are consumed in linear order). But what does [2, 2] in [[1], [2], [2, 2]] mean? How does it get converted into LSTM input?

It still doesn't work for me, though. This time the error is:

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 15.0 failed 5 times, most recent failure: Lost task 0.4 in stage 15.0 (TID 5038) (ip-10-103-48-182.ec2.internal executor 7): java.lang.ArithmeticException: / by zero

qiuxin2012 commented 1 year ago

[2, 2] means converting 2 * 2 features (indices 4, 5, 6, 7) into a Tensor of size [2, 2].
[[1], [2], [2, 2]] corresponds to the model's inputs:

x1 = Input(shape=(1,))
x2 = Input(shape=(2,))
x3 = Input(shape=(2, 2,))

x3 needs an input tensor of size [2, 2].
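
The splitting described above can be sketched in plain Python. This is only an illustration of the idea, not BigDL's preprocessing code: split_features and reshape are hypothetical helpers, and the row-major fill order of the [2, 2] tensor is an assumption.

```python
from math import prod


def reshape(values, shape):
    """Reshape a flat list into nested lists of the given shape (row-major, assumed)."""
    if len(shape) == 1:
        return list(values)
    step = len(values) // shape[0]
    return [reshape(values[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]


def split_features(flat, sizes):
    """Consume the flat feature list in order, producing one tensor per model input."""
    expected = sum(prod(shape) for shape in sizes)
    if len(flat) != expected:
        raise ValueError(f"expected {expected} features, got {len(flat)}")
    inputs, offset = [], 0
    for shape in sizes:
        count = prod(shape)
        inputs.append(reshape(flat[offset:offset + count], shape))
        offset += count
    return inputs


# The values 1..7 stand in for the 1-based feature indices used in this thread.
inputs = split_features([1, 2, 3, 4, 5, 6, 7], [[1], [2], [2, 2]])
# inputs == [[1], [2, 3], [[4, 5], [6, 7]]]
```

The first entry would feed x1, the second x2, and the 2 * 2 block would feed x3.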

java.lang.ArithmeticException: / by zero: is this exception thrown from ClassNllCriterion? It looks like your labels do not match the criterion. If you are using BinaryCrossEntropy, you should change the labels from 1 or 2 to 0 or 1.

df = spark.createDataFrame(
    [(1, 35, 109.0, Vectors.dense([2.0, 5.0, 0.5, 0.5]), 1.0),
     (2, 58, 2998.0, Vectors.dense([4.0, 10.0, 0.5, 0.5]), 2.0),
     (3, 18, 123.0, Vectors.dense([3.0, 15.0, 0.5, 0.5]), 1.0)],
    ["user", "age", "income", "history", "label"])

Ishitori commented 1 year ago

Yes, it works! The difference between criterions is a bit confusing: some expect labels to be 0 and 1, while others expect labels to be 1 and 2. But it is okay.

So, my last question about this: is it possible to use multi-inputs with MKLDnn? I assume that it will speed up training by a lot!

Thank you.

qiuxin2012 commented 1 year ago

It's possible to use MKLDnn, but we only provide limited support for it right now. DLlib's MKLDNN only supports a few modules, such as Linear (Dense), MaxPooling, Convolution and BatchNormalization: https://github.com/intel-analytics/BigDL/tree/main/scala/dllib/src/main/scala/com/intel/analytics/bigdl/dllib/nn/mkldnn If all the modules in your model are supported, MKLDNN will give good performance. If some modules are not supported, performance may be worse, because the model has to spend a lot of time converting internal data between the MKLDnn type and the normal type.
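
One way to picture the conversion overhead is to count how often a model crosses between supported and unsupported modules. The sketch below is purely illustrative: the SUPPORTED set only contains the four modules named above (the real list is longer and lives at the linked directory), the layer names are plain labels rather than BigDL class names, and count_format_switches is a hypothetical helper that says nothing about how BigDL actually schedules conversions.

```python
# Only the modules named in this thread; treated as an assumption, not the full list.
SUPPORTED = {"Linear", "MaxPooling", "Convolution", "BatchNormalization"}


def count_format_switches(layers, supported=SUPPORTED):
    """Count adjacent layer pairs where one is supported and the other is not."""
    flags = [name in supported for name in layers]
    return sum(1 for a, b in zip(flags, flags[1:]) if a != b)


all_supported = count_format_switches(
    ["Convolution", "BatchNormalization", "MaxPooling", "Linear"])
mixed = count_format_switches(
    ["Convolution", "Dropout", "Linear", "Softmax"])
```

Under these assumptions the first model never leaves the MKLDNN data format, while the second would switch formats at every layer boundary.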

Ishitori commented 1 year ago

Got it, thank you!