intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc
Apache License 2.0

Adam AUC is 0.499999 #2702

Open land1725 opened 5 years ago

land1725 commented 5 years ago

@qiuxin2012 Hello, sorry to open a new issue. I think something may be wrong with the Adam method.

I have tried other optimizers such as SGD and Adadelta, and the AUC is about 0.64. But when I use Adam, the AUC is very often 0.499999. I have also tried learning rates of 0.01, 0.001, and 1e-4, with no improvement. For example, if I train 5 times, the Adam AUC ends up at 0.499999 in 2-3 of them.

At the first epoch the loss is 20.16201:

18/12/06 19:12:31 INFO optim.DistriOptimizer$: [Epoch 1 1280/594924][Iteration 1][Wall Clock 0.403881218s] Trained 1280 records in 0.403881218 seconds. Throughput is 3169.2485 records/second. Loss is 20.16201.

At the end of the last epoch the loss is 0.60207695, but the AUC is 0.4999973:

18/12/06 19:17:27 INFO optim.DistriOptimizer$: [Epoch 10 595200/594924][Iteration 4650][Wall Clock 294.955613903s] Trained 1280 records in 0.057638138 seconds. Throughput is 22207.518 records/second. Loss is 0.60207695.
18/12/06 19:17:27 INFO optim.DistriOptimizer$: [Epoch 10 595200/594924][Iteration 4650][Wall Clock 294.955613903s] Epoch finished. Wall clock time is 296390.8913 ms
18/12/06 19:17:27 INFO optim.DistriOptimizer$: [Epoch 10 595200/594924][Iteration 4650][Wall Clock 294.955613903s] Validate model...
18/12/06 19:17:28 INFO optim.DistriOptimizer$: [Epoch 10 595200/594924][Iteration 4650][Wall Clock 294.955613903s] AucScore is (Average score: 0.4999973, count: 255949)
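As an aside on what a score like this means: an AUC this close to 0.5 says the model ranks positives no better than chance, which is exactly what happens when the network's output has collapsed to a (nearly) constant value for every row. This is an illustrative sketch, not BigDL's AUC implementation; it computes AUC by pairwise comparison, counting ties as 0.5:

```scala
// Illustrative only: pairwise AUC, ties count as 0.5 (not BigDL's implementation).
def auc(scores: Seq[Double], labels: Seq[Int]): Double = {
  val pos = scores.zip(labels).collect { case (s, 1) => s }
  val neg = scores.zip(labels).collect { case (s, 0) => s }
  val wins = for (p <- pos; n <- neg)
    yield if (p > n) 1.0 else if (p == n) 0.5 else 0.0
  wins.sum / (pos.size * neg.size)
}

// A collapsed model emits the same score everywhere -> AUC exactly 0.5.
println(auc(Seq.fill(6)(0.37), Seq(1, 0, 1, 0, 1, 0)))            // 0.5
// A model that separates the classes -> AUC 1.0.
println(auc(Seq(0.9, 0.2, 0.8, 0.1, 0.7, 0.3), Seq(1, 0, 1, 0, 1, 0))) // 1.0
```

So an AUC pinned at 0.4999... alongside a falling BCE loss is a strong hint that the sigmoid output is saturated or the hidden ReLUs have died, rather than a bug in the metric itself.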

The input data has 185 features and about 7,000,000 rows, and I have applied StandardScaler to it. A sample row (a SparseVector) looks like this:

(185,[0,1,2,3,4,5,6,7,8,9,10,12,13,14,15,16,18,21,23,26,34,35,45,77,78,79,80,81,82,83,84,85,86,87,100,109,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,154,157,168,171,173,175,176,177,178,179,180,181,182,183,184],[0.2214023924099741,5.672617077161699,0.7231821250093369,0.5802122150378988,3.4110414551944546,0.5054001622827107,0.7382372579967251,0.18441391532901527,0.3510271108090672,0.057184464010305046,0.2936667230035301,5.744493151736237,1.718198436919466,0.9996764224589016,4.338197366842643,1.1253726950021101,0.05004213284425888,0.6302486177783938,0.5736969776660313,4.486687460243378,0.8952312406292044,0.5726980144811648,0.6356243345455732,0.7962974372653338,2.516540136001545,1.1897144291852573,0.8814523813109089,2.660899540807231,0.9432194372957498,2.816869094640019,0.43827096300537066,1.4085802588452512,0.5809362501015424,0.9198992897398048,0.2730023489866407,0.2417857737202852,0.9715020930273759,3.9709002675727523,2.6946510698621893,-3.4113444351111952,-4.620155069020955,1.4316627636981325,-1.4413002593310593,-0.1663886120749784,-1.8829195920277206,1.3817444462308304,2.060667514159773,0.1124817677892372,0.17919385918466163,3.001698345304736,5.143964966464063,0.12607719440919968,0.0882325387245275,0.08110055281123292,0.06889164603902073,126.76289223448991,127.9805373979275,6.329776204424773,2.231923092374546,2.465422589633035,2.567296798122765,307.4773704563241,-0.11085255757047624,0.1953789802885409,0.10489344888621102,-0.19408594595433964,-0.13845577081669264,-0.05277336862310883,-0.021445803447883573,-0.05328150631279249,-0.14605048347519062,0.017330921643215553])

My code is:

Engine.init

val hidden_1 = Dense(outputDim = 16, activation = "relu", inputShape = Shape(inputNumber))
val hidden_2 = Dense(outputDim = 8, activation = "relu")
val hidden_3 = Dense(outputDim = 1, activation = "sigmoid")

val model = Sequential()
  .add(hidden_1)
  .add(hidden_2)
  .add(hidden_3)

val criterion = BCECriterion()
val estimator = new DLClassifier(model, criterion, Array(inputNumber))
  .setFeaturesCol("scaledFeatures")
  .setLabelCol("label")
  .setBatchSize(1280)
  .setMaxEpoch(10)
  .setOptimMethod(new Adam(1e-4))
  .setValidation(Trigger.everyEpoch, testInput, Array(new AUC()), 1280)
val dlModel = estimator.fit(trainInput.coalesce(4))
val testRank = dlModel.transform(testInput.coalesce(4))
testRank.show(100, false)
qiuxin2012 commented 5 years ago

@land1725 From your description I can't tell where the problem is. I just checked Adam's code and didn't find anything wrong. Could you provide a unit test or something else to help me reproduce this bug?

land1725 commented 5 years ago

@qiuxin2012 Hello Qiu, I would be glad to help solve the problem. Should I send you a Baidu Yun Pan address? I can provide my test data, about 200 MB. You can then read it in your Spark cluster with val data = ss.sqlContext.read.parquet(modelOutputPath) and train on it with my code below. Is that OK?

val Array(trainInput, testInput) = data.randomSplit(Array(0.7, 0.3))

val featuresHead = trainInput.select("scaledFeatures").head
val inputNumber = featuresHead(0).asInstanceOf[SparseVector].size
println("inputNumber is " + inputNumber)

Engine.init

val hidden_1 = Dense(outputDim = 16, activation = "relu", inputShape = Shape(inputNumber))
val hidden_2 = Dense(outputDim = 8, activation = "relu")
val hidden_3 = Dense(outputDim = 1, activation = "sigmoid")

val model = Sequential()
  .add(hidden_1)
  .add(hidden_2)
  .add(hidden_3)

val criterion = BCECriterion()
val estimator = new DLClassifier(model, criterion, Array(inputNumber))
  .setFeaturesCol("scaledFeatures")
  .setLabelCol("label")
  .setBatchSize(1280)
  .setMaxEpoch(10)
  .setOptimMethod(new Adam(1e-4))
  .setValidation(Trigger.everyEpoch, testInput, Array(new AUC()), 1280)
val dlModel = estimator.fit(trainInput.coalesce(4))
val testRank = dlModel.transform(testInput.coalesce(4))
testRank.show(100, false)
qiuxin2012 commented 5 years ago

@land1725 Add my QQ 498028146 and message me offline. Thanks.

land1725 commented 5 years ago

@qiuxin2012 You are so kind, and I want to give you a big hug for helping me solve the problem. The Dense() layer uses the Xavier initMethod by default, which is not well suited to a ReLU/sigmoid network. The right initMethod here is He initialization. Could you open a request to implement He initialization?
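For reference, the difference between the two schemes is just the scale of the initial weights. This is an illustrative sketch of the usual formulas (Glorot/Xavier vs. He/MSRA), not BigDL code; the fan-in/fan-out values match the 185 -> 16 first Dense layer above:

```scala
// Illustrative formulas only, not BigDL's InitializationMethod code.
// Xavier/Glorot targets std = sqrt(2 / (fanIn + fanOut)), balancing forward
// and backward signal variance for roughly linear (tanh-like) activations.
def xavierStd(fanIn: Int, fanOut: Int): Double = math.sqrt(2.0 / (fanIn + fanOut))

// He/MSRA targets std = sqrt(2 / fanIn): the extra factor of 2 compensates
// for ReLU zeroing out half of the activations on average.
def heStd(fanIn: Int): Double = math.sqrt(2.0 / fanIn)

println(f"xavier std for 185 -> 16: ${xavierStd(185, 16)}%.4f")
println(f"he std     for fanIn 185: ${heStd(185)}%.4f")
```

The per-layer difference is modest, but it compounds through depth: with Xavier the post-ReLU variance shrinks layer by layer, and a final sigmoid fed with near-zero input emits ~0.5 for every row, which matches the AUC of 0.4999 seen above.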

qiuxin2012 commented 5 years ago

https://github.com/intel-analytics/BigDL/blob/ce7e573e82b2ca593213dd36ae60fa94d939d52a/spark/dl/src/main/scala/com/intel/analytics/bigdl/nn/InitializationMethod.scala#L316 Is this He initialization? It specifically accounts for ReLU nonlinearities.