intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc
Apache License 2.0
6.71k stars 1.26k forks source link

Refine the Spark Wrapper (DLClassifier and DLEstimator) #941

Closed hhbyyh closed 7 years ago

hhbyyh commented 7 years ago

Several issues were found during the review of https://github.com/intel-analytics/BigDL/pull/745, since most of them would impact DLClassifier, we didn't make the change in https://github.com/intel-analytics/BigDL/pull/745.

1.DLClassifier uses HasInputCol and HasOutputCol while DLEstimator uses HasFeatureCol and HasPredictionCol, suggest to unify to HasFeatureCol and HasPredictionCol.

  1. We should implement TransformSchema to allow datatype check before pipeline execution and better error message.
  2. Currently DLClassifier and DLEstimator implement DLDataParams separately which does not make sense.
  3. DLClassifier and DLEstimator may not necessarily be generic, they can inference the datatype from model.
  4. We can make process method private as I assume it may be revoked sometime in the future if we don't need to support 1.5+.
  5. Shall we rename the DLClassifier to DLTransformer? As Classification is only one type of the applications.

We can also add the Evaluator interface.

hhbyyh commented 7 years ago

Mark High priority since it's API change and probably should be fixed before release. Feel free to downgrade it if necessary. I can send a quick fix. Appreciate your suggestions. cc @jason-dai @yiheng

wzhongyuan commented 7 years ago

sure, I will make a quick fix to DLClassifier. I thought we didn't want to touch the existing DLClassifier. Another thing we may need to consider is we may want to put modelTrain and batchShape into the constructor as well to keep consistent with DLEstimator @hhbyyh