intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc
Apache License 2.0

PySpark support for BigDL LLM int4 #8905

Open olcayc opened 1 year ago

olcayc commented 1 year ago

Hi, I would like to evaluate the following capabilities of BigDL LLM using PySpark offline CPU jobs:

Please advise on the best way to access these capabilities from a PySpark CPU job.

@hkvision

sgwhat commented 1 year ago

Hi Olcayc,

BigDL LLM fully supports the capabilities you listed.

We have put together an example of generating embeddings with PySpark using a Hugging Face transformers model optimized to INT4.

Here is a simple preliminary example you can follow to start generating embeddings. We are currently testing it on a cluster 🙂; the results will be delivered soon, please stay tuned.

from pyspark.sql import Row, SparkSession

from bigdl.llm.langchain.embeddings import TransformersEmbeddings
from bigdl.orca import init_orca_context, stop_orca_context

# Initialize the Orca context and create a Spark session from it.
sc = init_orca_context()
spark = SparkSession(sc)

# Load the INT4-optimized model; the embedder is captured by the
# mapPartitions closure and shipped to the executors.
bigdl_embedder = TransformersEmbeddings(model_path="/path/to/llama-hf")

texts = [("1", "This is a positive review."),
         ("2", "The product is not good."),
         ("3", "I highly recommend it.")]

df = spark.createDataFrame(texts, ["id", "comment"])

def encode_iter(partition):
    # Consume an iterator of rows, yield one enriched row per input row.
    for row in partition:
        text = row["comment"]
        query_result = bigdl_embedder.embed_query(text)
        doc_result = bigdl_embedder.embed_documents([text])
        yield Row(id=row["id"], comment=text,
                  query_embed=query_result, doc_embed=doc_result[0])

df_embedding = df.rdd.mapPartitions(encode_iter).toDF()
df_embedding.show()

stop_orca_context()
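The key pattern here is `mapPartitions`: the function receives an iterator over a whole partition, so any per-partition setup cost is amortized across all of that partition's rows. The sketch below illustrates the same control flow with plain Python iterators and a hypothetical `StubEmbedder` standing in for `TransformersEmbeddings` (no Spark or BigDL required), just to make the shape of the function clear:

```python
class StubEmbedder:
    """Hypothetical stand-in for TransformersEmbeddings: maps text to a
    fixed two-element vector (illustration only, not a real embedding)."""
    def embed_query(self, text):
        codes = [ord(c) for c in text]
        return [sum(codes) / len(codes), float(len(text))]

    def embed_documents(self, texts):
        return [self.embed_query(t) for t in texts]

def encode_iter(partition, embedder):
    # Same shape as the Spark version: consume an iterator of rows,
    # yield one enriched record per row.
    for row_id, text in partition:
        yield {"id": row_id,
               "comment": text,
               "query_embed": embedder.embed_query(text),
               "doc_embed": embedder.embed_documents([text])[0]}

rows = [("1", "This is a positive review."),
        ("2", "The product is not good.")]
embedder = StubEmbedder()
results = list(encode_iter(iter(rows), embedder))
print(results[0]["id"], len(results[0]["query_embed"]))  # prints: 1 2
```

Because the generator yields lazily, Spark can pipeline the embedding work without materializing the whole partition in memory at once.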
sgwhat commented 1 year ago

We have tested this example on a cluster and confirmed that it runs LLM embeddings efficiently with PySpark.
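Once the embeddings are materialized on the driver (for example via `df_embedding.collect()`), a common next step is comparing rows by cosine similarity. A minimal stdlib-only sketch, with toy vectors standing in for two rows' `query_embed` columns:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors for illustration; v2 is 2 * v1, so similarity is exactly 1.0.
v1 = [0.1, 0.3, 0.5]
v2 = [0.2, 0.6, 1.0]
print(round(cosine_similarity(v1, v2), 4))  # prints: 1.0
```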