intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

Friesian FeatureTable missing OPs #5055

Open cyita opened 3 years ago

cyita commented 3 years ago

Operations supported by NVTabular:

StatOperator

TODO

hkvision commented 3 years ago

From the MLPerf team: to avoid the all-to-all communication time for syncing embedding inputs, each worker needs the full data of the column(s) corresponding to its embedding(s). In other words, we need a collect operation that gathers all the data of a column onto a specific node.
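A minimal single-process sketch of that gather step, assuming rows are held as per-worker partitions of dicts and that some planner has already decided which worker owns each column's embedding (`gather_columns`, `column_owner`, and the sample data are hypothetical, not a Friesian API). In a real Spark job this would be a distributed shuffle or collect rather than list concatenation.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical layout: each worker holds a partition of rows keyed by column name.
Partition = List[Dict[str, Any]]


def gather_columns(partitions: List[Partition],
                   column_owner: Dict[str, int]) -> Dict[int, Dict[str, List[Any]]]:
    """Collect every value of each column onto the worker that owns its embedding.

    column_owner maps a column name to the index of the worker hosting the
    embedding table for that column (an assumed, precomputed assignment).
    """
    gathered: Dict[int, Dict[str, List[Any]]] = defaultdict(dict)
    for column, owner in column_owner.items():
        # Concatenate the column's values from all partitions, in partition order.
        values = [row[column] for part in partitions for row in part]
        gathered[owner][column] = values
    return dict(gathered)


partitions = [
    [{"user": 1, "item": 10}, {"user": 2, "item": 11}],  # rows on worker 0
    [{"user": 3, "item": 10}],                           # rows on worker 1
]
result = gather_columns(partitions, {"user": 0, "item": 1})
print(result)  # {0: {'user': [1, 2, 3]}, 1: {'item': [10, 11, 10]}}
```

Once each owner has the complete column, it can look up its embeddings locally instead of exchanging inputs with every other worker.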

yaxinchen666 commented 3 years ago

Operations for RecSys

names
[user1, user2, user3]
[user1, user4]
[user3, user5, user6]

If possible, assign larger integers to strings that appear less frequently.
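A plain-Python sketch of frequency-ordered string indexing over an array-valued column like `names` above, assuming ids start at 0 and ties are broken alphabetically so the mapping is deterministic (`frequency_index` is a hypothetical helper; a distributed version would compute the counts with a group-by first).

```python
from collections import Counter
from typing import Dict, List


def frequency_index(rows: List[List[str]]) -> Dict[str, int]:
    """Map each string to an integer id; less frequent strings get larger ids."""
    counts = Counter(s for row in rows for s in row)
    # Descending count, then alphabetical order, so equal counts get stable ids.
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    return {s: i for i, s in enumerate(ordered)}


names = [["user1", "user2", "user3"], ["user1", "user4"], ["user3", "user5", "user6"]]
index = frequency_index(names)
encoded = [[index[s] for s in row] for row in names]
print(encoded)  # [[0, 2, 1], [0, 3], [1, 4, 5]]
```

Here `user1` and `user3` each appear twice, so they receive the two smallest ids; if id 0 must be reserved for padding, the ids can simply be shifted by one.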

Elena-Qiu commented 3 years ago

Operations for WeChat Challenge

hkvision commented 3 years ago

Dummy pipeline unsupported operations:

yizerozhuang commented 3 years ago

Operations for Booking Challenge

yizerozhuang commented 3 years ago

Operations for Booking Challenge

df1.show()
+---+---+
| x | y |
+---+---+
| 1 | a |
| 2 | b |
| 3 | a |
| 4 | a |
| 5 | c |
+---+---+

df2 = df1.factorise("y","z")
+---+---+---+
| x | y | z |
+---+---+---+
| 1 | a | 0 |
| 2 | b | 1 |
| 3 | a | 0 |
| 4 | a | 0 |
| 5 | c | 2 |
+---+---+---+
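Judging from the tables, the proposed `factorise("y","z")` encodes each distinct value of `y` as an integer in order of first appearance (`a` → 0, `b` → 1, `c` → 2), which is also what `pandas.factorize` does with its default `sort=False`. A minimal single-machine sketch of that rule (the `factorise` function here is hypothetical; a distributed implementation would additionally need a global row order to define "first appearance"):

```python
from typing import Dict, Hashable, List, Tuple


def factorise(values: List[Hashable]) -> Tuple[List[int], Dict[Hashable, int]]:
    """Encode values as integers in order of first appearance."""
    codes: Dict[Hashable, int] = {}
    encoded: List[int] = []
    for v in values:
        if v not in codes:
            codes[v] = len(codes)  # next unused id
        encoded.append(codes[v])
    return encoded, codes


y = ["a", "b", "a", "a", "c"]
z, mapping = factorise(y)
print(z)  # [0, 1, 0, 0, 2]
```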

hkvision commented 3 years ago

Also operations to handle timestamp, including:

yizerozhuang commented 3 years ago

Also operations to handle timestamp, including:

  • [ ] f.from_unixtime
  • [ ] f.hour
  • [ ] f.minute
  • [ ] f.second

Will do tomorrow.
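The timestamp operations listed above correspond to `from_unixtime`, `hour`, `minute`, and `second` in `pyspark.sql.functions`, which work on columns and use the Spark session time zone (the default `from_unixtime` format is `yyyy-MM-dd HH:mm:ss`). As a rough plain-Python illustration of the same semantics on scalar values, assuming UTC (the helper names are hypothetical):

```python
from datetime import datetime, timezone
from typing import Tuple


def from_unixtime(ts: int, fmt: str = "%Y-%m-%d %H:%M:%S") -> str:
    """Format seconds since the epoch as a string (UTC assumed)."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime(fmt)


def hour_minute_second(ts: int) -> Tuple[int, int, int]:
    """Extract (hour, minute, second) from a unix timestamp (UTC assumed)."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return dt.hour, dt.minute, dt.second


ts = 1_600_000_000
print(from_unixtime(ts))       # 2020-09-13 12:26:40
print(hour_minute_second(ts))  # (12, 26, 40)
```

In Spark the results would differ from this sketch whenever the session time zone is not UTC.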

songhappy commented 3 years ago

Also operations to convert to and from pandas DataFrames, and sorting.

jenniew commented 3 years ago

Need to persist tables to avoid recomputing them repeatedly in iterative workloads.
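Why persisting matters: with lazy evaluation (as in Spark DataFrames), every action replays the table's whole lineage unless the result was kept with `persist()` or `cache()`. The toy class below is a hypothetical stand-in, not the Friesian or Spark API; it only counts how often an expensive source is evaluated across three iterations with and without persisting.

```python
from typing import Callable, List, Optional


class LazyTable:
    """Toy lazily evaluated table (hypothetical, for illustration only)."""

    def __init__(self, compute: Callable[[], List[int]]):
        self._compute = compute
        self._cached: Optional[List[int]] = None
        self._persisted = False

    def transform(self, fn: Callable[[int], int]) -> "LazyTable":
        # The new table re-runs this table's computation whenever it is collected.
        return LazyTable(lambda: [fn(v) for v in self.collect()])

    def persist(self) -> "LazyTable":
        self._persisted = True
        return self

    def collect(self) -> List[int]:
        if self._persisted:
            if self._cached is None:
                self._cached = self._compute()  # computed once, then reused
            return self._cached
        return self._compute()  # recomputed on every call


def counting_source(counter: List[int]) -> Callable[[], List[int]]:
    def source() -> List[int]:
        counter[0] += 1  # record each (expensive) evaluation
        return [1, 2, 3]
    return source


plain_calls = [0]
plain = LazyTable(counting_source(plain_calls))
for i in range(3):
    plain.transform(lambda v: v + i).collect()
print(plain_calls[0])  # 3

persisted_calls = [0]
persisted = LazyTable(counting_source(persisted_calls)).persist()
for i in range(3):
    persisted.transform(lambda v: v + i).collect()
print(persisted_calls[0])  # 1
```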