Uses reinforcement learning to search the knowledge graph for the best items to recommend and for the paths that lead to each recommended item.
Beam search-based algorithm: efficiently samples diverse reasoning paths and candidate item sets for recommendation. <- how effective is it?
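To make the beam search idea concrete, here is a minimal sketch, not the paper's implementation. It assumes the graph is a plain dict of outgoing (relation, entity) edges and that a hypothetical `score` function stands in for the learned policy's action probabilities. `beam_search_paths`, `toy_graph`, and `relation_weight` are illustrative names only.

```python
from typing import Callable, Dict, List, Tuple

# entity -> list of (relation, next_entity) edges; a toy stand-in for the KG
Graph = Dict[str, List[Tuple[str, str]]]
Path = Tuple[List[str], float]  # (alternating entities/relations, summed score)


def beam_search_paths(
    graph: Graph,
    start: str,
    score: Callable[[str, str, str], float],
    beam_sizes: List[int],
) -> List[Path]:
    """Expand paths hop by hop, keeping the top-k edges per path at each hop."""
    paths: List[Path] = [([start], 0.0)]
    for k in beam_sizes:
        expanded: List[Path] = []
        for nodes, total in paths:
            head = nodes[-1]
            edges = graph.get(head, [])
            # Rank outgoing edges by the (assumed) score and keep the best k.
            ranked = sorted(
                edges, key=lambda e: score(head, e[0], e[1]), reverse=True
            )[:k]
            for rel, tail in ranked:
                expanded.append(
                    (nodes + [rel, tail], total + score(head, rel, tail))
                )
        paths = expanded
    return paths


# Hypothetical toy data: relation weights play the role of policy scores.
toy_graph: Graph = {
    "user_1": [("purchase", "item_a"), ("mention", "word_x")],
    "item_a": [("also_bought", "item_b"), ("described_by", "word_x")],
    "word_x": [("describes", "item_c")],
}
relation_weight = {
    "purchase": 0.9, "mention": 0.4, "also_bought": 0.8,
    "described_by": 0.3, "describes": 0.6,
}

paths = beam_search_paths(
    toy_graph,
    "user_1",
    lambda head, rel, tail: relation_weight[rel],
    beam_sizes=[2, 1],  # keep 2 edges at hop 1, then 1 edge per path at hop 2
)
for nodes, total in paths:
    print(" -> ".join(nodes), round(total, 2))
```

Each returned path ends at a candidate entity and carries the full relation sequence, so the same search yields both the candidate items and an explanation path for each.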
Others
Agent: discovers suitable items and the paths leading to them. It starts from a user and performs explicit multi-step path reasoning over the graph.
Environment: knowledge graph
Rewards: the KG contains only already-purchased items as interactions and there is no known target item, so a binary reward indicating whether the agent has reached a target is infeasible. -> Rewards are instead computed by a separate (multi-hop) scoring function f(.) that generates soft rewards: 0 if the current entity is not an item; if it is an item, dot<user embed, target item embed> divided by a scaling term, max over items of dot<user embed + purchased item embed, item embed>.
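The soft reward can be sketched as below. This is an illustration under stated assumptions, not the paper's code: the same shifted score dot<user + shift, item> is used for both the numerator and the scaling term (so the best-scoring item gets reward 1), negative ratios are clipped to 0, and `shift` is a hypothetical vector standing in for whatever is added to the user embedding. All names are made up for the example.

```python
from typing import List, Optional, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def score(user: Vector, shift: Vector, item: Vector) -> float:
    """Assumed scoring function: dot<user + shift, item>."""
    shifted = [u + s for u, s in zip(user, shift)]
    return dot(shifted, item)


def soft_reward(
    user: Vector,
    shift: Vector,
    items: List[Vector],
    current_item: Optional[int],
) -> float:
    """Reward for the current entity.

    current_item is an index into `items`, or None when the current
    entity is not an item (reward 0 in that case).
    """
    if current_item is None:
        return 0.0
    scores = [score(user, shift, item) for item in items]
    best = max(scores)
    if best <= 0:
        # Assumption: no positive score to scale by, so give no reward.
        return 0.0
    # Scale by the best item score and clip negatives to 0.
    return max(0.0, scores[current_item] / best)


# Hypothetical 2-d embeddings, chosen so the arithmetic is easy to follow.
user = [1.0, 0.0]
shift = [0.0, 1.0]
items = [[2.0, 2.0], [1.0, 0.0], [-1.0, -1.0]]  # scores: 4, 1, -2

print(soft_reward(user, shift, items, 0))     # best item
print(soft_reward(user, shift, items, 1))     # weaker item
print(soft_reward(user, shift, items, 2))     # negative score, clipped
print(soft_reward(user, shift, items, None))  # not an item
```

Dividing by the maximum item score keeps the reward in [0, 1], so the agent gets a graded signal for reaching a plausible item even though no single target item is known in advance.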
Information
Link
https://arxiv.org/pdf/1906.05237.pdf
Overview
Others
Reference (for understanding)