neolifer / LLM4POI

Large Language Models for Next Point-of-Interest Recommendation

License: APACHE-2.0 | Venue: SIGIR 2024

This repository contains the implementation of the paper "Large Language Models for Next Point-of-Interest Recommendation".

Install

  1. Clone this repository to your local machine.
  2. Install the environment by running
    conda env create -f environment.yml

    Alternatively, on Linux you can download the packed conda environment directly from this Google Drive link, then run:

    mkdir -p llm4poi
    tar -xzf "venv.tar.gz" -C "llm4poi"
    conda activate llm4poi
  3. Download the model from https://huggingface.co/Yukang/Llama-2-7b-longlora-32k-ft

Dataset

Download the raw dataset files from datasets.

  • Unzip datasets.zip to ./datasets.
  • Unzip datasets/nyc/raw.zip to datasets/nyc.
  • Unzip datasets/tky/raw.zip to datasets/tky.
  • Unzip datasets/ca/raw.zip to datasets/ca.
  • Run python preprocessing/generate_ca_raw.py --dataset_name {dataset_name}
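
The per-city unzip steps above can also be scripted. The following is a minimal sketch, not part of this repository: the function name extract_raw_archives is hypothetical, and it assumes datasets.zip has already been unzipped so that datasets/<city>/raw.zip exists for each city.

```python
import zipfile
from pathlib import Path

# Cities named in the steps above; each is expected at datasets/<city>/raw.zip.
CITIES = ("nyc", "tky", "ca")


def extract_raw_archives(datasets_dir: str = "datasets") -> list:
    """Unzip datasets/<city>/raw.zip into datasets/<city> for each city.

    Returns the city directories that were extracted. A city whose raw.zip
    is missing is skipped instead of raising an error.
    """
    root = Path(datasets_dir)
    extracted = []
    for city in CITIES:
        archive = root / city / "raw.zip"
        if not archive.is_file():
            continue
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(root / city)
        extracted.append(root / city)
    return extracted


if __name__ == "__main__":
    for path in extract_raw_archives():
        print(f"extracted raw data into {path}")
```

Run it from the repository root so that the relative datasets path resolves as in the steps above.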

Preprocess

run python preprocessing/run.py

run python preprocessing/traj_qk.py

run python traj_sim --dataset_name {dataset_name} --model_path {your_model_path}

run python preprocessing/to_nextpoi_qkt.py --dataset_name {dataset_name}
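
The four preprocessing commands are meant to be run in the order listed. As a rough sketch, they could be chained from a small driver script. The function names below are hypothetical, the script paths and flags are copied exactly as written above (including traj_sim, whose full path is not given), and the placeholder values in the example call are assumptions.

```python
import shlex
import subprocess


def build_preprocess_commands(dataset_name: str, model_path: str) -> list:
    """Return the preprocessing commands in the order listed in this README."""
    return [
        ["python", "preprocessing/run.py"],
        ["python", "preprocessing/traj_qk.py"],
        # "traj_sim" is copied verbatim from the README step.
        ["python", "traj_sim", "--dataset_name", dataset_name,
         "--model_path", model_path],
        ["python", "preprocessing/to_nextpoi_qkt.py",
         "--dataset_name", dataset_name],
    ]


def run_preprocess(dataset_name: str, model_path: str, dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it.

    check=True stops the pipeline at the first failing step.
    """
    for cmd in build_preprocess_commands(dataset_name, model_path):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder values; the dry run only prints the commands.
    run_preprocess("nyc", "path/to/model", dry_run=True)
```

The dry-run default makes it easy to check the substituted arguments before committing to the longer steps.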

Main Performance

train

run

torchrun --nproc_per_node=8 supervised-fine-tune-qlora.py  \
--model_name_or_path {your_model_path} \
--bf16 True \
--output_dir {your_output_path} \
--model_max_length 32768 \
--use_flash_attn True \
--data_path datasets/processed/{DATASET_NAME}/train_qa_pairs_kqt.json \
--low_rank_training True \
--num_train_epochs 3  \
--per_device_train_batch_size 1     \
--per_device_eval_batch_size 2     \
--gradient_accumulation_steps 1     \
--evaluation_strategy "no"     \
--save_strategy "steps"     \
--save_steps 1000     \
--save_total_limit 2     \
--learning_rate 2e-5     \
--weight_decay 0.0     \
--warmup_steps 20     \
--lr_scheduler_type "constant_with_warmup"     \
--logging_steps 1     \
--deepspeed "ds_configs/stage2.json" \
--tf32 True
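
The command above sets --nproc_per_node, --per_device_train_batch_size, and --gradient_accumulation_steps, which together determine how many sequences contribute to each optimizer step under data-parallel training. The sketch below works through that arithmetic. The function names are illustrative and not part of this repository, and the 4-GPU case is a hypothetical scenario.

```python
def effective_batch_size(nproc_per_node: int,
                         per_device_train_batch_size: int,
                         gradient_accumulation_steps: int) -> int:
    """Sequences consumed per optimizer step under data parallelism."""
    return (nproc_per_node
            * per_device_train_batch_size
            * gradient_accumulation_steps)


def accumulation_steps_for(target_batch: int,
                           nproc_per_node: int,
                           per_device_train_batch_size: int = 1) -> int:
    """Accumulation steps needed to reach target_batch with the given GPUs."""
    per_step = nproc_per_node * per_device_train_batch_size
    if target_batch % per_step != 0:
        raise ValueError("target batch must be a multiple of GPUs x per-device batch")
    return target_batch // per_step


# Values from the torchrun command: 8 processes, batch size 1, accumulation 1.
print(effective_batch_size(8, 1, 1))  # 8
# Hypothetical 4-GPU machine that keeps the same effective batch size.
print(accumulation_steps_for(8, 4))   # 2
```

On a machine with fewer GPUs, lowering --nproc_per_node and raising --gradient_accumulation_steps by the same factor keeps the effective batch size unchanged.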

test

run

python eval_next_poi.py --model_path {your_model_path} --dataset_name {DATASET_NAME} --output_dir {your_finetuned_model} --test_file "test_qa_pairs_kqt.txt"

Acknowledgement

This code is developed based on STHGCN and LongLoRA.

Citation

If you find our work useful, please consider citing our paper:

@inproceedings{li-2024-large,
author = {Li, Peibo and de Rijke, Maarten and Xue, Hao and Ao, Shuang and Song, Yang and Salim, Flora D.},
booktitle = {SIGIR 2024: 47th international ACM SIGIR Conference on Research and Development in Information Retrieval},
date-added = {2024-03-26 23:47:40 +0000},
date-modified = {2024-03-26 23:48:47 +0000},
month = {July},
publisher = {ACM},
title = {Large Language Models for Next Point-of-Interest Recommendation},
year = {2024}}