Question about dataset - GitHub Issues

MINJIK01 commented 7 months ago

Hello, first of all, thank you for your interesting project. I am simply wondering if there are only training datasets linked in your GitHub repository.

mistyreed63849 commented 7 months ago

Hi @MINJIK01,

Many thanks for your interest in our project.

The Google Drive link provided in the repo (https://drive.google.com/file/d/1fRXdCMHpkb1-kuzcxgZPKkILEWBSbW4M) contains the training, validation, and test sets for each of the 4 datasets. Note that the train/val/test graphs are stored in the same files, and the split is defined in the file {dataset_name}/{dataset_name}_split.pkl. The split is applied automatically in train_graph_llm.py as follows:

# Line 48
dataset, split, edge_index = load_dataset[args.dataset]()

...

# Line 87-90
train_dataset = clm_dataset_train.select(split['train'])
val_dataset = clm_dataset_train.select(split['valid'])
val_dataset_eval = clm_dataset_test.select(split['valid'])
test_dataset = clm_dataset_test.select(split['test'])
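
For anyone who wants to inspect a split outside the training script, here is a minimal sketch of the same idea. It assumes the split file unpickles to a dict with 'train', 'valid', and 'test' keys holding index lists (the keys match the snippet above, but the exact on-disk layout is an assumption). The names load_split, select, and toy_split are hypothetical and used only for illustration, and a plain Python list stands in for the dataset object.

```python
import os
import pickle
import tempfile


def load_split(path):
    """Load a split dict from a pickle file.

    Assumed layout: {'train': [...], 'valid': [...], 'test': [...]}.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


def select(items, indices):
    """Return the items at the given positions (a list-based stand-in for .select)."""
    return [items[i] for i in indices]


# Toy stand-in for samples that are all stored together in one file.
nodes = [f"node_{i}" for i in range(10)]

# Hypothetical split; the real indices live in {dataset_name}_split.pkl.
toy_split = {"train": [0, 1, 2, 3, 4, 5], "valid": [6, 7], "test": [8, 9]}

# Write the toy split to a temporary pickle file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    split_path = os.path.join(tmp, "toy_split.pkl")
    with open(split_path, "wb") as f:
        pickle.dump(toy_split, f)
    split = load_split(split_path)

# Index the shared storage with each subset, as the training script does.
train_nodes = select(nodes, split["train"])
val_nodes = select(nodes, split["valid"])
test_nodes = select(nodes, split["test"])

print(len(train_nodes), len(val_nodes), len(test_nodes))
```

Because all samples share one storage file, only the index lists differ between the three subsets. Pointing load_split at a real {dataset_name}_split.pkl should work the same way if the assumed layout holds.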

I hope this answers your question. Please feel free to ask if you have any further questions.

MINJIK01 commented 7 months ago

Thanks a lot :) I understood.

mistyreed63849 / Graph-LLM

Question about dataset #4