I modified the previous notebooks and files to work with the new graph2text sequences. This time, however, I think it is better to prepare the sequence data first and then train the model on that already prepared data. That way we don't have to re-prepare the data on every training run, and results are easier to reproduce. The added and modified files and their purposes are the following:
data_prep.py: prepares the sequence data and uploads it to HuggingFace. I've already prepared the data for T5-xl-ssm.
sequences.ipynb: renamed the notebook "subgraph_classification_reranking.ipynb" to "sequences.ipynb", since the new name makes more sense. This notebook contains the data preparation from data_prep.py as well as the training/re-ranking. It serves as a sanity check in case you don't want to go through the scripts.
train_ranking_model.py: modified so it works with the pre-uploaded data and its new format. Also removed the context part, as it's no longer needed: we now always concatenate the question to the sequence. If it is needed again, restoring it takes just a few extra lines of code.