docker build . -t sem2vec
We have already prepared the preprocessed data in the codebase (see data/constraints.txt, data/pair, FoBERT/merges.txt, and FoBERT/vocab.json).
To use your own data, preprocess it as follows.
python data/preprocess.py raw_constraints.txt constraints.txt
We pretrain and fine-tune the model on an NVIDIA 3090. Training may run out of memory if your GPU has less memory.
python src/run_roberta.py
python src/fine_tune.py
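Since training may run out of memory on smaller GPUs, it can help to check the available GPU memory before launching the two scripts above. The following is a minimal sketch, not part of the codebase: the helper names `parse_memory_mib` and `gpu_total_memory_mib` are hypothetical, and it assumes the `nvidia-smi` command-line tool is installed. The 24576 MiB reference value is the memory size of an RTX 3090 and is only a rough point of comparison, not a measured requirement of the scripts.

```python
import shutil
import subprocess

# Assumption: 24576 MiB (the memory of an RTX 3090) is used only as a
# rough reference point; it is not a measured requirement of the scripts.
REFERENCE_MIB = 24576


def parse_memory_mib(csv_text):
    """Parse nvidia-smi CSV output: one integer MiB value per line, one line per GPU."""
    values = []
    for line in csv_text.splitlines():
        line = line.strip()
        if line.isdigit():  # skip blank lines and entries such as "[N/A]"
            values.append(int(line))
    return values


def gpu_total_memory_mib():
    """Return the total memory of each GPU in MiB, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return parse_memory_mib(result.stdout)


if __name__ == "__main__":
    memory = gpu_total_memory_mib()
    if not memory:
        print("No GPU information available (nvidia-smi missing or failed).")
    for index, mib in enumerate(memory):
        note = "" if mib >= REFERENCE_MIB else " (less than the reference GPU)"
        print(f"GPU {index}: {mib} MiB{note}")
```

If a GPU reports less memory than the reference value, the scripts may still run, but out-of-memory errors become more likely.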
We show how to use the pretrained model to predict a masked token in lines 50-57 of src/run_roberta.py, and how to use the fine-tuned model to generate constraint embeddings in lines 54-58 of src/fine_tune.py.
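For orientation, masked-token prediction can be sketched roughly as below. This is not the code at the referenced lines. It assumes the pretrained checkpoint can be loaded by the Hugging Face `transformers` fill-mask pipeline, which this README does not confirm. The helper names `load_fill_mask` and `predict_masked` and the `model_dir` argument are hypothetical.

```python
def load_fill_mask(model_dir):
    """Build a fill-mask pipeline from a checkpoint directory.

    Assumption: model_dir holds a checkpoint and tokenizer files that
    Hugging Face transformers can load; the actual location is not
    stated in this README.
    """
    from transformers import pipeline  # imported lazily; needs `transformers` installed

    return pipeline("fill-mask", model=model_dir, tokenizer=model_dir)


def predict_masked(fill_mask, text, top_k=5):
    """Return (token, score) pairs for the single masked position in text.

    fill_mask is any callable with the fill-mask pipeline interface: it
    takes the text and a top_k keyword, and returns a list of dicts that
    contain "token_str" and "score".
    """
    results = fill_mask(text, top_k=top_k)
    return [(r["token_str"], r["score"]) for r in results]
```

A call would look like `predict_masked(fill_mask, "... " + fill_mask.tokenizer.mask_token + " ...")`, where `fill_mask` comes from `load_fill_mask` and the text contains exactly one mask token. Passing the pipeline in as an argument keeps the prediction helper independent of how the model is loaded.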