sem2vec / sem2vec-BERT

Code for embedding symbolic constraints

Prepare Environment

Preprocess Data

The preprocessed data is already included in the codebase (see data/constraints.txt, data/pair, FoBERT/merges.txt, and FoBERT/vocab.json).

To use your own data, please use the following steps.

Train Model

We pretrain and fine-tune the model on an NVIDIA RTX 3090 GPU. Training may run out of memory on GPUs with less memory.

Mask Prediction and Embedding Generation

Lines 50-57 of src/run_roberta.py show how to use the pretrained model to predict a masked token, and lines 54-58 of src/fine_tune.py show how to use the fine-tuned model to generate embeddings of constraints.
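
For orientation, the two tasks can be sketched with the Hugging Face transformers library. This is a minimal sketch under stated assumptions, not a copy of the code in src/run_roberta.py or src/fine_tune.py: the `model_dir` argument is a hypothetical path to a saved RoBERTa-style checkpoint with its tokenizer files, the function names are illustrative, and mean pooling over token vectors is an assumed way of reducing the model output to a single embedding.

```python
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average the vectors of the tokens whose attention-mask entry is non-zero."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def predict_masked_token(model_dir: str, masked_text: str, top_k: int = 5):
    """Return the top_k candidates for the mask token in masked_text.

    model_dir is a hypothetical checkpoint directory; masked_text must
    contain the tokenizer's mask token (for RoBERTa tokenizers, "<mask>").
    """
    from transformers import pipeline  # lazy import: needs `transformers`

    fill = pipeline("fill-mask", model=model_dir, tokenizer=model_dir)
    return fill(masked_text, top_k=top_k)


def embed_constraint(model_dir: str, constraint: str) -> List[float]:
    """Embed one constraint string as a single vector.

    Assumption: the embedding is the mean of the last-layer token vectors.
    The repository's own script may pool differently.
    """
    import torch  # lazy imports: need `torch` and `transformers`
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModel.from_pretrained(model_dir)
    model.eval()

    inputs = tokenizer(constraint, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # shape: (tokens, hidden)
    return mean_pool(hidden.tolist(), inputs["attention_mask"][0].tolist())
```

Both model-backed functions import their dependencies lazily, so the pooling helper can be used and checked without a GPU or a downloaded checkpoint.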