torivor opened this issue 1 year ago.
Sorry for the late reply.
For question #1: I wrote a script myself to convert the original XML files into the current dataset format.
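The conversion script itself is not shared in this thread. As a rough illustration only, here is a sketch that assumes SemEval-2014-style XML: `<sentence>` elements, each with a `<text>` child and `<aspectTerm>` children carrying `from`, `to` and `polarity` attributes. It also assumes an output line of the form `sentence####word=TAG ...` using a unified `T-POS`/`T-NEG`/`T-NEU` tag scheme. Both the input schema and the tag scheme are assumptions, and `sentence_to_line`, `convert` and `POLARITY_TAG` are hypothetical names. Compare the output against the repository's actual `train.txt` before relying on it.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical polarity-to-tag mapping (assumed unified tagging scheme).
POLARITY_TAG = {"positive": "T-POS", "negative": "T-NEG", "neutral": "T-NEU"}


def sentence_to_line(text: str, aspects: List[Tuple[int, int, str]]) -> str:
    """Build one 'sentence####word=TAG ...' line (hypothetical helper).

    `aspects` holds (from, to, polarity) character offsets into `text`.
    """
    tokens, spans, pos = [], [], 0
    # Whitespace tokenization, recording each token's character span.
    for tok in text.split():
        start = text.index(tok, pos)
        end = start + len(tok)
        tokens.append(tok)
        spans.append((start, end))
        pos = end
    tags = ["O"] * len(tokens)
    for a_from, a_to, polarity in aspects:
        label = POLARITY_TAG.get(polarity)
        if label is None:
            continue  # e.g. SemEval 'conflict' has no tag in this assumed scheme
        for i, (s, e) in enumerate(spans):
            if s < a_to and e > a_from:  # token overlaps the aspect span
                tags[i] = label
    pairs = " ".join(f"{t}={g}" for t, g in zip(tokens, tags))
    return " ".join(tokens) + "####" + pairs


def convert(xml_path: str, out_path: str) -> None:
    """Write one converted line per <sentence> element (assumed schema)."""
    root = ET.parse(xml_path).getroot()
    with open(out_path, "w", encoding="utf-8") as out:
        for sent in root.iter("sentence"):
            text = sent.findtext("text", default="")
            aspects = [
                (int(a.get("from")), int(a.get("to")), a.get("polarity"))
                for a in sent.iter("aspectTerm")
            ]
            out.write(sentence_to_line(text, aspects) + "\n")
```

Tagging by character-offset overlap, rather than by matching the aspect string, avoids mislabeling a word that appears more than once in the sentence.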
For question #2: No further preprocessing is needed; the data is already suitable for model training.
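Because the files are meant to be used as-is, loading them mainly means splitting each line on the `####` separator. Here is a minimal sketch. It assumes the tag side is whitespace-separated with one entry per token, where each entry is either a bare tag or a `word=TAG` pair. The helper names `parse_line` and `load_file` are hypothetical, and the exact tag format should be checked against the repository's `train.txt`.

```python
from typing import List, Tuple

SEPARATOR = "####"  # delimiter between the sentence and the tag sequence


def parse_line(line: str) -> Tuple[List[str], List[str]]:
    """Split one train.txt-style line into (tokens, tags).

    Assumption: the tag side is whitespace-separated with one entry per
    token, each entry either a bare tag or a 'word=TAG' pair.
    """
    line = line.rstrip("\n")
    if SEPARATOR not in line:
        raise ValueError(f"missing {SEPARATOR!r} separator")
    sentence, tag_part = line.split(SEPARATOR, 1)
    tokens = sentence.split()
    # Keep only the text after the last '=' (a bare tag is returned unchanged).
    tags = [item.rsplit("=", 1)[-1] for item in tag_part.split()]
    if len(tags) != len(tokens):
        raise ValueError(f"token/tag count mismatch: {len(tokens)} vs {len(tags)}")
    return tokens, tags


def load_file(path: str) -> List[Tuple[List[str], List[str]]]:
    """Parse every non-empty line of a train.txt-style file."""
    with open(path, encoding="utf-8") as f:
        return [parse_line(line) for line in f if line.strip()]
```

The length check catches lines where the two sequences are misaligned, which is the most likely sign that the assumed format does not match the actual files.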
Could you please provide the tagging notebook used to annotate the dataset, so that we can flexibly train on our own custom datasets? Thank you.
From what I understand from the official paper, the approach used in this repository aims to predict the following tag sequence from the input sentence:
The train.txt files in the data folder are used to train the model to classify such sequences. I also noticed that each line in these files consists of both (1) the sentence sequence and (2) the tag sequence, separated by "####". Regarding this, I have several questions: