nkrnrnk / BertPunc

SOTA punctuation restoration (e.g. for automatic speech recognition) deep learning model based on the pre-trained BERT model
Apache License 2.0
180 stars, 43 forks

Data Format? #5

Open AASHISHAG opened 4 years ago

AASHISHAG commented 4 years ago

@nkrnrnk: Could you please document the expected format of the input data?

JeremySun1224 commented 3 years ago

@AASHISHAG Maybe use this dataset: https://github.com/IsaacChanghau/neural_sequence_labeling/tree/master/data/raw/LREC

kotikkonstantin commented 3 years ago

@AASHISHAG @JeremySun1224 For me, the following format works: https://github.com/kotikkonstantin/ru-autopunctuation#dataset-preparing