nkrnrnk / BertPunc

SOTA punctuation restoration (e.g. for automatic speech recognition) deep learning model based on the pre-trained BERT model

Apache License 2.0

180 stars, 43 forks

datasets #4

Open mattiaguerri opened 4 years ago

mattiaguerri commented 4 years ago

Could you please upload an example of the datasets you load in train.py, lines 190-192?

NavKumarGit commented 3 years ago

The data used for training earlier, i.e. the IWSLT data (train2012 and test2011), stores everything as individual words, but the end goal is to punctuate complete running text, not just a few words. Could you please help me understand why only one or two words are used as input for training and testing instead of full-length text?
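A file that lists one word per line does not necessarily mean a model sees only one or two words at a time. In token-level punctuation restoration, a common setup is to predict the label of each word from a window of its neighbouring words. The minimal sketch below illustrates that idea only; the helper `context_windows`, the `<pad>` token and the window size are hypothetical, and it is not confirmed that BertPunc's train.py builds its inputs this way.

```python
from typing import List


def context_windows(words: List[str], size: int) -> List[List[str]]:
    """Return, for each word, a window of `size` words centred on it.

    Hypothetical helper for illustration only. Positions that fall past
    either end of the text are filled with the placeholder "<pad>".
    """
    if size < 1 or size % 2 == 0:
        raise ValueError("size must be a positive odd number so each window has a centre")
    half = size // 2
    padded = ["<pad>"] * half + words + ["<pad>"] * half
    # words[i] sits at padded[i + half], so padded[i : i + size] is centred on it.
    return [padded[i:i + size] for i in range(len(words))]


if __name__ == "__main__":
    text = "it can be a very complicated thing".split()
    for window in context_windows(text, 5):
        print(window)
```

With this layout the training examples are still word-level (one label per window), but each prediction is conditioned on surrounding context, which is how a per-word file can be used to punctuate full text.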

vjkadekar commented 3 years ago

> Could you please upload an example of the datasets you load in train.py, lines 190-192?

Same situation here. Could you share the data format you used for training and testing? It's difficult to understand the code without any data attached.

JeremySun1224 commented 3 years ago

@vjkadekar Maybe use these datasets: https://github.com/IsaacChanghau/neural_sequence_labeling/tree/master/data/raw/LREC
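Before pointing train.py at those files, it can help to check their layout. The sketch below assumes, without confirmation from this thread, that each line has the form `word<TAB>LABEL` with labels `O`, `COMMA`, `PERIOD` and `QUESTION`; the names `PUNCT`, `parse_lines` and `to_text` are hypothetical, and the sample lines are made up for illustration. It parses such lines and rebuilds punctuated running text, which shows how a one-word-per-line file still encodes a full transcript.

```python
from typing import List, Tuple

# Assumed label set and mapping to punctuation marks; verify against the real files.
PUNCT = {"O": "", "COMMA": ",", "PERIOD": ".", "QUESTION": "?"}


def parse_lines(lines: List[str]) -> List[Tuple[str, str]]:
    """Parse lines assumed to look like 'word<TAB>LABEL' into (word, label) pairs."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        word, label = line.split("\t")
        pairs.append((word, label))
    return pairs


def to_text(pairs: List[Tuple[str, str]]) -> str:
    """Rebuild running text by appending each word's punctuation mark (if any)."""
    return " ".join(word + PUNCT[label] for word, label in pairs)


if __name__ == "__main__":
    # Made-up sample lines in the assumed format.
    sample = ["it\tO", "can\tO", "be\tO", "complicated\tCOMMA", "right\tQUESTION"]
    print(to_text(parse_lines(sample)))
```

If the real files use a different separator or label names, only `PUNCT` and the `split` call would need to change.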