kwonmha/bert-vocab-builder
Builds a WordPiece (subword) vocabulary compatible with Google Research's BERT.
226 stars · 47 forks
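The corpus_filepattern and min_count flags named in the issue titles below suggest a command-line workflow; a sketch of a typical invocation might look like the following (the subword_builder.py entry-point name and the output_filename flag are assumptions on my part, not confirmed by this page):

    python subword_builder.py \
        --corpus_filepattern "data/*.txt" \
        --output_filename vocab.txt \
        --min_count 5

Raising min_count drops rarer subtokens and so presumably shrinks the resulting vocabulary, which is the trade-off issue #1 below weighs against matching the size of BERT's original vocab.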
Issues
#16 · I am getting an error while running the vocab builder · mshivasharan · opened 3 years ago · 2 comments
#15 · BERT trained on a custom corpus · anidiatm41 · opened 4 years ago · 1 comment
#14 · Splitting strategy in tokenize.py · mandalbiswadip · closed 4 years ago · 3 comments
#13 · Corpus preprocessing steps · LydiaXiaohongLi · opened 4 years ago · 5 comments
#12 · Is there a format for corpus_filepattern? · YuBeomGon · closed 4 years ago · 2 comments
#11 · Error when running ALBERT's create_pretraining_data.py · aravindchaluvadi · closed 4 years ago · 2 comments
#10 · Windows fatal exception: access violation · frank-lin-liu · closed 4 years ago · 0 comments
#9 · Issue with tf.gfile / tf.io.gfile · then4p · closed 4 years ago · 1 comment
#8 · AttributeError: module 'tensorflow.io' has no attribute 'gfile' · AmeeraMilibari · closed 5 years ago · 2 comments
#7 · Projects using this and evaluation results · NebelAI · opened 5 years ago · 2 comments
#6 · Should I match the vocabulary size with bert_config.json? · AnakTeka · closed 5 years ago · 2 comments
#5 · Inaccurate sub-words for German · maggieezzat · opened 5 years ago · 1 comment
#4 · Removed merge conflict markers · bhoomit · closed 5 years ago · 1 comment
#3 · Merge conflict markers are still there… · bhoomit · closed 5 years ago · 2 comments
#2 · Unable to understand the input format and the generated output · ayushjain1144 · closed 5 years ago · 4 comments
#1 · If I change the min_count flag to produce a vocab the same size as BERT's original, can I pretrain from a checkpoint with the new vocab, or do I have to train from scratch? · maggieezzat · closed 5 years ago · 8 comments
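Two of the closed issues (#8 and #9) stem from the tf.gfile / tf.io.gfile split: older TensorFlow 1.x releases only ship tf.gfile, while the file API lives at tf.io.gfile from roughly TF 1.14 onward and in all of TF 2.x. A minimal compatibility sketch, assuming the code only needs GFile for reading (a generic workaround, not necessarily the fix applied in this repo):

    import tensorflow as tf

    # tf.io.gfile.GFile exists on TF >= 1.14 and TF 2.x; older TF 1.x
    # raises AttributeError here and only provides tf.gfile.GFile.
    try:
        GFile = tf.io.gfile.GFile
    except AttributeError:
        GFile = tf.gfile.GFile

    with GFile("corpus.txt", "r") as f:
        print(f.readline())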