beyondguo opened 3 years ago
Update:
I found that using bert-base-multilingual-uncased works fine:
import nlpaug.augmenter.word as naw

text = '咋就不行了?'  # "Why doesn't it work?"
context_aug = naw.ContextualWordEmbsAug(
    model_path='bert-base-multilingual-uncased', action="substitute")
augmented_text = context_aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)
>>>>>>>>
Original:
咋就不行了?
Augmented Text:
咋 就 通 行 了 ?
So what's the problem with hfl/chinese-roberta-wwm-ext?
Let's use "bert-base-multilingual-uncased" as a workaround for now. I will look into the "hfl/chinese-roberta-wwm-ext" model.
Even though they are RoBERTa models, they have to be loaded as the BERT model type. Passing model_type='bert' as a parameter will fix the issue.
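A minimal sketch of that fix, assuming an nlpaug version whose ContextualWordEmbsAug exposes the model_type parameter mentioned above:

import nlpaug.augmenter.word as naw

text = '咋就不行了?'  # "Why doesn't it work?"
# Force the RoBERTa checkpoint to be loaded as a BERT-type model.
context_aug = naw.ContextualWordEmbsAug(
    model_path='hfl/chinese-roberta-wwm-ext',
    model_type='bert',
    action="substitute")
print(context_aug.augment(text))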
Using the contextual word embedding augmenter (ContextualWordEmbsAug):
For English, the output is OK.
But for Chinese, the output is exactly the same as my input.
I have checked the source code and don't know what's going wrong. Could you please help me? Thanks a lot!
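For reference, a minimal sketch of the failing Chinese case, assuming the hfl/chinese-roberta-wwm-ext path mentioned earlier in this thread (the exact script the reporter ran is not shown):

import nlpaug.augmenter.word as naw

text = '咋就不行了?'  # "Why doesn't it work?"
context_aug = naw.ContextualWordEmbsAug(
    model_path='hfl/chinese-roberta-wwm-ext', action="substitute")
# Reportedly returns the input text unchanged instead of a substituted sentence.
print(context_aug.augment(text))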