Closed LRY1994 closed 4 months ago
@@ means the token is a subword piece. You can join the pieces with: replace('@@ ', '')
Can we perhaps add the post-processing statements for handling subwords to the pipelines for all languages? @LauraGPT
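For reference, the `replace('@@ ', '')` suggestion could look roughly like the sketch below. It is plain Python string handling, not part of any pipeline API; the helper name `merge_subwords` is hypothetical, and stripping a trailing `@@` at the end of the string is an added assumption for the case where the last token is an unfinished piece.

```python
def merge_subwords(text: str) -> str:
    """Join subword pieces that are marked with a trailing '@@'.

    Hypothetical helper for illustration only; not a library function.
    """
    # "ho@@ al@@ ding" -> "hoalding": drop each marker and the space after it.
    merged = text.replace("@@ ", "")
    # Assumption: a marker at the very end has no following space,
    # so the replace above misses it and it is stripped separately.
    if merged.endswith("@@"):
        merged = merged[:-2]
    return merged


# Prediction string copied from the bug report in this thread.
pred = "ho@@ al@@ ding ne@@ un@@ qu@@ ar@@ ter a"
print(merge_subwords(pred))  # hoalding neunquarter a
```

This only merges the pieces into whole tokens. It would not turn the Latin-script output above into the expected Chinese characters.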
🐛 Bug
The recognition output contains subword tokens.
Maoming accent, gt: 好 啲 呢 我 觉 得 pred: ho@@ al@@ ding ne@@ un@@ qu@@ ar@@ ter a
2-28-2_00751262_00752898.zip
To Reproduce