Closed hlthu closed 1 year ago
What I think you are really looking for is generating all possible partitions of a word. It is a common algorithmic problem with many possible solutions (e.g. see here)
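For reference, here is a minimal sketch of one way to enumerate every partition of a word. The function name `all_partitions` is made up for illustration and is not part of subword-nmt. It treats each of the n-1 gaps between characters as a place to cut or not cut, so a word of length n has 2^(n-1) partitions.

```python
from itertools import combinations


def all_partitions(word):
    """Yield every way to split `word` into contiguous, non-empty pieces."""
    n = len(word)
    # Pick k of the n-1 gaps between characters as cut points, for every k.
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            # Consecutive boundary pairs delimit the pieces.
            yield [word[i:j] for i, j in zip(bounds, bounds[1:])]


for parts in all_partitions("abc"):
    print(parts)
```

The number of partitions doubles with each extra character, so this is only practical for word-length inputs.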
You could use the solution suggested by @kopi22 and then filter out any results that contain substrings that are not valid subwords in your BPE vocabulary.
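A rough sketch of that filtering idea, under some assumptions: the vocabulary is a plain Python set of subword strings (the real subword-nmt vocabulary and codes files use their own formats and markers, which are ignored here), and `valid_segmentations` is a hypothetical helper, not a library function. Rather than generating every partition and discarding the bad ones afterwards, this variant checks each piece during the recursion, which gives the same result with less work.

```python
def valid_segmentations(word, vocab):
    """Return all splits of `word` in which every piece is in `vocab`."""
    if not word:
        return [[]]  # one way to split the empty string: no pieces
    results = []
    for i in range(1, len(word) + 1):
        prefix = word[:i]
        if prefix in vocab:
            # Only recurse when the leading piece is a known subword.
            for rest in valid_segmentations(word[i:], vocab):
                results.append([prefix] + rest)
    return results


# Toy vocabulary, made up for illustration.
vocab = {"h", "e", "l", "o", "he", "ll", "lo", "hell"}
for seg in valid_segmentations("hello", vocab):
    print(seg)
```

Note that this only checks that each piece is in the vocabulary. It does not check whether a given segmentation could actually be produced by applying the learned merge operations in order.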
Got it. Thx a lot.
-- 黄 露 Lu Huang
For a word like "hello", how can I generate all valid BPEs for it, including: