Closed guokeda closed 5 years ago
Sorry for the late response.
I'm not sure what this "result" refers to, but assuming it is the *.vocab file, the first column is the piece and the second column is its log probability. See Eq. 6 in http://aclweb.org/anthology/P18-1007
However, the vocab file is not used in the actual segmentation. To segment arbitrary text, you need to use spm_encode.
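To make the meaning of the second column concrete, here is a minimal sketch in plain Python (it does not use SentencePiece itself). It reads a few rows copied from the excerpt in this thread, converts each score back to a probability, and runs a toy best-path search that picks the split with the highest total log probability. The helper names `parse_vocab` and `best_segmentation` are made up for illustration, and the sketch assumes the scores are natural logarithms and that the file is tab-separated. The real encoder works from the trained model file rather than the vocab file, so this only illustrates the idea.

```python
import math

# Rows copied from the *.vocab excerpt in this thread: piece, log probability.
VOCAB_TEXT = """\
TN\t-6.12931
GE\t-6.13264
LD\t-6.14611
TS\t-6.15723
"""


def parse_vocab(text):
    """Return {piece: log_prob} from whitespace-separated vocab lines."""
    scores = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        piece, score = line.split()
        scores[piece] = float(score)
    return scores


def best_segmentation(seq, scores):
    """Toy best-path search: the split of seq into known pieces
    with the highest summed log probability (hypothetical helper)."""
    # best[i] = (score of the best split of seq[:i], start of its last piece)
    best = [(0.0, 0)] + [(-math.inf, 0)] * len(seq)
    for end in range(1, len(seq) + 1):
        for start in range(end):
            piece = seq[start:end]
            if piece in scores and best[start][0] > -math.inf:
                candidate = best[start][0] + scores[piece]
                if candidate > best[end][0]:
                    best[end] = (candidate, start)
    if best[-1][0] == -math.inf:
        return None, -math.inf  # seq cannot be covered by known pieces
    # Walk the back-pointers from the end to recover the pieces.
    pieces, end = [], len(seq)
    while end > 0:
        start = best[end][1]
        pieces.append(seq[start:end])
        end = start
    return pieces[::-1], best[-1][0]


scores = parse_vocab(VOCAB_TEXT)
# exp(log p) recovers the probability (assuming natural log).
probs = {piece: math.exp(score) for piece, score in scores.items()}
pieces, total = best_segmentation("TNGELD", scores)
print(pieces, total)
```

A higher (less negative) score means a more probable piece, which is why the excerpt reads as a list sorted from more to less likely. For real data, run spm_encode with the trained model instead of a hand-rolled search like this one.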
Let me close this issue as we have no update. Please feel free to reopen it if necessary.
Thank you very much for your reply.
Hi, I used SentencePiece with the unigram algorithm to segment protein sequences. The result is a two-column file. I know the first column is the subword piece, but what does the second (numeric) column mean? Partial results are shown below. I would sincerely appreciate your help.
TN -6.12931
GE -6.13264
LD -6.14611
TS -6.15723
SG -6.16908
SQ -6.17167
DD -6.17356
VA -6.17699
ID -6.17975
PL -6.18626
FK -6.19728
KQ -6.20093
LA -6.20492
SE -6.20776
NS -6.20806
TV -6.20955
NF -6.21059
KI -6.23107
VP -6.23211
KE -6.23277
Best regards, YB