google/sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
Apache License 2.0, 10.31k stars, 1.18k forks
Issues (sorted by Newest)
#1070 The pip command to install the SentencePiece Python module fails. (tprrt; opened 1 week ago; 0 comments)
#1069 Doesn't seem to work with Python 3.13 (tprrt; opened 1 week ago; 0 comments)
#1068 Initialized number of seed sentencepieces too low (DmitriiP20; opened 1 week ago; 0 comments)
#1067 subprocess-exited-with-error (Nana-kwame-junior; closed 2 weeks ago; 1 comment)
#1066 Update artifact actions from v3 to v4 (kasinadhsarma; opened 3 weeks ago; 1 comment)
#1065 Asan detects memory leak in sentencepiece/_sentencepiece.cpython-312-x86_64-linux-gnu.so+0x6f7f4 (renxida; opened 3 weeks ago; 0 comments)
#1064 Bump the build-time-deps group across 1 directory with 4 updates (dependabot[bot]; opened 3 weeks ago; 0 comments)
#1063 Bump the github-actions group across 1 directory with 3 updates (dependabot[bot]; opened 3 weeks ago; 0 comments)
#1062 Distributed implementation of the unsupervised unigram word segmentizer possible? (y-he2; closed 3 weeks ago; 0 comments)
#1061 Enhancements to CI Workflows and Python Module Initialization with Minor Fixes (kasinadhsarma; opened 1 month ago; 1 comment)
#1060 Compatibility Issue when using v0.2.0 with transformers and tensorflow (aws-tianquaw; opened 1 month ago; 1 comment)
#1059 "space must not be included in normalized string" when training with a sentence iterator (bauwenst; closed 1 month ago; 1 comment)
#1058 Bump the github-actions group across 1 directory with 3 updates (dependabot[bot]; closed 3 weeks ago; 1 comment)
#1057 Bump the build-time-deps group across 1 directory with 3 updates (dependabot[bot]; closed 3 weeks ago; 1 comment)
#1056 Use libsentencepiece.0.dylib in macos. When load model, model_factory.cc(43) LOG(ERROR) Unknown model_type: 16 (codeAndxv; closed 1 month ago; 2 comments)
#1054 Can't load Llama3 tokenizer.model (fabriceyhc; closed 1 month ago; 1 comment)
#1053 Is it possible to add normalization rules into a trained sentence piece model? (lost-libra; closed 2 months ago; 2 comments)
#1052 Training with a custom base vocabulary and handling reserved tokens (rteehas; closed 2 months ago; 1 comment)
#1051 Crashes on out of range inputs depending on other inputs (colehaus; opened 2 months ago; 1 comment)
#1050 logprobs in the vocabulary file do not match the values computed from the tokenized training document (pnugues; closed 2 months ago; 2 comments)
#1049 Bump the build-time-deps group in /.github/workflows/requirements with 2 updates (dependabot[bot]; closed 1 month ago; 1 comment)
#1048 Bump the github-actions group with 2 updates (dependabot[bot]; closed 1 month ago; 1 comment)
#1047 With unigram algorithm, constant piece at end of each sentences does not become a token (jogardi; opened 3 months ago; 0 comments)
#1046 Error Attribute Error: type object 'SentencePieceTrainer' has no attribute 'train'. Did you mean: 'Train'? (bop578530; opened 3 months ago; 1 comment)
#1045 builds for android devices (RaoufiTech; closed 3 months ago; 0 comments)
#1044 decode token one by one (nigelzzz; closed 3 months ago; 1 comment)
#1043 decode one by one can't show space (nigelzzz; closed 3 months ago; 3 comments)
#1042 Why is the Hugging Face encoding 1 greater compared to the Google SentencePiece encoding when using the XLM-RoBERTa SentencePiece tokenizer? (RaoufiTech; closed 3 months ago; 2 comments)
#1041 Bump the build-time-deps group across 1 directory with 3 updates (dependabot[bot]; closed 3 months ago; 0 comments)
#1040 Bump the github-actions group with 3 updates (dependabot[bot]; closed 3 months ago; 0 comments)
#1039 multi-thread batch encode seems slower than list comprehension (Mr-Grin; closed 3 months ago; 1 comment)
#1038 Update setup.py (raushanksec; closed 4 months ago; 1 comment)
#1037 Add support for windows arm64 (Nagico2; closed 3 months ago; 1 comment)
#1036 Bump the build-time-deps group across 1 directory with 4 updates (dependabot[bot]; closed 3 months ago; 1 comment)
#1035 Bump the pip group in /.github/workflows/requirements with 2 updates (dependabot[bot]; closed 4 months ago; 0 comments)
#1034 Bump certifi from 2023.11.17 to 2024.7.4 in /.github/workflows/requirements in the pip group (dependabot[bot]; closed 4 months ago; 0 comments)
#1033 Bump the github-actions group across 1 directory with 6 updates (dependabot[bot]; closed 4 months ago; 0 comments)
#1032 Bump the build-time-deps group in /.github/workflows/requirements with 4 updates (dependabot[bot]; closed 4 months ago; 0 comments)
#1031 Zero Width Joiner issue for Sinhala Language (Nadil-K; opened 5 months ago; 0 comments)
#1030 No typings in Python package (marcospgp; opened 5 months ago; 1 comment)
#1029 When I set SPM_PROTOBUF_PROVIDER to "package" in CMakeLists.txt, the compilation fails. (hhxdestiny; opened 5 months ago; 0 comments)
#1028 trainer_interface.cc: Integer value -1 is outside the valid range of values [0, 255] for the enumeration type 'ScriptType' (kcoul; opened 5 months ago; 1 comment)
#1027 Bump urllib3 from 2.1.0 to 2.2.2 in /.github/workflows/requirements in the pip group (dependabot[bot]; closed 5 months ago; 0 comments)
#1026 Error (silentghost1412; closed 5 months ago; 1 comment)
#1025 install command line tools without sudo (zjesko; closed 5 months ago; 1 comment)
#1024 Wrong calculation of max_score in unigram_model.cc (fairydreaming; opened 5 months ago; 0 comments)
#1023 How to deal with id (980202006; opened 5 months ago; 3 comments)
#1022 Parameterize lattice node allocator size to optimize chunk allocation performance (PriyankaRanganath; closed 5 months ago; 3 comments)
#1021 How long does it take to train 31.2GB text data? (Mintchocolater; closed 5 months ago; 1 comment)
#1020 Bump the build-time-deps group in /.github/workflows/requirements with 3 updates (dependabot[bot]; closed 5 months ago; 0 comments)