Closed SunMarc closed 2 days ago
This PR fixes the failing GGUF tests caused by another PR. Instead of reusing the converter methods from the `SpmConverter` class in `GGUFLlamaConverter`, we copy them and remove the parts that use the `pieces` and `trainer_spec.control_symbols` attributes of `self.proto`, which come from sentencepiece (https://github.com/google/sentencepiece/blob/6225e08edb2577757163b3f5dbba4c0b670ef445/src/sentencepiece_model.proto#L299C29-L299C33). `GGUFTokenizerSkeleton` doesn't have these attributes, and I don't think we should add them.
Fixes https://github.com/huggingface/transformers/issues/31553
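To illustrate the idea, here is a minimal, self-contained sketch of the pattern. It is not the actual transformers code: `BaseConverter`, `SkeletonConverter`, the `special_tokens` method, and the `is_control` and `special_tokens` attributes are all hypothetical stand-ins. It only shows why a method that reads `pieces` and `trainer_spec.control_symbols` fails on an object lacking them, and how a copied, trimmed override avoids that.

```python
from types import SimpleNamespace


class BaseConverter:
    """Hypothetical stand-in for a converter backed by a full sentencepiece proto."""

    def __init__(self, proto):
        self.proto = proto

    def special_tokens(self):
        # Reads attributes that only a full sentencepiece proto carries.
        control = list(self.proto.trainer_spec.control_symbols)
        # `is_control` is an illustrative attribute, not a real proto field.
        from_pieces = [p.piece for p in self.proto.pieces if p.is_control]
        return control + from_pieces


class SkeletonConverter(BaseConverter):
    """Hypothetical subclass: a copy of the method without the missing attributes."""

    def special_tokens(self):
        # Assumption: the skeleton exposes its special tokens directly.
        return list(self.proto.special_tokens)


# A skeleton-like object with neither `pieces` nor `trainer_spec`.
skeleton = SimpleNamespace(special_tokens=["<s>", "</s>"])

tokens = SkeletonConverter(skeleton).special_tokens()

try:
    BaseConverter(skeleton).special_tokens()
    base_failed = False
except AttributeError:
    base_failed = True
```

The override keeps the skeleton class small: rather than adding placeholder attributes to it so the inherited method works, the converter simply stops depending on them.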