protocolbuffers / protobuf

Protocol Buffers - Google's data interchange format
http://protobuf.dev

TypeError: Couldn't build proto file into descriptor pool: duplicate file name sentencepiece_model.proto #12913

Closed vmajor closed 1 year ago

vmajor commented 1 year ago

What version of protobuf and what language are you using? Version: protobuf-4.23.1, protobuf-4.23.1 no binary, protobuf-3.2.0, protobuf-4.21.9, protobuf-4.21.11

What operating system (Linux, Windows, ...) and version?

5.15.68.1-microsoft-standard-WSL2+ #2 SMP

What runtime / compiler are you using (e.g., python version or gcc version)

Python 3.10.9

What did you do?

import guidance
from ctransformers import AutoModelForCausalLM
from transformers import AutoTokenizer

path = '/home/xxxx/models/gpt4-alpaca-lora_mlp-65B'
llm = AutoModelForCausalLM.from_pretrained('/home/xxxx/models/gpt4-alpaca-lora_mlp-65B/gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_1.bin', model_type='llama')
tokenizer = AutoTokenizer.from_pretrained('decapoda-research/llama-65b-hf')
guidance.llm = guidance.llms.transformers.LLaMA(tokenizer, device_map='auto')

What did you expect to see

Not the traceback.

What did you see instead?

Traceback (most recent call last):
  File "/home/******/anaconda3/envs/guidance/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3508, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "/tmp/ipykernel_3293068/49806104.py", line 9, in <module>
    tokenizer = AutoTokenizer.from_pretrained('decapoda-research/llama-65b-hf')
  File "/home/******/anaconda3/envs/guidance/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 693, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/******/anaconda3/envs/guidance/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1812, in from_pretrained
    return cls._from_pretrained(
  File "/home/******/anaconda3/envs/guidance/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1975, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/******/anaconda3/envs/guidance/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 89, in __init__
    super().__init__(
  File "/home/******/anaconda3/envs/guidance/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 114, in __init__
    fast_tokenizer = convert_slow_tokenizer(slow_tokenizer)
  File "/home/******/anaconda3/envs/guidance/lib/python3.10/site-packages/transformers/convert_slow_tokenizer.py", line 1303, in convert_slow_tokenizer
    return converter_class(transformer_tokenizer).converted()
  File "/home/******/anaconda3/envs/guidance/lib/python3.10/site-packages/transformers/convert_slow_tokenizer.py", line 445, in __init__
    from .utils import sentencepiece_model_pb2 as model_pb2
  File "/home/******/anaconda3/envs/guidance/lib/python3.10/site-packages/transformers/utils/sentencepiece_model_pb2.py", line 28, in <module>
    DESCRIPTOR = _descriptor.FileDescriptor(
  File "/home/******/anaconda3/envs/guidance/lib/python3.10/site-packages/google/protobuf/descriptor.py", line 1066, in __new__
    return _message.default_pool.AddSerializedFile(serialized_pb)
TypeError: Couldn't build proto file into descriptor pool: duplicate file name sentencepiece_model.proto

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/******/anaconda3/envs/guidance/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 2105, in showtraceback
    stb = self.InteractiveTB.structured_traceback(
  File "/home/******/anaconda3/envs/guidance/lib/python3.10/site-packages/IPython/core/ultratb.py", line 1396, in structured_traceback
    return FormattedTB.structured_traceback(
  File "/home/******/anaconda3/envs/guidance/lib/python3.10/site-packages/IPython/core/ultratb.py", line 1287, in structured_traceback
    return VerboseTB.structured_traceback(
  File "/home/******/anaconda3/envs/guidance/lib/python3.10/site-packages/IPython/core/ultratb.py", line 1140, in structured_traceback
    formatted_exception = self.format_exception_as_a_whole(etype, evalue, etb, number_of_lines_of_context,
  File "/home/******/anaconda3/envs/guidance/lib/python3.10/site-packages/IPython/core/ultratb.py", line 1055, in format_exception_as_a_whole
    frames.append(self.format_record(record))
  File "/home/******/anaconda3/envs/guidance/lib/python3.10/site-packages/IPython/core/ultratb.py", line 955, in format_record
    frame_info.lines, Colors, self.has_colors, lvals
  File "/home/******/anaconda3/envs/guidance/lib/python3.10/site-packages/IPython/core/ultratb.py", line 778, in lines
    return self._sd.lines
  File "/home/******/anaconda3/envs/guidance/lib/python3.10/site-packages/stack_data/utils.py", line 144, in cached_property_wrapper
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/home/******/anaconda3/envs/guidance/lib/python3.10/site-packages/stack_data/core.py", line 734, in lines
    pieces = self.included_pieces
  File "/home/******/anaconda3/envs/guidance/lib/python3.10/site-packages/stack_data/utils.py", line 144, in cached_property_wrapper
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/home/******/anaconda3/envs/guidance/lib/python3.10/site-packages/stack_data/core.py", line 681, in included_pieces
    pos = scope_pieces.index(self.executing_piece)
  File "/home/******/anaconda3/envs/guidance/lib/python3.10/site-packages/stack_data/utils.py", line 144, in cached_property_wrapper
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/home/******/anaconda3/envs/guidance/lib/python3.10/site-packages/stack_data/core.py", line 660, in executing_piece
    return only(
  File "/home/******/anaconda3/envs/guidance/lib/python3.10/site-packages/executing/executing.py", line 190, in only
    raise NotOneValueFound('Expected one value, found 0')
executing.executing.NotOneValueFound: Expected one value, found 0

haberman commented 1 year ago

It looks like your project is importing two different sentencepiece_model_pb2.py files. One lives at /home/******/anaconda3/envs/guidance/lib/python3.10/site-packages/transformers/utils/sentencepiece_model_pb2.py. There is probably another one somewhere in your tree that is getting imported from a different directory.
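
One way to check for a second copy is to walk the import path and list every file with that name. This is only a rough sketch using the standard library; `find_generated_modules` is a hypothetical helper name, not part of protobuf or transformers, and it assumes the duplicate lives somewhere under a directory on `sys.path`.

```python
import os
import sys


def find_generated_modules(filename, search_roots=None):
    """Return every distinct file called `filename` under the given roots.

    `search_roots` defaults to sys.path, i.e. the directories Python
    searches when resolving imports.
    """
    roots = sys.path if search_roots is None else search_roots
    seen = set()
    matches = []
    for root in roots:
        # Skip empty entries, zip archives, and paths that do not exist.
        if not root or not os.path.isdir(root):
            continue
        for dirpath, _dirnames, filenames in os.walk(root):
            if filename in filenames:
                # realpath collapses symlinks so one file is reported once.
                full = os.path.realpath(os.path.join(dirpath, filename))
                if full not in seen:
                    seen.add(full)
                    matches.append(full)
    return matches
```

Calling `find_generated_modules("sentencepiece_model_pb2.py")` inside the affected environment and printing the result should show the transformers copy plus any other copy that could be registering the same file name. Walking all of site-packages can take a while.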

ben-foxmoore commented 1 year ago

@haberman are there any recommendations on how to handle two sets of Protobufs that both contain files with the same name? E.g., consuming two APIs defined by Protobufs that both have a types.proto or something similarly generic.

haberman commented 1 year ago

Every .proto file should have a unique path name. For example, Protobuf itself ships a proto called google/protobuf/timestamp.proto. If another project also wants a timestamp proto, it should use a different directory name, like fooproject/timestamp.proto.

These then map to Python package names, e.g.:

import google.protobuf.timestamp_pb2
import fooproject.timestamp_pb2

Imports in .proto files should always use the full file name:

import "google/protobuf/timestamp.proto";
import "fooproject/timestamp.proto";

And invocations of protoc should always be relative to the base directory, e.g.:

$ protoc --python_out=. google/protobuf/timestamp.proto
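
For the case of two APIs that each ship a generic name like types.proto, a quick pre-check can show which files would end up with the same name once compiled. The sketch below assumes each API is compiled relative to its own include root, so that a file's registered name is its path relative to that root. `find_colliding_proto_names` and the `api_a` / `api_b` directories are hypothetical and only for illustration.

```python
import os
from collections import defaultdict


def find_colliding_proto_names(include_roots):
    """Return {root-relative .proto path: [roots]} for paths found under
    more than one include root."""
    owners = defaultdict(list)
    for root in include_roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if name.endswith(".proto"):
                    rel = os.path.relpath(os.path.join(dirpath, name), root)
                    # Proto file names use forward slashes on every platform.
                    owners[rel.replace(os.sep, "/")].append(root)
    # Keep only the names that more than one root would register.
    return {rel: roots for rel, roots in owners.items() if len(roots) > 1}


# Example (hypothetical layout):
#   api_a/types.proto and api_b/types.proto both register as "types.proto".
#   Moving them to api_a/api_a/types.proto and api_b/api_b/types.proto gives
#   the unique names "api_a/types.proto" and "api_b/types.proto".
```

Any name this reports would need its own directory prefix, as described above, before both sets of generated modules can be imported into the same process.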

nafiz09 commented 2 weeks ago

Has anyone found a solution to this?