huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Getting: AttributeError: 'BertTokenizer' object has no attribute 'encode' #2889

Closed VeereshShringari closed 4 years ago

VeereshShringari commented 4 years ago

🐛 Bug

AttributeError: 'BertTokenizer' object has no attribute 'encode'

Model I am using: BERT

Language I am using the model on: English

The problem arises when using:

    input_ids = torch.tensor([tokenizer.encode("raw_text", add_special_tokens=True)]) 

The task I am working on is:

Text summarization of the following paragraph of text:
"['26The Indian organic market\nhave begun to disrupt the market with their one-of-a-kind \nofferings.', 'In an effort to promote a healthier lifestyle, these \n\nplayers are playing a pivotal role by providing consumers with \n\nwholesome organic produce.', 'Since the organic food segment is still at a nascent stage \nin India, both the Government and private players need \n\n\n\ninvolved.', 'The organic farming industry in India holds immense \n\npotential to grow, provided it receives steady investment \n\n\n\nlike incentivizing organic cultivation, food processing, \n\n\n\nof the challenges faced by the organic sector today can be \n\ngrouped into three heads:\n\nŁ \n\nlengthy procedures, international validity, inadequate \ncertifying agencies and inadequate supporting infrastructure \n\n\n\n\ncost of internal audits and documentation is approximately \n\n\n\nreduced, it is expensive for many small groups of farmers or \nindividual farmers.', 'Ł \nThere is also a gap in the \n\nrequirements.', 'Additionally, key trading partners have \ntraditionally demonstrated a lack of willingness to sign \n\nequivalence arrangements.', 'Ł \nThe \n\n\nprocess of the farm or crop cannot be placed in the organic \n\n\nharvest is sold as conventional crops, thereby causing the \nfarmer to incur a loss.', 'Ł \ncommodities: \nDairy products have a different standard while \nmeat has a different standard .', 'The process of standardization \n\nof organic coconut will be different from that of the value-\n\nadded products of coconut.', 'Therefore, a company having \n\nand maintain multiple records as per the applicable standards.', 'Ł \n\nnumber of producers in the world yet they cultivate less than \n1% of the organic area.', 'The conventional production system is \nmore lucrative given the land fragmentation.', 'Ł Lack of incentives for farmers: \nThe transition from \n\nconventional to organic farming is accompanied by high \ninput costs and low yields in the initial years.', 
'The cost of \ngoing completely organic is quite high, due to the high cost \n\nof organic manure.', 'The commercially available bio-manure \nproducts may not be completely organic, and therefore the \n\n\nThis is one of the many reasons why farmers are skeptical \nwhen it comes to shifting from conventional to organic \nfarming.', 'In such cases, the farmers choose to play it safe by \n\npracticing conventional methods of farming.', 'Ł Lack of standardized organic agriculture inputs and subsidy \non organic inputs:\n Farmers also face an acute shortage of \nquality standardized organic agriculture inputs, which are \noften much more expensive than conventional agricultural \n\ninputs.', 'There are no subsidies from the Government on \nagriculture inputs, especially biofertilizers and biopesticides, \nmaking the cost of cultivation for organic farming quite high.', 'Unless the farmers use their own farm grown manure in \nlarge quantities, they are unable to meet the expenses.', 'Lack \nof proper organic inputs often results in low yield making \n\norganic farming unsustainable for the farmers.', 'Ł Lack of organic cultivation research and extension: \nThe \n\ncurrent research and extension on organic farming are much \nlesser than that on conventional farming.', 'There is a lack of \n\n\nStrong government support for producing non-GMO high \nyielding varieties and niche crops for organic farming \nunder different agro-ecological zones across India require \n\ninvestment in organic research and extension.', 'The extension \nservices are very limited for organic, for example, the ATMA \nscheme focuses more on conventional farming.', 'There is no \n\ntimely advisory available for organic pest and disease control \n\nmeasures.', 'Processor-level challenges\nŁ Supply chain issues: \nMany farmers are apprehensive of \n\norganic farming since it involves high production costs.', 'The emphasis on collection, transportation and storage of \nfresh organic produce is very 
high.', 'Due to relatively low \n\nvolumes, the marketing and distribution chain of organic food \n\nvery high.', 'For example, organic produce cannot be stored in \n\ngovernment warehouses that practice chemical treatment of \nstorage areas.', 'High demand and low supply further create \n\n\nthese products have higher price markups than conventional \nproducts.', 'Additionally, many sellers mix the produce from \ndifferent geographical regions to help attain a competitive \n\nprice, thus compromising the geographical origin norm.', 'Ł Lack of a proper organic supply chain is felt more acutely in \n\nhilly, tribal and remote areas that have a high potential for \n\ninfrastructure.', 'Ł Global competitiveness:\n A major challenge India faces is \n\nthat of increasing its share in the global organic food export \nmarket, in lieu of global competitiveness.', 'There often exists a \ndichotomy between international quality and safety standards \n\nand Indian organic stands, which puts Indian produce at a \ndisadvantage.', 'Ł Lack of proper branding and packaging: \n\nof organic products require separate packing material that is \nnatural and requires distinctive branding that distinguishes \norganic from conventional products.', 'At present, there is \n\nan absence of regulations on labeling standards.', 'There is \n34\n\n10, 201835']"

## To reproduce

Steps to reproduce the behavior:

  1. First, I imported torch and the BERT classes:

    import torch
    from pytorch_pretrained_bert import BertTokenizer, BertModel, BertForMaskedLM
    import logging

  2. Then I defined the models:

    MODELS = [(BertModel, BertTokenizer, 'bert-base-uncased')]

  3. Then I loaded the pretrained model and tokenizer:

    # Let's encode some text in a sequence of hidden-states using each model:
    for model_class, tokenizer_class, pretrained_weights in MODELS:
        # Load pretrained model/tokenizer
        tokenizer = tokenizer_class.from_pretrained(pretrained_weights)
        model = model_class.from_pretrained(pretrained_weights)

  4. When I try to encode the text with the following code:

    # Encode text
    input_ids = torch.tensor([tokenizer.encode("raw_text", add_special_tokens=True)])  # Add special tokens takes care of adding [CLS], [SEP], <s>... tokens in the right way for each model.
    with torch.no_grad():
        last_hidden_states = model(input_ids)[0]

    I get the following error:

    
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-10-190085fa3098> in <module>
      1 # Encode text
    ----> 2 input_ids = torch.tensor([tokenizer.encode("raw_text", add_special_tokens=True)])  # Add special tokens takes care of adding [CLS], [SEP], <s>... tokens in the right way for each model.
      3 with torch.no_grad():
      4     last_hidden_states = model(input_ids)[0]  # Models outputs are now tuples

    AttributeError: 'BertTokenizer' object has no attribute 'encode'


## Expected behavior
Tokenization should complete without raising an error.
## Environment info

- `transformers` version: '0.6.2'
- Platform: Windows 10
- Python version: 3.5
- PyTorch version (GPU?): 1.1.0 no gpu
- Tensorflow version (GPU?): Tensorflow 2.0
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No

cronoik commented 4 years ago

Please fix the formatting of your post and use code tags.

VeereshShringari commented 4 years ago

I made the changes, but all the text is still shown struck through. I am new to this bug tracker and am not sure how to switch to code tags.

VeereshShringari commented 4 years ago

I have now used the code tags.

BramVanroy commented 4 years ago

VeereshShringari commented 4 years ago

I added the tags as suggested by BramVanroy, following these guidelines: https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks

BramVanroy commented 4 years ago

You clearly did something wrong because, as you can see yourself, all the text is struck through. This is likely caused by having tildes (~) around your post.

VeereshShringari commented 4 years ago

Thanks, I cleared it, there was one hiding beside a comment.

BramVanroy commented 4 years ago

You are using an old version of the library (pytorch_pretrained_bert). You should move to transformers instead.

VeereshShringari commented 4 years ago

I upgraded to the latest transformers, but I still get the following error message:

ERROR:root:Internal Python error in the inspect module.
Below is the traceback from this internal error.

Traceback (most recent call last):
  File "C:\Users\Veeresh\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3319, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-13-645c7873d473>", line 1, in <module>
    encoding = tokenizer.encode(raw_text)
AttributeError: 'BertTokenizer' object has no attribute 'encode'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Veeresh\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2034, in showtraceback
    stb = value._render_traceback_()
AttributeError: 'AttributeError' object has no attribute '_render_traceback_'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Veeresh\AppData\Roaming\Python\Python37\site-packages\tensorflow_core\python\pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "C:\Users\Veeresh\AppData\Roaming\Python\Python37\site-packages\tensorflow_core\python\pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "C:\Users\Veeresh\AppData\Roaming\Python\Python37\site-packages\tensorflow_core\python\pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "C:\Users\Veeresh\Anaconda3\lib\imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "C:\Users\Veeresh\Anaconda3\lib\imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: DLL load failed: The specified module could not be found.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Veeresh\Anaconda3\lib\site-packages\IPython\core\ultratb.py", line 1151, in get_records
    return _fixed_getinnerframes(etb, number_of_lines_of_context, tb_offset)
  File "C:\Users\Veeresh\Anaconda3\lib\site-packages\IPython\core\ultratb.py", line 319, in wrapped
    return f(*args, **kwargs)
  File "C:\Users\Veeresh\Anaconda3\lib\site-packages\IPython\core\ultratb.py", line 353, in _fixed_getinnerframes
    records = fix_frame_records_filenames(inspect.getinnerframes(etb, context))
  File "C:\Users\Veeresh\Anaconda3\lib\inspect.py", line 1502, in getinnerframes
    frameinfo = (tb.tb_frame,) + getframeinfo(tb, context)
  File "C:\Users\Veeresh\Anaconda3\lib\inspect.py", line 1460, in getframeinfo
    filename = getsourcefile(frame) or getfile(frame)
  File "C:\Users\Veeresh\Anaconda3\lib\inspect.py", line 696, in getsourcefile
    if getattr(getmodule(object, filename), '__loader__', None) is not None:
  File "C:\Users\Veeresh\Anaconda3\lib\inspect.py", line 733, in getmodule
    if ismodule(module) and hasattr(module, '__file__'):
  File "C:\Users\Veeresh\AppData\Roaming\Python\Python37\site-packages\tensorflow\__init__.py", line 50, in __getattr__
    module = self._load()
  File "C:\Users\Veeresh\AppData\Roaming\Python\Python37\site-packages\tensorflow\__init__.py", line 44, in _load
    module = _importlib.import_module(self.__name__)
  File "C:\Users\Veeresh\Anaconda3\lib\importlib\__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "C:\Users\Veeresh\AppData\Roaming\Python\Python37\site-packages\tensorflow_core\__init__.py", line 42, in <module>
    from . _api.v2 import audio
  File "C:\Users\Veeresh\AppData\Roaming\Python\Python37\site-packages\tensorflow_core\_api\v2\audio\__init__.py", line 10, in <module>
    from tensorflow.python.ops.gen_audio_ops import decode_wav
  File "C:\Users\Veeresh\AppData\Roaming\Python\Python37\site-packages\tensorflow_core\python\ops\gen_audio_ops.py", line 9, in <module>
    from tensorflow.python import pywrap_tensorflow as _pywrap_tensorflow
  File "C:\Users\Veeresh\AppData\Roaming\Python\Python37\site-packages\tensorflow\__init__.py", line 50, in __getattr__
    module = self._load()
  File "C:\Users\Veeresh\AppData\Roaming\Python\Python37\site-packages\tensorflow\__init__.py", line 44, in _load
    module = _importlib.import_module(self.__name__)
  File "C:\Users\Veeresh\Anaconda3\lib\importlib\__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "C:\Users\Veeresh\AppData\Roaming\Python\Python37\site-packages\tensorflow_core\python\__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "C:\Users\Veeresh\AppData\Roaming\Python\Python37\site-packages\tensorflow_core\python\pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "C:\Users\Veeresh\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3319, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-13-645c7873d473>", line 1, in <module>
    encoding = tokenizer.encode(raw_text)
AttributeError: 'BertTokenizer' object has no attribute 'encode'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Veeresh\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2034, in showtraceback
    stb = value._render_traceback_()
AttributeError: 'AttributeError' object has no attribute '_render_traceback_'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Veeresh\AppData\Roaming\Python\Python37\site-packages\tensorflow_core\python\pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "C:\Users\Veeresh\AppData\Roaming\Python\Python37\site-packages\tensorflow_core\python\pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "C:\Users\Veeresh\AppData\Roaming\Python\Python37\site-packages\tensorflow_core\python\pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "C:\Users\Veeresh\Anaconda3\lib\imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "C:\Users\Veeresh\Anaconda3\lib\imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: DLL load failed: The specified module could not be found.

Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.
---------------------------------------------------------------------------

BramVanroy commented 4 years ago

There's a lot going wrong in that trace. Please recreate your environment from scratch to ensure that all the correct dependencies are installed. In particular, in your first post you were using torch, but your new trace throws TensorFlow errors.
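Before rebuilding the environment, it can help to see where each package would be loaded from, since the trace mixes paths under `Anaconda3` and under the user `site-packages` directory. The following is a rough sketch using only the standard library; the helper name `locate_packages` is hypothetical, and the package list is just the set of names mentioned in this thread.

```python
import importlib.util
import sys


def locate_packages(names):
    """Map each top-level package name to the file it would be imported from, or None."""
    locations = {}
    for name in names:
        # find_spec looks a top-level package up without importing it, so a
        # broken TensorFlow install does not crash this check.
        spec = importlib.util.find_spec(name)
        # origin can also be None for namespace packages.
        locations[name] = spec.origin if spec is not None else None
    return locations


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    wanted = ["torch", "tensorflow", "transformers", "pytorch_pretrained_bert"]
    for name, origin in locate_packages(wanted).items():
        print(f"{name}: {origin or 'not found'}")
```

If `pytorch_pretrained_bert` is still found, or if the packages resolve to different installation roots, that would be consistent with the mixed environment suggested by the trace.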

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.