goru001 / inltk

Natural Language Toolkit for Indic Languages aims to provide out-of-the-box support for various NLP tasks that an application developer might need.
https://inltk.readthedocs.io
MIT License
824 stars, 163 forks

[Error in Sentence & word encoding for Hindi] #58

Closed: sara-02 closed this issue 4 years ago

sara-02 commented 4 years ago

Environment: Python 3.8, torch==1.6.0+cpu, torchvision==0.7.0+cpu, inltk==0.9, OS: Ubuntu 20

Steps followed (as given in the documentation, https://inltk.readthedocs.io/en/latest/api_docs.html):

>>> from inltk.inltk import setup
>>> setup('hi')
Downloading Model. This might take time, depending on your internet connection. Please be patient.
We'll only do this for the first time.
Downloading Model. This might take time, depending on your internet connection. Please be patient.
We'll only do this for the first time.
Done!
>>> from inltk.inltk import get_embedding_vectors
>>> vectors = get_embedding_vectors('भारत', 'hi')
Traceback (most recent call last):                                                                             
  File "<stdin>", line 1, in <module>
  File "env3/lib/python3.8/site-packages/inltk/inltk.py", line 100, in get_embedding_vectors
    learn = load_learner(path / 'models' / f'{language_code}')
  File "env3/lib/python3.8/site-packages/fastai/basic_train.py", line 626, in load_learner
    res = clas_func(data, model, **state)
  File "env3/lib/python3.8/site-packages/fastai/text/learner.py", line 52, in __init__
    super().__init__(data, model, metrics=metrics, **learn_kwargs)
  File "<string>", line 20, in __init__
  File "env3/lib/python3.8/site-packages/fastai/basic_train.py", line 166, in __post_init__
    self.model = self.model.to(self.data.device)
  File "env3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 607, in to
    return self._apply(convert)
  File "env3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  File "env3/home/sarah/github/Offensive_Hindi/env3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  File "env3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "env3/lib/python3.8/site-packages/torch/nn/modules/rnn.py", line 159, in _apply
    self._flat_weights = [(lambda wn: getattr(self, wn) if hasattr(self, wn) else None)(wn) for wn in self._flat_weights_names]
  File "env3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 771, in __getattr__
    raise ModuleAttributeError("'{}' object has no attribute '{}'".format(
torch.nn.modules.module.ModuleAttributeError: 'LSTM' object has no attribute '_flat_weights_names'

The same error occurs when using get_sentence_encoding:

>>> from inltk.inltk import get_sentence_encoding
>>> encoding = get_sentence_encoding('मुझे अपने देश से', 'hi')
......
env3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 771, in __getattr__
    raise ModuleAttributeError("'{}' object has no attribute '{}'".format(
torch.nn.modules.module.ModuleAttributeError: 'LSTM' object has no attribute '_flat_weights_names'
sara-02 commented 4 years ago

Quick update: I tried the same sample code on Kaggle with the following settings and got the same error.


Python 3.7

pytorch-ignite==0.4.2
pytorch-lightning==1.0.2
torch==1.6.0
torchaudio==0.6.0a0+f17ae39
torchtext==0.8.0a0+c851c3e
torchvision==0.7.0
inltk==0.9

Error

from inltk.inltk import get_sentence_encoding

encoding = get_sentence_encoding('मुझे अपने देश से', 'hi')

---------------------------------------------------------------------------
ModuleAttributeError                      Traceback (most recent call last)
<ipython-input-9-156adabc231f> in <module>
      1 from inltk.inltk import get_sentence_encoding
----> 2 encoding = get_sentence_encoding('मुझे अपने देश से', 'hi')

/opt/conda/lib/python3.7/site-packages/inltk/inltk.py in get_sentence_encoding(input, language_code)
    116     defaults.device = torch.device('cpu')
    117     path = Path(__file__).parent
--> 118     learn = load_learner(path / 'models' / f'{language_code}')
    119     encoder = learn.model[0]
    120     encoder.reset()

/opt/conda/lib/python3.7/site-packages/fastai/basic_train.py in load_learner(path, file, test, **db_kwargs)
    624     cb_state = state.pop('cb_state')
    625     clas_func = state.pop('cls')
--> 626     res = clas_func(data, model, **state)
    627     res.callback_fns = state['callback_fns'] #to avoid duplicates
    628     res.callbacks = [load_callback(c,s, res) for c,s in cb_state.items()]

/opt/conda/lib/python3.7/site-packages/fastai/text/learner.py in __init__(self, data, model, split_func, clip, alpha, beta, metrics, **learn_kwargs)
     50                                                      isinstance(data.train_ds.y, LMLabelList)))
     51         metrics = ifnone(metrics, ([accuracy] if is_class else []))
---> 52         super().__init__(data, model, metrics=metrics, **learn_kwargs)
     53         self.callbacks.append(RNNTrainer(self, alpha=alpha, beta=beta))
     54         if clip: self.callback_fns.append(partial(GradientClipping, clip=clip))

/opt/conda/lib/python3.7/site-packages/fastai/basic_train.py in __init__(self, data, model, opt_func, loss_func, metrics, true_wd, bn_wd, wd, train_bn, path, model_dir, callback_fns, callbacks, layer_groups, add_time, silent, cb_fns_registered)

/opt/conda/lib/python3.7/site-packages/fastai/basic_train.py in __post_init__(self)
    164         "Setup path,metrics, callbacks and ensure model directory exists."
    165         self.path = Path(ifnone(self.path, self.data.path))
--> 166         self.model = self.model.to(self.data.device)
    167         self.loss_func = self.loss_func or self.data.loss_func
    168         self.metrics=listify(self.metrics)

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in to(self, *args, **kwargs)
    605             return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
    606 
--> 607         return self._apply(convert)
    608 
    609     def register_backward_hook(

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _apply(self, fn)
    352     def _apply(self, fn):
    353         for module in self.children():
--> 354             module._apply(fn)
    355 
    356         def compute_should_use_set_data(tensor, tensor_applied):

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _apply(self, fn)
    352     def _apply(self, fn):
    353         for module in self.children():
--> 354             module._apply(fn)
    355 
    356         def compute_should_use_set_data(tensor, tensor_applied):

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _apply(self, fn)
    352     def _apply(self, fn):
    353         for module in self.children():
--> 354             module._apply(fn)
    355 
    356         def compute_should_use_set_data(tensor, tensor_applied):

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _apply(self, fn)
    352     def _apply(self, fn):
    353         for module in self.children():
--> 354             module._apply(fn)
    355 
    356         def compute_should_use_set_data(tensor, tensor_applied):

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/rnn.py in _apply(self, fn)
    157         # Note: be v. careful before removing this, as 3rd party device types
    158         # likely rely on this behavior to properly .to() modules like LSTM.
--> 159         self._flat_weights = [(lambda wn: getattr(self, wn) if hasattr(self, wn) else None)(wn) for wn in self._flat_weights_names]
    160         # Flattens params (on CUDA)
    161         self.flatten_parameters()

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in __getattr__(self, name)
    770                 return modules[name]
    771         raise ModuleAttributeError("'{}' object has no attribute '{}'".format(
--> 772             type(self).__name__, name))
    773 
    774     def __setattr__(self, name: str, value: Union[Tensor, 'Module']) -> None:

ModuleAttributeError: 'LSTM' object has no attribute '_flat_weights_names'
goru001 commented 4 years ago

@sara-02 Your torch version seems to be wrong. Check the installation instructions here; you need to install torch v1.3.
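A minimal preflight sketch, assuming the torch 1.3.x requirement stated above is the only constraint: it parses a version string such as the value of `torch.__version__` (e.g. "1.6.0+cpu") and reports whether it belongs to the 1.3 series. The helper names `parse_major_minor` and `torch_version_supported` are hypothetical and not part of iNLTK.

```python
import re


def parse_major_minor(version: str) -> tuple:
    """Extract (major, minor) from strings like '1.6.0+cpu' or '1.3.1'."""
    match = re.match(r"^(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def torch_version_supported(version: str) -> bool:
    """Hypothetical check: True only for the torch 1.3.x series recommended in this thread."""
    return parse_major_minor(version) == (1, 3)


if __name__ == "__main__":
    try:
        import torch  # only needed for the live check
    except ImportError:
        print("torch is not installed")
    else:
        if not torch_version_supported(torch.__version__):
            print(f"torch {torch.__version__} found; this thread recommends torch 1.3.x for iNLTK")
```

Running such a check before calling `setup('hi')` would surface the version mismatch directly instead of the `'LSTM' object has no attribute '_flat_weights_names'` traceback.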

sara-02 commented 4 years ago

Hey @goru001, I was not able to install v1.3:

ERROR: Could not find a version that satisfies the requirement torch==1.3.1+cpu (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2, 1.4.0, 1.4.0+cpu, 1.4.0+cu100, 1.4.0+cu92, 1.5.0, 1.5.0+cpu, 1.5.0+cu101, 1.5.0+cu92, 1.5.1, 1.5.1+cpu, 1.5.1+cu101, 1.5.1+cu92, 1.6.0, 1.6.0+cpu, 1.6.0+cu101, 1.6.0+cu92)
ERROR: No matching distribution found for torch==1.3.1+cpu

That is why I ended up installing the version suggested here: https://pytorch.org/get-started/locally/#pip-1

sara-02 commented 4 years ago

The command was: pip install torch==1.6.0+cpu torchvision==0.7.0+cpu -f https://download.pytorch.org/whl/torch_stable.html

zmf0507 commented 4 years ago

pip install torch==1.3.0 torchvision==0.4.1 worked for me

sara-02 commented 4 years ago

It seems like a PyTorch/Python version compatibility issue. When tested on a system with Python 3.6, the above installation of torch worked and I got results for the embeddings, but the same installation does not work with Python 3.8. I cannot comment on Python 3.7.

sara-02 commented 4 years ago

Quick update: I installed Python 3.6 on my system and then ran the above installations, which worked. The Hindi setup step worked as well, but when I tried to obtain the word or sentence embedding, I got a segmentation fault:

 $ python3
Python 3.6.12 (default, Aug 17 2020, 23:45:20) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from inltk.inltk import get_sentence_encoding
>>> encoding = get_sentence_encoding('मुझे अपने देश से', 'hi')
Segmentation fault (core dumped)
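When the interpreter dies with a bare "Segmentation fault", Python's standard-library `faulthandler` module can print the Python-level traceback at the moment of the crash, which helps show whether it happens inside torch, fastai or iNLTK. This is a minimal debugging sketch that assumes the same `get_sentence_encoding` call as above; the `reproduce` wrapper is hypothetical. Starting the interpreter with `python3 -X faulthandler` has the same effect as calling `faulthandler.enable()`.

```python
import faulthandler
import sys

# Dump the Python traceback of all threads to stderr if the process
# receives a fatal signal such as SIGSEGV.
faulthandler.enable(all_threads=True)


def reproduce() -> None:
    # Imported lazily so this file can be loaded without iNLTK installed.
    from inltk.inltk import get_sentence_encoding

    encoding = get_sentence_encoding('मुझे अपने देश से', 'hi')
    print(encoding)


if __name__ == "__main__":
    try:
        reproduce()
    except ImportError:
        print("iNLTK is not installed; nothing to reproduce", file=sys.stderr)
```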
goru001 commented 4 years ago

@sara-02 It seems something is wrong with your environment, but if you can reproduce this error on, let's say, Colab, I'll be able to help you out.

Thanks for letting me know that there's an issue with Python 3.8. I'll try to test it and possibly add a note to the README. Feel free to add those instructions and raise a PR if you want to!

sara-02 commented 4 years ago

@goru001 https://colab.research.google.com/drive/1xBPsf3jHv00Aa-ZO5nsqWN3NJFNf5U-h?usp=sharing I tried setting it up with Python 3.6 on Colab, and the session crashes just like my notebooks did. I assume it is the same segmentation fault. Please have a look and let me know if I am doing something wrong here.

goru001 commented 4 years ago

@sara-02 Thanks for sharing the notebook. I checked it. I was not able to reproduce this on Python 3.6.3 and 3.6.8, but I was able to reproduce it on 3.6.9. I also tried this on Kaggle kernels, which run Python 3.7.6, and it seems to work fine there too. So I think it has something to do with the Python version, and it will take me some time to figure out. Can you switch to v3.6.8 or v3.6.3? It should work fine after that!
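The Python versions reported in this thread can be captured as a small warning check to run before using iNLTK. This is only a sketch: the sets below encode what was observed here (3.6.3, 3.6.8 and Kaggle's 3.7.6 worked; 3.6.9 and 3.6.12 hit the segmentation fault; torch 1.3 could not be installed on 3.8), not an official support matrix, and the function name `python_version_note` is hypothetical.

```python
import sys
from typing import Optional, Tuple

# Observations from this issue thread, not an official compatibility list.
REPORTED_WORKING = {(3, 6, 3), (3, 6, 8), (3, 7, 6)}
REPORTED_CRASHING = {(3, 6, 9), (3, 6, 12)}


def python_version_note(version: Tuple[int, int, int]) -> Optional[str]:
    """Return a warning for versions reported as problematic or untested, else None."""
    dotted = ".".join(str(part) for part in version)
    if version in REPORTED_CRASHING:
        return f"Python {dotted} reproduced the segmentation fault in this thread"
    if version[:2] >= (3, 8):
        return "torch 1.3 could not be installed on Python 3.8 in this thread; newer versions were not tested"
    if version not in REPORTED_WORKING:
        return "this Python version was not tested in this thread"
    return None


if __name__ == "__main__":
    note = python_version_note(tuple(sys.version_info[:3]))
    if note:
        print("Warning:", note)
```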

sara-02 commented 4 years ago

@goru001 I see. I think we can close this issue, as it is no longer specific to the embedding use case, and create another issue to track the Python version problem. Meanwhile, I will send a PR requiring the use of specific versions.