haven-jeon / PyKoSpacing

Automatic Korean word spacing with Python
GNU General Public License v3.0

Fix : Model Input Type Error #67

Closed: Jinwoo1126 closed this 2 months ago

Jinwoo1126 commented 2 months ago

This PR fixes an input type error that occurs during inference.

Run

from pykospacing import Spacing
spacing = Spacing()
spacing("김형호영화시장분석가는'1987'의네이버영화정보네티즌10점평에서언급된단어들을지난해12월27일부터올해1월10일까지통계프로그램R과KoNLP패키지로텍스트마이닝하여분석했다.")

Error Message

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
[<ipython-input-1-6b05fb953279>](https://localhost:8080/#) in <cell line: 3>()
      1 from pykospacing import Spacing
      2 spacing = Spacing()
----> 3 spacing("김형호영화시장분석가는'1987'의네이버영화정보네티즌10점평에서언급된단어들을지난해12월27일부터올해1월10일까지통계프로그램R과KoNLP패키지로텍스트마이닝하여분석했다.")

3 frames
[/usr/local/lib/python3.10/dist-packages/pykospacing/kospacing.py](https://localhost:8080/#) in __call__(self, sent, ignore, ignore_pattern)
    176             # if ignore == 'post', set post_process to True
    177             post_process = True if ignore == 'post' else False
--> 178             spaced_sent = self.get_spaced_sent(filtered_sent, deleted_str_list, deleted_idx_list, orig_sent, post_process)
    179             result_sent.append(spaced_sent)
    180         spaced_sent = ''.join(result_sent)

[/usr/local/lib/python3.10/dist-packages/pykospacing/kospacing.py](https://localhost:8080/#) in get_spaced_sent(self, raw_sent, deleted_str_list, deleted_idx_list, orig_sent, post_process)
     68             word2idx_dic=self._w2idx, sequences=sents_in, maxlen=200,
     69             padding='post', truncating='post')
---> 70         results = self._model(mat_in)
     71         mat_set = results['output_0'][0]
     72         preds = np.array(['1' if i > 0.5 else '0' for i in mat_set[:len(raw_sent_)]])

[/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py](https://localhost:8080/#) in error_handler(*args, **kwargs)
    120             # To get the full stack trace, call:
    121             # `keras.config.disable_traceback_filtering()`
--> 122             raise e.with_traceback(filtered_tb) from None
    123         finally:
    124             del filtered_tb

[/usr/local/lib/python3.10/dist-packages/tensorflow/python/eager/execute.py](https://localhost:8080/#) in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     51   try:
     52     ctx.ensure_initialized()
---> 53     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     54                                         inputs, attrs, num_outputs)
     55   except core._NotOkStatusException as e:

InvalidArgumentError: Exception encountered when calling TFSMLayer.call().

cannot compute __inference_signature_wrapper___call___755 as input #0(zero-based) was expected to be a float tensor but is a int32 tensor [Op:__inference_signature_wrapper___call___755]

Arguments received by TFSMLayer.call():
  • inputs=tf.Tensor(shape=(1, 200), dtype=int32)
  • training=False
  • kwargs=<class 'inspect._empty'>

Solution

def encoding_and_padding(word2idx_dic, sequences, **params):
    """
    1. map each item to its index
    2. pad the sequences

    :word2idx_dic: dict mapping tokens to indices
    :sequences: list of lists where each element is a sequence
    :maxlen: int, maximum length
    :dtype: type to cast the resulting sequence to
    :padding: 'pre' or 'post', pad either before or after each sequence
    :truncating: 'pre' or 'post', remove values from sequences longer than
        maxlen either at the beginning or at the end of the sequence
    :value: float, value to pad the sequences with
    """
    seq_idx = [[word2idx_dic.get(a, word2idx_dic['__ETC__']) for a in i] for i in sequences]
    params['value'] = word2idx_dic['__PAD__']
    return sequence.pad_sequences(seq_idx, **params).astype('float32')  # <- float32 casting
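The root cause, in isolation: integer-index padding produces an int32 matrix, while the exported SavedModel signature expects a float tensor. The sketch below reproduces this with a hypothetical numpy-only stand-in for `pad_sequences` (the name `pad_post` and the toy indices are assumptions for illustration), showing the dtype before and after the cast.

```python
import numpy as np

def pad_post(seqs, maxlen, value):
    """Hypothetical stand-in for keras pad_sequences with padding='post':
    like the Keras utility, it yields int32 when given integer input."""
    out = np.full((len(seqs), maxlen), value, dtype='int32')
    for row, s in enumerate(seqs):
        out[row, :min(len(s), maxlen)] = s[:maxlen]
    return out

seq_idx = [[5, 9, 3], [7, 2]]          # toy token indices from a word2idx lookup
mat_in = pad_post(seq_idx, maxlen=6, value=0)
assert mat_in.dtype == np.int32        # this dtype triggers the InvalidArgumentError

# The fix applied in this PR: cast before feeding the model
mat_in = mat_in.astype('float32')
assert mat_in.dtype == np.float32
```

The cast is done once at padding time rather than inside `get_spaced_sent`, so every call path through the model receives a float tensor.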
haven-jeon commented 2 months ago

https://github.com/haven-jeon/PyKoSpacing/issues/66

I've verified that it works fine.

Thanks.