taishi-i / nagisa

A Japanese tokenizer based on recurrent neural networks
https://huggingface.co/spaces/taishi-i/nagisa-demo
MIT License

importing nagisa gives error "source code string cannot contain null bytes" #34

Closed worthy7 closed 2 months ago

worthy7 commented 5 months ago

Full output

nagisa-0.2.11-cp310-cp310-manylinux_2_5_x86_64

ValueError                                Traceback (most recent call last)
Cell In[10], line 6
      1 # now, tokenizing the data
      2 #Text preprocessing, tokenizing and filtering of stopwords are all included in CountVectorizer, which builds a dictionary of features and transforms documents to feature vectors:
      3
      4 # custom tokenization, this also removes common words
      5 from keyword_extraction import extract_keyword
----> 6 import nagisa
      7 # Takes in a document, returns the list of words
      8 def tokenize_jp(doc):

File ~/.local/lib/python3.10/site-packages/nagisa/__init__.py:4
      1 import nagisa_utils as utils
      3 from nagisa.tagger import Tagger
----> 4 from nagisa.train import fit
      6 version = '0.2.11'
      7 # Initialize instance

File ~/.local/lib/python3.10/site-packages/nagisa/train.py:11
      7 import logging
      8 from collections import OrderedDict
---> 11 import model
     12 import prepro
     13 import mecab_system_eval

taishi-i commented 5 months ago

Hi @worthy7. Thank you for using nagisa and sending us a bug report. I apologize for any inconvenience caused. I would like to investigate the cause of the error, so please let me know which OS and OS version your environment is running.

worthy7 commented 5 months ago

Actually, this was just inside GitHub Codespaces. I think it's Ubuntu, but it should be easy to reproduce.

taishi-i commented 5 months ago

Thank you for your response. I will now try to reproduce tokenizing texts with nagisa in GitHub codespaces, so please wait a moment. I will respond as soon as I identify the cause.

taishi-i commented 5 months ago

I have finished trying to reproduce the problem, and nagisa appears to work without any issues in the simplest GitHub Codespaces configuration.

Here's the configuration:

  1. Create a new project in a new codespace and select the 2-core, 8GB RAM, 32GB storage machine type.
  2. Next, install the Python extension (Python 3.10.13, extension v2024.2.1).
  3. Install nagisa with pip install nagisa.
  4. Check the operation in the terminal.
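
The check in step 4 can be sketched roughly as follows. This is a minimal illustration, not part of nagisa: the helper name check_import is hypothetical, and the only nagisa-specific detail it relies on is the module-level version attribute visible in the traceback above.

```python
import importlib


def check_import(module_name):
    """Try to import a module and report where it was loaded from.

    Hypothetical helper for illustration; not part of nagisa.
    """
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # e.g. ValueError (null bytes) or ImportError
        return False, f"{type(exc).__name__}: {exc}"
    location = getattr(module, "__file__", "(built-in)")
    # nagisa/__init__.py defines `version` (see the traceback); other
    # modules may not have it, so fall back to "unknown".
    version = getattr(module, "version", "unknown")
    return True, f"{module_name} version={version} file={location}"


if __name__ == "__main__":
    ok, detail = check_import("nagisa")
    print("OK" if ok else "FAILED", detail)
```

Running this from the terminal rather than from a notebook cell helps tell apart a problem with the installed package from a problem with the notebook environment.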

I would like to identify the cause of the error. First, could you try executing import nagisa in your terminal in GitHub codespace to see if it can be imported without any problems?
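
If the import still fails, one way to narrow it down is to look for files that actually contain null bytes. The sketch below is only an assumption about where to look, since the cause has not been confirmed in this thread. It scans the installed nagisa directory for .py files containing a 0x00 byte, and it prints which files the top-level names model, prepro and mecab_system_eval resolve to, because the traceback shows train.py importing them as top-level modules. The function names find_null_byte_files and module_origin are hypothetical helpers, not nagisa APIs.

```python
import importlib.util
from pathlib import Path


def find_null_byte_files(root):
    """Return sorted .py files under root whose bytes contain a NUL (0x00)."""
    hits = []
    for path in Path(root).rglob("*.py"):
        try:
            if b"\x00" in path.read_bytes():
                hits.append(path)
        except OSError:
            continue  # unreadable file: skip rather than abort the scan
    return sorted(hits)


def module_origin(name):
    """Return the file a top-level module name resolves to, or None."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    return spec.origin if spec else None


if __name__ == "__main__":
    # Locate the installed package without importing it.
    pkg = module_origin("nagisa")
    if pkg:
        for bad in find_null_byte_files(Path(pkg).parent):
            print("null bytes in", bad)
    # Assumption: a different file with one of these names could be
    # picked up from elsewhere on sys.path (for example the workspace).
    for name in ("model", "prepro", "mecab_system_eval"):
        print(name, "->", module_origin(name))
```

If any file is reported, reinstalling the package (for example with pip install --force-reinstall nagisa) would be a reasonable next step. If one of the three names resolves to a path outside the nagisa directory, that file is worth inspecting first.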

taishi-i commented 2 months ago

It is unlikely that this is an issue within the nagisa code, so I will close this issue. If the problem persists, please reopen this issue and add a comment. Thank you.