unicode-org / lstm_word_segmentation

Python code for training an LSTM model for word segmentation in Thai, Burmese, and similar languages.

`FileNotFoundError` when calling the `test_model_line_by_line()` function #30

Closed 0saurabh0 closed 8 months ago

0saurabh0 commented 8 months ago

I am encountering a `FileNotFoundError` when running the `test_model_line_by_line()` function in train_thai/burmese.py. The error message indicates that the file `news_00040.txt` does not exist in the directory `/home/srvk/Desktop/lstm_w/lstm_word_segmentation/Data/Best/news/`.

Here is the full error message:

Traceback (most recent call last):
  File "/home/srvk/Desktop/lstm_w/lstm_word_segmentation/train_thai.py", line 35, in <module>
    word_segmenter.test_model_line_by_line(verbose=True, fast=True)
  File "/home/srvk/Desktop/lstm_w/lstm_word_segmentation/lstm_word_segmentation/word_segmenter.py", line 393, in test_model_line_by_line
    text_acc = self._test_text_line_by_line(file=file, line_limit=-1, verbose=verbose)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/srvk/Desktop/lstm_w/lstm_word_segmentation/lstm_word_segmentation/word_segmenter.py", line 346, in _test_text_line_by_line
    lines = get_lines_of_text(file, "man_segmented")
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/srvk/Desktop/lstm_w/lstm_word_segmentation/lstm_word_segmentation/text_helpers.py", line 204, in get_lines_of_text
    with open(file) as f:
         ^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/home/srvk/Desktop/lstm_w/lstm_word_segmentation/Data/Best/news/news_00040.txt'

CC @SahandFarhoodi @sffc

12sachingupta commented 8 months ago

File Access: If the file exists and the path is correct, ensure that the script has sufficient access rights to read the file. Sometimes, file permissions or SELinux / AppArmor restrictions can cause such issues.

Data Loading: If the file is indeed missing or inaccessible, you may need to provide the correct data file or adjust the script to handle missing files gracefully, depending on the requirements of your application.

Logging: Implement logging or print statements in the script to track the flow of execution and identify any other issues that might be leading to this error.
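
As a rough illustration of the three points above, here is a minimal sketch of a reader that checks existence and read permission, logs what it finds, and skips a file instead of raising. The helper name `read_lines_if_available` and the logger name are hypothetical and not part of the repository; the repository's own `get_lines_of_text` may behave differently.

```python
import logging
import os
from pathlib import Path

# Hypothetical logger name, chosen only for this sketch.
logger = logging.getLogger("segmentation_data")


def read_lines_if_available(path):
    """Return the lines of `path`, or None if it is missing or unreadable."""
    path = Path(path)
    if not path.is_file():
        # The file is absent (or is not a regular file): log and skip.
        logger.warning("Skipping missing file: %s", path)
        return None
    if not os.access(path, os.R_OK):
        # The file exists but the current user cannot read it.
        logger.warning("Skipping unreadable file (check permissions): %s", path)
        return None
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]
```

A caller could then test the return value for `None` and continue with the next file. Whether skipping is acceptable depends on the evaluation: silently dropping test files changes the reported accuracy, so failing loudly may be the better choice.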

sffc commented 8 months ago

It looks like you need to make sure that the source data is downloaded and available on your system.
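
One way to confirm this before starting a long run is a quick pre-flight check on the data directory. The sketch below is only an assumption about how such a check might look: `find_missing_files` is a hypothetical helper, and the directory and file name in the usage lines are taken from the traceback above rather than from the repository's documentation.

```python
from pathlib import Path


def find_missing_files(data_dir, expected_names):
    """Return the names from `expected_names` that are not files in `data_dir`."""
    data_dir = Path(data_dir)
    # If data_dir itself does not exist, every expected name is reported missing.
    return [name for name in expected_names if not (data_dir / name).is_file()]


if __name__ == "__main__":
    # Path and file name as they appear in the reported traceback.
    missing = find_missing_files("Data/Best/news", ["news_00040.txt"])
    if missing:
        print("Missing data files:", ", ".join(missing))
    else:
        print("All expected data files are present.")
```

If the check reports missing files, the dataset has most likely not been downloaded or was unpacked into a different location.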

Cleaning this up is something that could be part of the GSoC project. If you are applying to GSoC, remember to submit your proposals by April 2 at 18:00 UTC. You can also email gsoc@unicode.org with questions.