ycatsh / connor

A starting take on a fast and fully local NLP file organizer that organizes files based on their content.
MIT License

Problem with connor - ValueError: After pruning, no terms remain #1

Closed: denysok closed this issue 2 weeks ago

denysok commented 1 month ago

I encountered an issue while using the connor tool to organize my documents. The command I executed was:

$ connor run --path $(xdg-user-dir DOCUMENTS)
--------------------------------------------------------------------------------
To customize default settings instead run the command <connor settings -h>
folder_name_length: 2
reading_word_limit: 100
similarity_threshold: 50%
--------------------------------------------------------------------------------
Folder '/home/denis/Документы' is being organized...
Traceback (most recent call last):
  File "/home/denis/.local/bin/connor", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/denis/.local/pipx/venvs/connor-nlp/lib/python3.11/site-packages/connor/main.py", line 29, in main
    cli_tool.organize_folder(args.path)
  File "/home/denis/.local/pipx/venvs/connor-nlp/lib/python3.11/site-packages/connor/connor_nlp/command.py", line 58, in organize_folder
    data_vectorized = vectorizer.fit_transform(words[1] for words in self.file_list)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/denis/.local/pipx/venvs/connor-nlp/lib/python3.11/site-packages/sklearn/feature_extraction/text.py", line 2091, in fit_transform
    X = super().fit_transform(raw_documents)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/denis/.local/pipx/venvs/connor-nlp/lib/python3.11/site-packages/sklearn/base.py", line 1473, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/denis/.local/pipx/venvs/connor-nlp/lib/python3.11/site-packages/sklearn/feature_extraction/text.py", line 1385, in fit_transform
    X = self._limit_features(
        ^^^^^^^^^^^^^^^^^^^^^
  File "/home/denis/.local/pipx/venvs/connor-nlp/lib/python3.11/site-packages/sklearn/feature_extraction/text.py", line 1237, in _limit_features
    raise ValueError(
ValueError: After pruning, no terms remain. Try a lower min_df or a higher max_df.
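
For readers unfamiliar with this message: scikit-learn raises it when the `min_df`/`max_df` document-frequency thresholds discard every term in the vocabulary. The sketch below is a simplified pure-Python model of that pruning step, not scikit-learn's actual implementation. The function name `surviving_terms` and the thresholds are illustrative assumptions; the traceback does not show which settings connor passes to its vectorizer.

```python
import re
from collections import Counter


def surviving_terms(documents, min_df=1, max_df=1.0):
    """Simplified model of document-frequency pruning (illustrative only).

    An integer threshold is an absolute document count; a float is a
    fraction of the number of documents.
    """
    n_docs = len(documents)
    low = min_df if isinstance(min_df, int) else min_df * n_docs
    high = max_df if isinstance(max_df, int) else max_df * n_docs

    # Count in how many documents each term appears (not how often).
    doc_freq = Counter()
    for text in documents:
        doc_freq.update(set(re.findall(r"\b\w\w+\b", text.lower())))

    if not doc_freq:
        # No tokens at all, e.g. every document is empty.
        raise ValueError("empty vocabulary")

    kept = sorted(t for t, df in doc_freq.items() if low <= df <= high)
    if not kept:
        raise ValueError("After pruning, no terms remain.")
    return kept
```

With `min_df=2`, the documents `"alpha beta"` and `"alpha gamma"` keep only `alpha`, while `"alpha beta"` and `"gamma delta"` share no term and trigger the error. A folder whose files have little readable text or no overlapping words could plausibly end up in the second situation.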

Expected Behavior: I believe it would be beneficial if the connor tool could:

- Allow sorting of documents regardless of their content.
- Handle empty documents without throwing errors.

This would enhance the usability of the tool, providing a more seamless experience when organizing files.
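
A minimal sketch of what handling empty documents could look like around a `TfidfVectorizer.fit_transform` call. The helper `vectorize_or_none` is hypothetical and not part of connor, and this is not a description of how the project actually resolved the issue.

```python
from sklearn.feature_extraction.text import TfidfVectorizer


def vectorize_or_none(texts, **vectorizer_kwargs):
    """Return (vectorizer, matrix), or None when no usable terms remain.

    Hypothetical helper for illustration; not part of connor.
    """
    # Drop documents that are empty or whitespace-only before fitting.
    non_empty = [t for t in texts if t and t.strip()]
    if not non_empty:
        return None

    vectorizer = TfidfVectorizer(**vectorizer_kwargs)
    try:
        matrix = vectorizer.fit_transform(non_empty)
    except ValueError:
        # scikit-learn raises ValueError both for an empty vocabulary and
        # when min_df/max_df pruning removes every term. Note that this
        # broad catch would also hide invalid-parameter errors.
        return None
    return vectorizer, matrix
```

A caller that receives `None` could skip the similarity step and leave the files where they are instead of crashing, which is one possible reading of the request above.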

A HUGE THANKS FOR YOUR INCREDIBLE FILE ORGANIZER!!!

ycatsh commented 2 weeks ago

Commit 40cc86f solves this issue.

I have also added a user confirmation prompt that appears before the folder is actually reorganized. Snippet from the CLI:

...
The above directory tree explains how the folder will be organized.
Do you want to continue? [y/n] y
Folder 'path/to/folder' organized successfully.
...
The above directory tree explains how the folder will be organized.
Do you want to continue? [y/n] n
Folder organization aborted. The files in 'path/to/folder' were left untouched.