maria-antoniak / little-mallet-wrapper

A Python wrapper around the topic modeling functions of MALLET.
GNU General Public License v3.0

quick_train_topic_model issues #7

Closed. ccmilne closed this issue 1 year ago

ccmilne commented 3 years ago

Hello! I'm having some issues with the quick_train_topic_model function. Following the demo, I specified my output_directory_path as 'lmw-output' within the same directory as my Jupyter notebook. However, I'm getting an error that says FileNotFoundError: [Errno 2] No such file or directory: 'lmw-output/mallet.topic_keys.20'

The demo doesn't specify creating any directories or files before running the code. How can I address this issue?

Thanks,

maria-antoniak commented 3 years ago

Hi there! quick_train_topic_model runs a series of other functions that you would otherwise need to run manually. If those functions fail, then the whole thing fails, as each step relies on files created in the previous step. (Someday I'll add better error messages here; it's just a little tricky since some of these errors are coming from MALLET, and we would want to retrieve those errors both in the command line and in a notebook, and on both Windows and Mac.)

Some common errors for this function:

I would first check the output directory path and then check that you are able to run MALLET on the command line following MALLET's documentation: http://mallet.cs.umass.edu/quick-start.php
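The two checks above can be scripted before calling quick_train_topic_model. This is only a minimal sketch using the standard library: the helper name `preflight` is hypothetical, and it assumes the output directory may need to be created by hand (the thread does not confirm whether the wrapper creates it) and that `path_to_mallet` is whatever path or command name you pass to the wrapper.

```python
import os
import shutil

def preflight(output_directory_path, path_to_mallet):
    """Create the output directory if needed and report whether the MALLET launcher can be found."""
    # Creating the directory up front is harmless if it already exists.
    os.makedirs(output_directory_path, exist_ok=True)
    # Accept either an explicit file path or a command name found on PATH.
    mallet_found = (
        os.path.isfile(path_to_mallet)
        or shutil.which(path_to_mallet) is not None
    )
    return {
        "output_dir_exists": os.path.isdir(output_directory_path),
        "mallet_found": mallet_found,
    }
```

If `mallet_found` comes back False, the path passed to the wrapper is the first thing to fix. If it is True and training still fails, running MALLET directly on the command line is the next step.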

siddharth-kale commented 2 years ago

Hi Maria,

Thanks for the really cool wrapper. I'm getting the same error as @ccmilne. The training.txt file is created in the output folder without any issues, but the topic keys file is not. I went through the checks you mentioned above and everything else looks fine on my end, yet the error persists.

Thanks, Siddharth

maria-antoniak commented 2 years ago

Hi Siddharth! That probably means that training is failing for some reason on your end. I would try using the individual functions import_data(), train_topic_model(), and load_topic_keys() to make sure this is the issue. If train_topic_model() is indeed the culprit, then you can try running the MALLET training yourself on the command line following these instructions.
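Since each step relies on files written by the previous one, checking which files exist after each call shows where the pipeline stops. The following is a minimal sketch using only the standard library. The helper name `first_missing_output` is hypothetical, and the file names are assumptions taken from the paths mentioned in this thread (`training.txt`, `mallet.topic_keys.20`), so adjust them to the paths you actually pass to the individual functions.

```python
import os

def first_missing_output(expected_paths):
    """Return the first path that does not exist or is empty, or None if all are present."""
    for path in expected_paths:
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            return path
    return None

# Assumed values, matching the paths quoted in this thread.
output_directory_path = "lmw-output"
num_topics = 20

# Listed in the order the steps are expected to write them.
expected_paths = [
    os.path.join(output_directory_path, "training.txt"),
    os.path.join(output_directory_path, "mallet.topic_keys." + str(num_topics)),
]

missing = first_missing_output(expected_paths)
if missing is None:
    print("All expected files found")
else:
    print("Missing or empty: " + missing)
```

Running this after each individual function narrows the failure down to the step that should have written the first missing file.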

dannylesmy commented 2 years ago

You must define the environment variable that points to the MALLET home folder, not just the path to the binaries. You can set it easily with os.environ:

    import os
    os.environ["mallet_home"] = r"C:\Mallet"  # your path, e.g. C:\Mallet
    mallet_path = r"C:\mallet\bin\mallet"

smryd commented 1 year ago

I am having the same issue cited by @ccmilne and @siddharth-kale above. I attempted to run the individual functions as advised by @maria-antoniak. I traced the issue, or at least one issue, back to import_data(). The function runs but yields a "NoneType" output. I am also seeing that training.txt is created but the formatted training file is not.

maria-antoniak commented 1 year ago

Hi @smryd! Probably something is going wrong with your connection to MALLET. Make sure you've specified your MALLET path, and if it still isn't working, try running MALLET from the command line to make sure everything is set up correctly: https://mimno.github.io/Mallet/import

smryd commented 1 year ago

I was able to find the problem by running MALLET on the command line. The environment variable was set to the \bin folder rather than the top-level MALLET folder. Everything is working now, thanks!
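For anyone hitting the same problem, a small check along these lines could catch it from inside a notebook. It is only a sketch: the helper name `looks_like_bin_folder` is hypothetical, the variable name `MALLET_HOME` is an assumption (Windows treats environment variable names case-insensitively, so it matches the `mallet_home` spelling used earlier in the thread), and the check is a simple heuristic on the last path component.

```python
import os
from pathlib import PureWindowsPath

def looks_like_bin_folder(mallet_home):
    """Return True if the value points at a 'bin' folder instead of the top-level MALLET folder."""
    # PureWindowsPath parses both backslash and forward-slash paths on any operating system.
    return PureWindowsPath(mallet_home).name.lower() == "bin"

mallet_home = os.environ.get("MALLET_HOME")
if mallet_home is None:
    print("MALLET_HOME is not set")
elif looks_like_bin_folder(mallet_home):
    print("MALLET_HOME points at the bin folder; use its parent folder instead")
else:
    print("MALLET_HOME =", mallet_home)
```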