maria-antoniak / little-mallet-wrapper

A Python wrapper around the topic modeling functions of MALLET.
GNU General Public License v3.0

Added all optional arguments and extraction of loss function values #8

Open rbbby opened 2 years ago

rbbby commented 2 years ago

I have updated the train_topic_model function to take any argument accepted by MALLET as kwargs, the only difference being that hyphens ('-') in argument names are replaced by underscores ('_'), i.e. num-iterations --> num_iterations. Numeric values can be passed either as numbers or as strings.
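
For readers skimming this description, here is a minimal sketch of the kwargs-to-flags idea, assuming a simple underscore-to-hyphen mapping (the helper name below is illustrative, not the PR's actual code):

```python
def kwargs_to_mallet_flags(**kwargs):
    """Turn Python-style keyword arguments into MALLET command-line flags.

    Each underscore in the keyword name becomes a hyphen, so
    num_iterations=2000 maps to the flag --num-iterations 2000.
    Numeric values are converted to strings before being appended.
    """
    flags = []
    for key, value in kwargs.items():
        flags.append('--' + key.replace('_', '-'))
        flags.append(str(value))
    return flags

# Both numeric and string values are accepted.
print(kwargs_to_mallet_flags(num_iterations=2000, alpha='0.05'))
# ['--num-iterations', '2000', '--alpha', '0.05']
```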

It is backwards compatible, keeping all of the mandatory arguments. The only thing removed is the default value of --optimize-interval 10 from within the function. Instead, MALLET's default value of 0 is used, and it can be set manually by adding optimize_interval=10 as an argument. This was done to allow the user to specify hyperparameter values themselves in case optimization is not wanted (for example, by setting alpha=0.05).
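
To illustrate the behavioural difference, a small self-contained sketch (using the same illustrative underscore-to-hyphen mapping as above, not the wrapper's actual internals):

```python
def to_flags(kwargs):
    # Same underscore-to-hyphen mapping as in the sketch above, inlined.
    return [part for key, value in kwargs.items()
            for part in ('--' + key.replace('_', '-'), str(value))]

# Previous behaviour: '--optimize-interval 10' was always appended, so
# MALLET re-optimized the hyperparameters during training.
print(to_flags({'alpha': 0.05, 'optimize_interval': 10}))
# ['--alpha', '0.05', '--optimize-interval', '10']

# New behaviour: nothing is appended unless requested, MALLET's default
# of 0 applies, and a user-supplied alpha stays fixed.
print(to_flags({'alpha': 0.05}))
# ['--alpha', '0.05']
```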

The functionality to return loss function values gathered during training has also been added. If logperplexity=True, loss values are scraped from the output and returned in a list (they are still printed as usual). This option is set to False by default.
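
A rough sketch of how such scraping can work, assuming MALLET's periodic "LL/token:" diagnostic lines; the function name and regex are illustrative rather than the PR's exact implementation:

```python
import re
import subprocess

LL_PATTERN = re.compile(r'LL/token:\s*(-?\d+(?:\.\d+)?)')

def run_and_scrape_ll(mallet_cmd):
    """Run a MALLET training command, echo its output, and collect the
    periodic LL/token diagnostics into a list of floats."""
    ll_values = []
    proc = subprocess.Popen(mallet_cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    for line in proc.stdout:
        print(line, end='')               # output is still printed as usual
        match = LL_PATTERN.search(line)
        if match:
            ll_values.append(float(match.group(1)))
    proc.wait()
    return ll_values
```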

The subprocess module is used to get the loss values. I saw #2 and had the same issue on Mac, but managed to resolve it (there is no delay in printing the output). I have not tested it on Windows, however, though I think it should work. If it does not, an option could be to use os by default unless logperplexity=True.
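
A sketch of that fallback idea, with illustrative names (logperplexity mirrors the kwarg described above):

```python
import os
import subprocess

def run_mallet(cmd, logperplexity=False):
    """Only use subprocess (needed to capture output) when loss values
    are requested; otherwise shell out with os, which lets MALLET print
    directly to the console."""
    if not logperplexity:
        return os.system(cmd)
    proc = subprocess.Popen(cmd, shell=True, text=True, bufsize=1,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    for line in proc.stdout:
        print(line, end='')   # stream each line as it arrives, no delay
    return proc.wait()
```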

ninpnin commented 2 years ago

I tested this version on my Ubuntu machine and it worked fine.

maria-antoniak commented 2 years ago

Hello, thank you so much for adding to this project! 🙏

I think some of these changes are very useful but others might take some more thought/testing.