hmc-whisk / jsLDA

A React based version of jsLDA with brand new features added on
Other
0 stars 0 forks source link

Refactor LDAModel.js #172

Open theobayard opened 3 years ago

theobayard commented 3 years ago

Refactoring LDAModel

LDAModel.js over 1200 lines long and continues to grow. It's got tons of methods and every method has tons of side effects. For me at least, this makes it really hard to reason about what it going on inside of it. I think it would benefit hugely from some refactoring. After a year or so of trying to figure out how best to refactor LDAModel, I have some thoughts on what the new LDAModel might look like.

New class structure

My main observation is that LDAModel already has a data structure that it maintains, but doesn't properly utilize. So, if we created more classes to fill out this data structure, we would end up with a structure that is easier to reason about, build on, test, and maintain. I have created the following diagram to broadly show how I would lay these classes out. LDAModel_Refactor

It's not quite UML (my editing software didn't have the correct symbols) but each of the arrows imply a has a/many relationship. Note that when I say list in this diagram I do not mean wordList = []. I mean a class WordList {}. This structure means that you have to think carefully about where functions go, but it also avoids and circular dependencies and lets you do anything you might need to do as far as I can tell (P.S. From what I've heard, Node.js lets you include circular dependencies, but unless you're very careful it will probably break stuff and/or slow your code down). You will probably need to modify this structure as you implement/plan functions, but I would suggest continuing to avoid circular dependencies.

Here are some specific examples of how the structure above would work.

Other thoughts

theobayard commented 3 years ago

Thinking about it some more, I think it would be good to split the LDAModel class in the diagram above into LDAModelCore (where the model is trained, reset, etc) and LDAModelAnalysis (Where questions about the model are answered). I think this would be good A: because it's another way to break the model up and it needs all the breaking up it can get and B: because it makes it very clear that analysis happens in LDAModelAnalysis not downstream in some React component. As far as how this would fit in the diagram above, I would fit it in by renaming LDAModel -> LDAModelCore and then adding a new class LDAModelAnalysis and adding a has a relationship LDAModelAnalysis -> LDAModelCore. This same idea could also be expanded to all of the single instance classes.