Open bensoltoff opened 6 years ago
Very smooth presentation, even with the computer turning off in the middle.
FDIC enforcement decisions and orders
Topic modeled
Looking for optimal topics
I'm not exactly sure what it is that you are trying to get out of these topics... do you want to do inverse regression to predict something directly with the topic compositions themselves?
If you are still having problems converting PDFs to text, the following script worked well on Congressional Record PDFs, and I believe it automatically tries OCR:
https://github.com/jmausolf/Congressional_Record/blob/master/__Convert_pdf_to_txt.sh
How are you going to decide how many topics to choose in the LDA model?
@shugamoe Perplexity scores? That's the downside to LDA: I'm not aware of a great method for deciding how many topics is enough.
@shugamoe @bensoltoff I was thinking of using perplexity scores as well. It is a common and often recommended approach when using LDA.
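For anyone following along, here is a minimal sketch of what comparing topic counts by perplexity could look like. It is not anyone's actual pipeline from this thread. It assumes you have already fit an LDA model for each candidate number of topics and can get, for a set of held-out documents, the document-topic proportions (`theta`) and the topic-word probabilities (`beta`). The names `perplexity` and `pick_num_topics` are made up for illustration.

```python
import math


def perplexity(doc_term_counts, theta, beta):
    """Perplexity of held-out documents under a fitted topic model.

    doc_term_counts[d][w]: count of word w in held-out document d
    theta[d][k]: proportion of topic k in document d (assumed already inferred)
    beta[k][w]: probability of word w under topic k
    """
    log_lik = 0.0
    n_tokens = 0
    for d, counts in enumerate(doc_term_counts):
        for w, n in enumerate(counts):
            if n == 0:
                continue
            # Word probability is a mixture over topics.
            p = sum(theta[d][k] * beta[k][w] for k in range(len(beta)))
            log_lik += n * math.log(p)
            n_tokens += n
    # exp of the negative average log-likelihood per token.
    return math.exp(-log_lik / n_tokens)


def pick_num_topics(perplexity_by_k):
    """Return the candidate topic count with the lowest held-out perplexity.

    perplexity_by_k: dict mapping number of topics -> held-out perplexity
    """
    return min(perplexity_by_k, key=perplexity_by_k.get)
```

A quick sanity check on the formula: a model that assigns every word in a vocabulary of size V the same probability has perplexity V, so lower values mean the model is less "surprised" by the held-out text. Lowest perplexity is only one criterion, so it is probably worth reading the top words per topic as well before settling on a number.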
Good job on stating your research question clearly at the beginning.