uchicago-computation-workshop / 2018_spring_conference


Alice Chung #14

Open bensoltoff opened 6 years ago

rickecon commented 6 years ago

Good job on stating your research question clearly at the beginning.

rickecon commented 6 years ago

Very smooth presentation, even with the computer turning off in the middle.

jamesallenevans commented 6 years ago

FDIC enforcement decisions and orders. Topic modeled. Looking for optimal topics. I'm not exactly sure what it is that you are trying to get out of these topics... do you want to do inverse regression to predict something directly with the topic compositions themselves?

jmausolf commented 6 years ago

If you are still having problems converting PDFs to text, the following code worked well on Congressional Record PDFs, and I believe it automatically tries OCR:

https://github.com/jmausolf/Congressional_Record/blob/master/__Convert_pdf_to_txt.sh

shugamoe commented 6 years ago

How are you going to decide how many topics to choose in the LDA model?

bensoltoff commented 6 years ago

@shugamoe Perplexity scores? That's the downside to LDA: I'm not aware of a great method for deciding how many topics is enough.

Alicechung commented 6 years ago

@shugamoe @bensoltoff I was thinking of using perplexity scores as well. It is a common and recommended approach when using LDA.
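
For anyone following the perplexity suggestion, here is a minimal sketch of what the score measures: the exponential of the negative average per-token log-likelihood on held-out documents, where lower is better. The function names (`perplexity`, `pick_num_topics`), the list-of-lists layout for the fitted distributions, and the toy numbers are all assumptions made for illustration. None of it comes from the project discussed here, and a real analysis would more likely rely on a library's built-in score (for example scikit-learn's `LatentDirichletAllocation.perplexity`).

```python
import math


def perplexity(docs, doc_topic, topic_word):
    """Perplexity = exp(-total log-likelihood / total token count).

    docs       : list of documents, each a list of integer word ids
    doc_topic  : doc_topic[d][k] = P(topic k | document d)
    topic_word : topic_word[k][w] = P(word w | topic k)
    """
    log_lik = 0.0
    n_tokens = 0
    for d, doc in enumerate(docs):
        for w in doc:
            # Marginalize over topics to get P(word w | document d).
            p_w = sum(theta * topic_word[k][w]
                      for k, theta in enumerate(doc_topic[d]))
            log_lik += math.log(p_w)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


def pick_num_topics(scores):
    """Return the candidate topic count with the lowest perplexity.

    scores maps a candidate number of topics to its held-out perplexity.
    """
    return min(scores, key=scores.get)


# Toy sanity check (made-up inputs, not real data): if every word in a
# 4-word vocabulary is equally likely, the model is as "confused" as a
# fair 4-sided die, so the perplexity should equal the vocabulary size.
docs = [[0, 1, 2, 3], [3, 2, 1, 0]]
uniform_theta = [[0.5, 0.5], [0.5, 0.5]]
uniform_phi = [[0.25] * 4, [0.25] * 4]
toy_perplexity = perplexity(docs, uniform_theta, uniform_phi)
```

In use, one would fit a model for each candidate number of topics, score each on the same held-out documents, and pass the resulting dictionary to `pick_num_topics`. As noted above, perplexity is only a heuristic, so it is worth also reading the topics at the chosen setting to check that they are interpretable.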