programminghistorian / ph-submissions

The repository and website hosting the peer review process for new Programming Historian lessons
http://programminghistorian.github.io/ph-submissions

Review Ticket: Evaluating Topic Models #234

Closed: drjwbaker closed this issue 4 years ago

drjwbaker commented 5 years ago

The Programming Historian has received the following proposal for a lesson on 'Evaluating Topic Models' by @akhenrichs. This lesson is now under review and can be read at: http://programminghistorian.github.io/ph-submissions/lessons/evaluating-topic-models

Please feel free to use the line numbers provided on the preview if that helps with anchoring your comments, although you can structure your review as you see fit.

@spapastamkou and I will act as editors for the review process. Our role is to solicit two reviews from the community and to manage the discussions, which should be held here on this forum. We have already read through the lesson and provided feedback, to which the author has responded.

Members of the wider community are also invited to offer constructive feedback, which should be posted to this message thread, but they are asked first to read our Reviewer Guidelines (http://programminghistorian.org/reviewer-guidelines) and to adhere to our anti-harassment policy (below). We ask that all reviews stop after the second formal review has been submitted so that the author can focus on any revisions. I will make an announcement on this thread when that has occurred.

I will endeavor to keep the conversation open here on GitHub. If anyone feels the need to discuss anything privately, you are welcome to email @drjwbaker or @spapastamkou. If the author has any concerns, they can contact the Ombudsperson, @amandavisconti.

Anti-Harassment Policy

This is a statement of the Programming Historian's principles and sets expectations for the tone and style of all correspondence between reviewers, authors, editors, and contributors to our public forums.

The Programming Historian is dedicated to providing an open scholarly environment that offers community participants the freedom to thoroughly scrutinize ideas, to ask questions, to make suggestions, or to request clarification, but also provides a harassment-free space for all contributors to the project, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, race, age or religion, or technical experience. We do not tolerate harassment or ad hominem attacks on community participants in any form. Participants violating these rules may be expelled from the community at the discretion of the editorial board. If anyone witnesses or feels they have been the victim of the activity described above, please contact our Ombudsperson (@amandavisconti). Thank you for helping us to create a safe space.

drjwbaker commented 5 years ago

Contacted reviewers to state that the lesson is ready for review, with a deadline of 15 May 2019.

drjwbaker commented 5 years ago

Unfortunately there has been an unexpected delay with the peer review for this article. I will keep you updated as I know more.

akhenrichs commented 5 years ago

Thanks @drjwbaker !

sdedeo commented 5 years ago

I am usually a soft touch with review reports, and am keen to see connections being built between fields. Unfortunately, this piece has basic misunderstandings that would mislead the intended audience and confuse those who have some experience and have come to Programming Historian to learn more. The article is sufficiently incorrect that I would urge a graduate student or new researcher not to read it as it stands. It is a great idea, but represents a missed opportunity to communicate clearly and effectively across the computational and philosophical divides in our fields.

I believe these flaws are sufficiently serious that the article should not be published. It is also difficult to see how the article could be put into proper form without a complete re-write; i.e., I do not believe a second round of review would be of use. I thus strongly urge a "reject" decision on this article, with encouragement and respect to the author in their endeavours.

Again, I am sorry that I could not be more positive in this report. This is my first time doing a "public" peer review; I would have preferred to provide this report privately, so as to help the author without having to be negative in public. But of course I understand, and am happy to experiment with the new form.

The problems with this article:

  1. The author confuses "word clouds" (i.e., a visual representation of word frequencies) with the commonly used algorithm known as Latent Dirichlet Allocation (LDA), more commonly known simply as topic modeling. These two things have nothing to do with each other. This confusion repeats itself throughout the text.

  2. The author makes a distinction between an algorithm "based in word counts" and one based in probability. Both word clouds and topic models must (out of necessity) work with word counts; both methods end up inferring probabilities from the data (see the first sketch after this list).

  3. The author provides the exceedingly bad advice to infer what produced a data set by looking at the file format, instead of reading the documentation or contacting the author. I would urge precisely the opposite with any collaborator or graduate student.

  4. The author confuses an algorithm (LDA) and an implementation of the algorithm (MALLET).

  5. The author asserts that "special characters must be stripped out of the text, including numbers and punctuation" for topic modeling and indeed "any computational procedure"; there is no requirement that this be done at all. "Special characters" is also a strangely English-language centric term; é is not a special character for those whose first language is French, for example.

  6. The author provides a tendentious account of what topic modeling does, in particular that it "represent[s] authorial style as a series of choices that conform to personal preference". Authorial style (at least as it is used in most humanities contexts) is only one of the things a topic model might detect, and very often a weak signal of style is overwhelmed by other phenomena.

  7. The author makes the assertion that "each topic is most about the first word". Depending on how the lexicon was cleaned, this may not be true at all; for example, many topics might share the same top words, which could very well be rather meaningless on their own. There are a number of different ways that people have considered understanding topics, including looking for distinctive words by TF-IDF or KL divergence, and a reverse-lookup procedure that locates the documents most heavily weighted on the topic (see the second sketch after this list).

  8. The author claims that the most common topic in a corpus is what the corpus is most "about"; quite apart from the vague nature of that claim, it is also very often untrue. A topic model may find a collection of very closely related topics that together constitute the main theme of the corpus, while the most common topic is a "miscellaneous" one that contains glue words, etc.

  9. The author makes claims about how to "identify input", and what this means. None of the claims the author makes is in any way sourced; they appear to be opinions. In turn: (A) "prose after 1850" is easiest "since those were the inputs which trained the algorithm in the first place" -- this sentence doesn't make sense; what "inputs" does the author mean here? (B) poetry after 1850 doesn't have many spelling or usage variations compared to prose (very unlikely, and again a strange claim about "the algorithm's training"); (C) the apparent claim that custom stopword lists are only used for pre-1850 texts.

  10. The discussion of corpus language is English-centric, and covers none of the usual questions (inflected languages, stemming, lemmatizing, etc.).

  11. The author makes the very poor recommendation to use the outputs of a topic model when one is unclear about the corpus or the methods that produced it.

  12. "Identify the research question" section seems strange. There doesn't seem to be any content here other than the idea that it is a good idea to figure out what the person's research question was -- is this ever not the case?

  13. The author asks the reader to keep in mind the question "How big is the corpus? (Is it statistically significant?)" There is no such thing as a "statistically significant" corpus, and it is unclear what they mean here. Perhaps they mean "a corpus sufficiently large that a particular assertion could be made at high statistical significance", but this is a combination of the assertion, the test and null model used, and the corpus size, and there is little that can be said in general.

  14. Basic questions that a researcher should ask about topic models are missing (e.g., the relationship between the total number of words, the number of documents, and the number of words per document) and the concerns that might result.

  15. Throughout the text, a failure to engage with much excellent work on how to interpret topic models (e.g., many things by David Mimno and collaborators on close reading vs topic modeling, corpus cleaning, etc; Jaimie Murdock's work on Darwin, Alexander Barron's on the French Revolution corpus, Jo Guldi's work on Paper Machines and beyond, a growing amount of work in traditional humanities journals, etc). There's so much out there, so much interesting stuff, so many good ideas. I believe the author would learn a great deal from looking into this work.
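
To make points 1 and 2 concrete, here is a minimal sketch of the shared starting point, assuming Python with gensim; the toy documents and every parameter setting here are illustrative only, not recommendations:

```python
from collections import Counter
from gensim import corpora, models

# Toy tokenised documents (illustrative only).
docs = [
    ["cotton", "mill", "wages", "strike", "mill"],
    ["wages", "strike", "union", "cotton"],
    ["harvest", "wheat", "prices", "market"],
    ["market", "prices", "wheat", "trade"],
]

# A word cloud is drawn directly from raw frequency counts...
frequencies = Counter(word for doc in docs for word in doc)
print(frequencies.most_common(5))

# ...while a topic model also starts from word counts (bags of words),
# but infers probability distributions over words from them.
dictionary = corpora.Dictionary(docs)
bows = [dictionary.doc2bow(doc) for doc in docs]  # (token_id, count) pairs

lda = models.LdaModel(bows, num_topics=2, id2word=dictionary,
                      passes=50, random_state=1)

# Each topic is a probability distribution over the whole vocabulary.
for topic_id, words in lda.show_topics(num_words=5, formatted=False):
    print(topic_id, [(word, round(prob, 3)) for word, prob in words])
```

Counts go in, probabilities come out: neither method is "based in word counts" to the exclusion of the other.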
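
And for point 7, a sketch of the reverse-lookup procedure mentioned there, continuing from the toy model above (again, everything here is illustrative):

```python
# Find the documents most heavily weighted on a given topic, rather
# than reading the topic off its first word alone.
topic_of_interest = 0

weights = []
for i, bow in enumerate(bows):
    dist = dict(lda.get_document_topics(bow, minimum_probability=0.0))
    weights.append((dist.get(topic_of_interest, 0.0), i))

# The most-weighted documents offer a second, independent reading of the topic.
for weight, i in sorted(weights, reverse=True)[:3]:
    print(round(weight, 3), docs[i])
```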

Again, I feel bad that I cannot be more positive about this submission. I provide this report with respect, and with sincere best wishes to the author. Please contact me privately or here on GitHub if I can be of more help.

Very sincerely,

Simon DeDeo, Carnegie Mellon University & the Santa Fe Institute

drjwbaker commented 5 years ago

Thanks @sdedeo for the review.

drjwbaker commented 5 years ago

The second review (offline right now) is in hand and @akhenrichs is aware. In line with our editorial guidelines I will aim to summarise the reviews shortly, hopefully by the end of the week (though it is marking season for me, so apologies if there is a short delay).

Two things to note:

  1. My role now is:

to summarise the suggestions and give the author a clear path for any revisions that [I] would like them to respond to

  2. Once a lesson proposal has been accepted, we at the Programming Historian:

work closely with the author and reviewers to maximize its potential and publish it within a reasonable amount of time.

drjwbaker commented 5 years ago

There is a good idea here. And we stated last year that we wanted a lesson on this topic. However, it is clear that substantial work is needed to make this a great lesson.

Both reviewers provided extensive commentary. In light of this, I won't respond at this stage to all their comments. Rather, I want to focus on the substantive points that need to be addressed in order for this lesson to proceed.


  1. Framing. The lesson should be clearly reframed to support historians who want to understand a topic model. This could either be one they encounter (say in a publication) or one they produce, but not both. @akhenrichs: is this the lesson you want to write?

  2. Word Clouds. Both reviewers are unclear about why word clouds are used in a lesson on evaluating topic models and recommend that they be removed. @akhenrichs: do you have a justification for using word clouds in this context?

  3. Training

"If prose written after about 1850, the algorithm will have the easiest time dealing with the corpus, since those were the inputs which trained the algorithm in the first place."

The reviews take issue with the idea that topic models are trained and that they are trained on post-1850 text. @akhenrichs: is this what you meant or are different meanings of the word 'trained' causing confusion here?

  4. Counts vs probability. @sdedeo writes:

The author makes a distinction between an algorithm "based in word counts" and one based in probability. Both word clouds and topic models must (out of necessity) work with word counts; both methods end up inferring probabilities from the data

@akhenrichs: perhaps it would work better to quote from an existing publication that describes the relationship between topic models, data, and probabilistic reasoning?

  5. Special Characters. @sdedeo writes:

The author asserts that "special characters must be stripped out of the text, including numbers and punctuation" for topic modeling and indeed "any computational procedure"; there is no requirement that this be done at all. "Special characters" is also a strangely English-language centric term; é is not a special character for those whose first language is French, for example.

@akhenrichs: the first point here should be reframed along the lines of why it might be useful to remove stopwords (see the first sketch after this list). But the second point is more substantive, as it is at odds with our advice on writing for a global audience.

  6. What the topic is about. You write 'each topic is most about the first word'. @sdedeo notes that if topics share top words then this is not true. You then say 'Yet tables representing topic models typically come in multiples that need to be studied in conjunction; this table is not the whole picture'. I suspect that this section would work better as a list of things for a historian to look at when evaluating a topic model: consider the top word, consider other words, compare topics, et cetera (see the second sketch after this list). @akhenrichs: does this fit with how you work with topic models and with what you want to say about evaluating topic models?

  7. Corpus size. It strikes me that you want to say something about how the size of a corpus should be taken into account when analysing a topic model, but that the language you use (e.g. "Is it statistically significant?") is imprecise. @akhenrichs: do you have examples you could work outward from to give the reader a sense of what you are trying to get at here?
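
To illustrate the reframing I have in mind for point 5, a minimal sketch of stopword removal, assuming Python; the stopword list is a placeholder that would need to be tailored to the corpus and its language:

```python
# A small, corpus-specific stopword list; extend it as the corpus demands.
stopwords = {"the", "of", "and", "a", "to", "in"}

text = "The wages of the mill workers in été 1850"
tokens = [word for word in text.lower().split() if word not in stopwords]
print(tokens)  # ['wages', 'mill', 'workers', 'été', '1850']
```

Note that nothing here strips accented characters, numbers, or punctuation as a matter of course; removing high-frequency glue words is a choice made for a reason, not a requirement of "any computational procedure".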
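
And on point 6, a sketch of the comparison step, assuming a trained gensim model named lda (as in the toy example in the review above); the choice of ten top words per topic is arbitrary:

```python
def top_words(model, topic_id, n=10):
    # Set of the n most probable words for a topic.
    return {word for word, _ in model.show_topic(topic_id, topn=n)}

# Compare every pair of topics: heavily shared top words are a warning
# that no single word, first or otherwise, defines a topic on its own.
for a in range(lda.num_topics):
    for b in range(a + 1, lda.num_topics):
        shared = top_words(lda, a) & top_words(lda, b)
        if shared:
            print("topics", a, "and", b, "share:", sorted(shared))
```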


At this stage we typically ask that revisions are completed in 4 weeks. Given the recommendations of the reviewers, I suggest we take a different route on this occasion. @akhenrichs: by no later than 27 June 2019 could you please:

1) Indicate whether or not you are willing to continue working on the lesson. When making your decision, please keep in mind our commitment to "work closely with the author and reviewers to maximize its potential and publish it within a reasonable amount of time".

2) If yes to 1), respond to the 7 points I have raised above and explain how you plan to revise the lesson.

3) Respond to any other aspects of the peer review that you'd like to address directly, because - if yes to 1) - I will return to these later.

akhenrichs commented 5 years ago

I'd like to thank the reviewers for their substantive comments, and to @drjwbaker for his summary. I appreciate the time the reviewers spent on this lesson, but unfortunately I have other writing commitments this summer and am not able to undertake the suggested revisions to the lesson at this point in time. If possible, I hope to return to it at some point in the future.

drjwbaker commented 5 years ago

Thanks for your honesty @akhenrichs. I will make a note to contact you on 6 September to ask if you want to return to the lesson, at which point I will close the ticket if no progress is possible (though we can always return to it at a later date). Thank you for your efforts.

drjwbaker commented 5 years ago

@akhenrichs Just checking in as promised to see if you wish to return to this or not. If I don't hear from you by 20 September, I will close the ticket.

akhenrichs commented 4 years ago

@drjwbaker Thank you for checking in; unfortunately, I will not be able to return to this for at least the next year. If after that point there is still a need for this tutorial, I would happily take it up again.

drjwbaker commented 4 years ago

@akhenrichs Okay, thanks for the update. Closing this issue for now. If you want to come back to the lesson next year, get in touch.