mediacloud / news-search-api

Internal API server that offers search access to the Media Cloud Online News Archive (in Elasticsearch).
https://mediacloud.org
GNU Affero General Public License v3.0

assess top terms #65

Closed: rahulbot closed this issue 2 months ago

rahulbot commented 4 months ago

We need to understand how the top-terms feature works. As a first pass, this includes 3 tasks:

  1. verify we are seeing top terms in headlines
  2. understand if/how it is sampling across ES shards
  3. if we can change the sample size, try out different sizes to find one that produces fast but relatively stable results
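To make tasks 2 and 3 a bit more concrete, here is a minimal sketch. It assumes, without confirmation from this issue, that top terms are computed with Elasticsearch's `sampler` aggregation wrapping a `significant_text` aggregation, and that headlines are stored in a hypothetical `article_title` field. The key detail for task 2 is that the sampler's `shard_size` applies per shard, so the total sample is roughly `shard_size` times the number of shards. For task 3, `term_overlap` (a hypothetical helper) scores how stable the top-term list stays between two sample sizes.

```python
from typing import Any


def build_top_terms_query(
    query_string: str,
    field: str = "article_title",  # hypothetical headline field name
    shard_size: int = 500,
    top_n: int = 20,
) -> dict[str, Any]:
    """Build an Elasticsearch request body that samples the top-scoring
    documents on each shard, then extracts significant terms from them."""
    return {
        "size": 0,  # only aggregation results are needed, not the hits
        "query": {"query_string": {"query": query_string}},
        "aggs": {
            "sample": {
                # shard_size is applied per shard, so the total sample is
                # roughly shard_size * number_of_shards documents
                "sampler": {"shard_size": shard_size},
                "aggs": {
                    "top_terms": {
                        "significant_text": {"field": field, "size": top_n}
                    }
                },
            }
        },
    }


def term_overlap(a: list[str], b: list[str]) -> float:
    """Jaccard similarity between two top-term lists; 1.0 means identical sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)
```

One way to use this would be to run the same query at several `shard_size` values (e.g. 100, 500, 2000), time each request, and compare each run's term list against the largest sample with `term_overlap`. The smallest size whose overlap stays high would be a candidate for "fast but relatively stable."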
rahulbot commented 2 months ago

@pgulley what's the status of the blog post summarizing results?

pgulley commented 2 months ago

I think it's done! It's been through a few revisions now, so it's probably ready to post? https://docs.google.com/document/d/1nN3L0aajPReBNKzfIy-qKIEX-2qH-cTs8k8G8LxBpw8/edit?usp=sharing

rahulbot commented 2 months ago

Great. Let's get the NSF announcement post out ASAP, and then schedule this to go out in a week or two? Since the assessment research is done, I'll close this issue.