responsible-ai-collaborative / aiid

The AI Incident Database seeks to identify, define, and catalog artificial intelligence incidents.
https://incidentdatabase.ai

News Digest is Down, no recent data #3057

Closed by smcgregor 2 months ago

kepae commented 2 months ago

@pdcp1 @cesarvarela let's push to production and resolve this?

kepae commented 2 months ago

This is also exactly what we should be trying to monitor with live/smoke tests, separate from our unit testing.
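A minimal sketch of what such a live check could look like, assuming a hypothetical `fetch` callable that performs the real HTTP request and returns the decoded JSON body. The names `smoke_check` and `fetch` are illustrative and not part of the aiid codebase. The point is that the check fails on an empty result, not only on a transport error.

```python
from typing import Any, Callable, List, Mapping


def smoke_check(fetch: Callable[[], Mapping[str, Any]], field: str) -> List[Any]:
    """Call a live endpoint and fail unless `field` holds at least one item.

    `fetch` is a hypothetical callable that performs the request and
    returns the decoded GraphQL-style JSON body as a mapping.
    """
    payload = fetch()
    # A GraphQL-style response can report errors in the body.
    if payload.get("errors"):
        raise AssertionError(f"API returned errors: {payload['errors']}")
    # Treat a missing or empty list as a failure, not as success.
    items = (payload.get("data") or {}).get(field) or []
    if not items:
        raise AssertionError(f"smoke check failed: '{field}' is empty")
    return list(items)
```

Run on a schedule against production, a check like this would flag a page that loads correctly but shows no data.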

pdcp1 commented 2 months ago

> @pdcp1 @cesarvarela let's push to production and resolve this?

Yes, let's push the fix to production.

kepae commented 2 months ago

Still failing. Request is getting a 200 from the API, but no results.

Query:

query NewsArticles($query: CandidateQueryInput!) {
  candidates(query: $query, limit: 9999) {
    title
    url
    similarity
    matching_keywords
    matching_harm_keywords
    matching_entities
    date_published
    dismissed
    __typename
  }
}

Vars:

{
  "query": {
    "match": true,
    "date_published_in": [
      "2024-09-04",
      "2024-09-03",
      "2024-09-02",
      "2024-09-01",
      "2024-08-31",
      "2024-08-30",
      "2024-08-29",
      "2024-08-28",
      "2024-08-27",
      "2024-08-26",
      "2024-08-25",
      "2024-08-24",
      "2024-08-23",
      "2024-08-22"
    ]
  }
}

Response:

{
  "data": {
    "candidates": []
  }
}

pdcp1 commented 2 months ago

It seems that we are facing a data issue. The News Digest page retrieves news items from the last 14 days, but we don't have any candidate documents within that timeframe. The most recent document dates back to 2024-08-18.
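To make the gap concrete, here is a rough sketch of the window arithmetic. The helper names `digest_window` and `has_recent_candidates` are hypothetical, and taking 2024-09-04 as "today" is an assumption based on the first date in the query variables above.

```python
from datetime import date, timedelta
from typing import List


def digest_window(today: date, days: int = 14) -> List[str]:
    """Return ISO dates for `today` and the preceding days, newest first."""
    return [(today - timedelta(days=i)).isoformat() for i in range(days)]


def has_recent_candidates(latest: date, today: date, days: int = 14) -> bool:
    """True if the newest candidate document falls inside the window."""
    return latest.isoformat() in digest_window(today, days)


# Dates taken from this thread: the query was issued with 2024-09-04 as the
# newest date, and the most recent document is from 2024-08-18.
window = digest_window(date(2024, 9, 4))
in_window = has_recent_candidates(date(2024, 8, 18), date(2024, 9, 4))
```

Under these assumptions the window runs from 2024-09-04 back to 2024-08-22, so a newest document dated 2024-08-18 falls outside it and the query matches nothing.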

If I extend the date range to 30 days in my local environment, I get results:

[image]

cesarvarela commented 2 months ago

If that's the case, it might have to do with this: https://github.com/responsible-ai-collaborative/nlp-monitoring/actions

pdcp1 commented 2 months ago

Yes, the last run of the nlp-monitoring process threw errors in the fetch_news.py step: https://github.com/responsible-ai-collaborative/nlp-monitoring/actions/runs/10714752060/job/29708973265

pdcp1 commented 2 months ago

And that makes sense, because the last successful nlp-monitoring run was on 2024-08-18: https://github.com/responsible-ai-collaborative/nlp-monitoring/actions/runs/10438291066/job/28905425357

kepae commented 2 months ago

This is fixed. The upstream NLTK package changed how it requires imported data to be handled. https://github.com/responsible-ai-collaborative/nlp-monitoring/pull/6
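For readers hitting something similar, one defensive pattern is to check for a data resource before use and download it only when it is missing. This is a sketch of that general idea, not the change made in the linked PR. The helper `ensure_resource` is hypothetical, and the resource name in the usage comment is only a placeholder, since the thread does not say which resource was involved.

```python
from typing import Callable


def ensure_resource(
    path: str,
    package: str,
    find: Callable[[str], object],
    download: Callable[[str], object],
) -> bool:
    """Make sure a data resource exists, downloading it if needed.

    Returns True if a download was performed, False if the resource
    was already present. `find` must raise LookupError when missing.
    """
    try:
        find(path)
        return False
    except LookupError:
        download(package)
        find(path)  # raises LookupError again if the download did not help
        return True


# With NLTK installed, this could be wired up roughly as follows
# (placeholder resource name, an assumption for illustration):
#   import nltk
#   ensure_resource("tokenizers/punkt", "punkt", nltk.data.find, nltk.download)
```

Passing `find` and `download` in as arguments keeps the helper testable without network access. Because the lookup is repeated after the download, a still-missing resource fails at startup with a `LookupError`.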