Closed smcgregor closed 2 months ago
This is also exactly what we should be trying to monitor with live/smoke tests, separate from our unit testing.
@pdcp1 @cesarvarela let's push to production and resolve this?
Yes, let's push the fix to Production
Still failing. Request is getting a 200 from the API, but no results.
Query:
query NewsArticles($query: CandidateQueryInput!) {
  candidates(query: $query, limit: 9999) {
    title
    url
    similarity
    matching_keywords
    matching_harm_keywords
    matching_entities
    date_published
    dismissed
    __typename
  }
}
Vars:
{
  "query": {
    "match": true,
    "date_published_in": [
      "2024-09-04",
      "2024-09-03",
      "2024-09-02",
      "2024-09-01",
      "2024-08-31",
      "2024-08-30",
      "2024-08-29",
      "2024-08-28",
      "2024-08-27",
      "2024-08-26",
      "2024-08-25",
      "2024-08-24",
      "2024-08-23",
      "2024-08-22"
    ]
  }
}
Response:
{
  "data": {
    "candidates": []
  }
}
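A smoke test of the kind suggested in this thread could treat this response as a failure even though the HTTP status is 200. The sketch below only inspects an already-parsed response; the function name `check_candidates_response` and the wording of the problem messages are hypothetical, and nothing here is taken from the project's actual test suite.

```python
from typing import Any


def check_candidates_response(status_code: int, payload: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a candidates response (empty means healthy)."""
    problems = []
    if status_code != 200:
        problems.append(f"unexpected HTTP status {status_code}")
    if payload.get("errors"):
        problems.append("GraphQL errors present")
    candidates = (payload.get("data") or {}).get("candidates")
    if candidates is None:
        problems.append("response has no data.candidates field")
    elif len(candidates) == 0:
        # The failure mode in this thread: HTTP 200, but nothing to show.
        problems.append("candidates list is empty")
    return problems


# The response body reported above.
failing = {"data": {"candidates": []}}
print(check_candidates_response(200, failing))  # ['candidates list is empty']
```

Checking the result list rather than only the status code is what separates this from a plain availability ping.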
It seems that we are facing a data issue.
The News Digest page retrieves news items from the last 14 days, but we don't have any candidate documents within that timeframe.
The most recent document dates back to 2024-08-18.
If I extend the date range to 30 days in my local environment, I get results.
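The date arithmetic behind this diagnosis can be checked with a short sketch. It assumes "today" is 2024-09-04 (the first date in the failing request) and uses a hypothetical helper, `last_n_days`, to rebuild the kind of list sent in `date_published_in`. It is not the page's actual code.

```python
from datetime import date, timedelta


def last_n_days(today: date, n: int) -> list[str]:
    """ISO dates for today and the n - 1 days before it, newest first."""
    return [(today - timedelta(days=i)).isoformat() for i in range(n)]


newest_document = "2024-08-18"   # most recent candidate, per this thread
today = date(2024, 9, 4)         # assumed: first date in the failing request

print(newest_document in last_n_days(today, 14))  # False: window ends at 2024-08-22
print(newest_document in last_n_days(today, 30))  # True: window reaches back to 2024-08-06
```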
If that's the case it might have to do with this: https://github.com/responsible-ai-collaborative/nlp-monitoring/actions
Yes, the last run of the nlp-monitoring process threw errors in the fetch_news.py step:
https://github.com/responsible-ai-collaborative/nlp-monitoring/actions/runs/10714752060/job/29708973265
And that makes sense, because the last successful run of nlp-monitoring was on 2024-08-18:
https://github.com/responsible-ai-collaborative/nlp-monitoring/actions/runs/10438291066/job/28905425357
This is fixed. The upstream NLTK package changed how it requires imported data to be handled. https://github.com/responsible-ai-collaborative/nlp-monitoring/pull/6
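For readers hitting a similar NLTK data error, one common defensive pattern is to check for a data package before using it and download it only when it is missing. This is a sketch of that pattern, not the change made in the linked PR. The helper name `ensure_nltk_resource` is hypothetical, and the resource shown in the usage comment (`punkt_tab`) is an assumption, since this thread does not say which data package was involved.

```python
from typing import Callable, Optional


def ensure_nltk_resource(
    resource_path: str,
    package: str,
    find: Optional[Callable[[str], object]] = None,
    download: Optional[Callable[[str], object]] = None,
) -> bool:
    """Make sure an NLTK data package is installed; return True if a download was needed."""
    if find is None or download is None:
        import nltk  # imported lazily so the helper can be exercised with stand-ins

        find = find or nltk.data.find
        download = download or nltk.download
    try:
        find(resource_path)  # nltk.data.find raises LookupError when the data is missing
        return False
    except LookupError:
        download(package)
        return True


# Assumed usage; the resource name is a guess, not taken from the linked PR:
# ensure_nltk_resource("tokenizers/punkt_tab", "punkt_tab")
```

Passing `find` and `download` as parameters keeps the helper testable in CI without network access.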
@pdcp1 @cesarvarela let's push to production and resolve this?