Closed (rahulbot closed this issue 2 years ago)
we add that flag unless we got the date from an RSS feed (or some similar structured syndication format) or from one of a small number of date guessing methods that we consider to be very reliable (such as a date stub in the URL). for all other methods combined, our date guessing is around 80% accurate to within a day. the best defense for this is just for aashka to have a sense that if she sees a big weird date spike, she should consider date guessing as the cause.
in theory, we could add some sort of monitor to look for date spikes within topics (they all come from topics because we only use date guessing for spidered stories). but I'm not sure what we would do once we found that spike other than warn the user. we're already doing the best we can at the date guessing, so there's nothing more that we can do without manual intervention.
-hal
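The monitor idea in the reply above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Media Cloud code: the function name find_suspect_spikes, the (publish_day, date_is_reliable) input shape, and both thresholds are hypothetical. It flags days whose story count is well above the topic's median day and where most of those stories carry unreliable (guessed) dates. As the reply notes, all such a check can do is warn the user.

```python
from collections import Counter
from datetime import date
from statistics import median


def find_suspect_spikes(stories, spike_factor=3.0, unreliable_share=0.5):
    """Return (day, count, share) tuples for days that may be date-guessing artifacts.

    stories: iterable of (publish_day, date_is_reliable) pairs (assumed shape).
    spike_factor and unreliable_share are illustrative thresholds, not tuned values.
    """
    totals = Counter()      # all stories per day
    unreliable = Counter()  # stories per day whose date is flagged unreliable
    for day, reliable in stories:
        totals[day] += 1
        if not reliable:
            unreliable[day] += 1
    if not totals:
        return []

    # Baseline is the median count over days that have at least one story;
    # days with no stories are not represented in the data and are ignored.
    baseline = median(totals.values())

    suspects = []
    for day in sorted(totals):
        count = totals[day]
        share = unreliable[day] / count
        if count >= spike_factor * baseline and share >= unreliable_share:
            suspects.append((day, count, share))
    return suspects


# Made-up sample data: 4 reliable stories per day for a week,
# plus 20 stories with guessed dates that all landed on Sep 5.
stories = [(date(2019, 9, d), True) for d in range(1, 8) for _ in range(4)]
stories += [(date(2019, 9, 5), False) for _ in range(20)]

for day, count, share in find_suspect_spikes(stories):
    print(f"possible date-guessing spike on {day}: {count} stories, "
          f"{share:.0%} with unreliable dates")
```

A spike made up mostly of reliably dated stories would not be flagged here, which matches the point that only guessed dates are in doubt.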
On Tue, Sep 10, 2019 at 11:46 AM rahulbot notifications@github.com wrote:
Aashka is wondering if we can help give better clues for when dates can be trusted and when not. For instance, in her UN SDGs topic she saw a spike on a date, but it turned out to be just many stories incorrectly dated.
We use the date_is_reliable column to show story dates in the web interface in italics with a "?" after them. Should we include this Boolean variable in the download CSV? @hroberts how is this attribute filled in? What does it mean?
Thanks, that's helpful. Will circle back with research folks here.