scribe-org / Scribe-Server

Backend service for Scribe data downloads
GNU General Public License v3.0

Add process to check for major drops in data between updates #16

Open: andrewtavis opened this issue 3 months ago

andrewtavis commented 3 months ago

Description

Based on https://github.com/scribe-org/Scribe-Data/issues/68, we need to keep in mind that there will be cases where a property on Wikidata changes in a way that causes a large drop in data. In the referenced issue, Portuguese verbs use a non-standard past perfect PID that could at some point be merged with the more widely used one.

This issue would look into ways of diffing the current data coverage against the incoming data. This could be as simple as comparing the total number of keys and the total number of non-null values in their sub-objects. We could then discuss a viable cutoff and, if coverage falls below it, trigger some kind of warning or open a Scribe-Data issue 😊
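
A minimal sketch of that diff in Go, the language Scribe-Server is being built in. It assumes each data file is JSON shaped like `{"word": {"form": value, ...}}`; `Coverage`, `measure`, `checkDrop`, the 0.9 cutoff and the sample Portuguese data are all hypothetical and only for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Coverage holds the two simple metrics floated in this issue (hypothetical type).
type Coverage struct {
	Keys    int // number of top-level keys (e.g. words)
	NonNull int // number of non-null values across all sub-objects
}

// measure computes Coverage for JSON assumed to be shaped like
// {"word": {"form": value, ...}, ...}; that shape is an assumption.
func measure(raw []byte) (Coverage, error) {
	var data map[string]map[string]any
	if err := json.Unmarshal(raw, &data); err != nil {
		return Coverage{}, err
	}
	c := Coverage{Keys: len(data)}
	for _, sub := range data {
		for _, v := range sub {
			if v != nil { // JSON null decodes to a nil interface value
				c.NonNull++
			}
		}
	}
	return c, nil
}

// ratio returns cur/prev, treating an empty previous dataset as "no drop".
func ratio(prev, cur int) float64 {
	if prev == 0 {
		return 1
	}
	return float64(cur) / float64(prev)
}

// checkDrop reports whether either metric fell below cutoff
// (e.g. 0.9 flags a drop of more than 10%), plus a short description.
func checkDrop(prev, cur Coverage, cutoff float64) (bool, string) {
	kr, vr := ratio(prev.Keys, cur.Keys), ratio(prev.NonNull, cur.NonNull)
	msg := fmt.Sprintf("keys %d -> %d (%.0f%%), non-null values %d -> %d (%.0f%%)",
		prev.Keys, cur.Keys, kr*100, prev.NonNull, cur.NonNull, vr*100)
	return kr < cutoff || vr < cutoff, msg
}

func main() {
	// Illustrative data only: a past perfect form goes missing in the new export.
	prevJSON := []byte(`{"ser": {"present": "sou", "pastPerfect": "fui"},
		"ter": {"present": "tenho", "pastPerfect": "tive"}}`)
	curJSON := []byte(`{"ser": {"present": "sou", "pastPerfect": null},
		"ter": {"present": "tenho", "pastPerfect": null}}`)

	prev, err := measure(prevJSON)
	if err != nil {
		panic(err)
	}
	cur, err := measure(curJSON)
	if err != nil {
		panic(err)
	}
	if dropped, msg := checkDrop(prev, cur, 0.9); dropped {
		fmt.Println("WARNING: possible data drop:", msg)
	}
}
```

Run as-is, this prints `WARNING: possible data drop: keys 2 -> 2 (100%), non-null values 4 -> 2 (50%)`. Checking both metrics matters for cases like the Portuguese one: every word is still present, so the key count alone would not notice that one form has gone null.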

Contribution

Would be happy to discuss! Could also help implement, but it might be better if others take this on eventually, as I'm a long way off on Go :)

andrewtavis commented 3 months ago

Even just an email or a Matrix bot message summarizing the changes with some coverage metrics would be great 😊

daveads commented 3 months ago

Cool, I'll take a look at this... @andrewtavis

wkyoshida commented 2 months ago

Hey @daveads - thank you for the interest in this issue! Just a quick FYI though that this might be a little ways away, since this is likely dependent on having the CLI for Scribe-Data polished up (i.e. the ongoing GSoC project) and then likely at least v1 of Scribe-Server implemented as well. But once we get there, we can definitely have you take this on!

daveads commented 2 months ago

Oh, okay.