Statistics over found issues

osm-search / Nominatim-Data-Analyser

QA Tool for Nominatim. Helps to improve the OpenStreetMap data quality and therefore the Nominatim search results.

GNU General Public License v2.0

Statistics over found issues #25

Open lonvia opened 2 years ago

lonvia commented 2 years ago

The analyser should collect some statistics over the number of errors it finds in each run. In the long run this should be displayed in the front end but for now it would just be useful to have the data on the server. I'd like to be able to compare runs before and after changes to the Nominatim code.

My suggestion would be to simply log the data in a table in the database, which makes it easy to generate summaries as required. A simple table would do, with columns for the date, the name of the QA check and the number of errors. Maybe add an extra_data column in JSONB so that we are future-proof against any additional data we might want to save later.
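A minimal sketch of what such a table could look like, written as SQL strings plus a small Python helper. The table name `qa_statistics`, the column names other than `extra_data`, and the helper `build_stats_params` are hypothetical; the thread only specifies the four kinds of columns. The `%s` placeholders assume a DB-API driver in the psycopg style.

```python
import json
from datetime import datetime, timezone

# Hypothetical table and column names. The proposal only asks for a date,
# the name of the QA check, the number of errors and a JSONB extra_data column.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS qa_statistics (
    run_date    TIMESTAMP WITH TIME ZONE NOT NULL,
    check_name  TEXT NOT NULL,
    error_count INTEGER NOT NULL,
    extra_data  JSONB
)
"""

# Assumes %s-style placeholders; the last value is cast to JSONB explicitly.
INSERT_SQL = """
INSERT INTO qa_statistics (run_date, check_name, error_count, extra_data)
VALUES (%s, %s, %s, %s::jsonb)
"""


def build_stats_params(check_name, error_count, extra_data=None, run_date=None):
    """Return the parameter tuple for INSERT_SQL for one QA check of one run."""
    if run_date is None:
        run_date = datetime.now(timezone.utc)
    # extra_data is serialised to JSON text; None is stored as SQL NULL.
    extra_json = json.dumps(extra_data) if extra_data is not None else None
    return (run_date, check_name, error_count, extra_json)
```

With an open cursor, one row per check and run could then be written with something like `cursor.execute(INSERT_SQL, build_stats_params("some_check", 42))`.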

lonvia commented 2 years ago

Quick workaround for now: grep 'Query.*returned.*results' on the output log files.
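The same workaround could be scripted roughly as follows. The first pattern is the one from the grep above; the second one, which pulls out a number, assumes that the count appears as an integer between "returned" and "results". That is a guess about the log format, not something confirmed here, and the function name is made up for illustration.

```python
import re

# Same pattern as the grep workaround.
RESULT_LINE = re.compile(r"Query.*returned.*results")
# Assumption: the count is an integer between "returned" and "results".
RESULT_COUNT = re.compile(r"returned\D*(\d+)\D*results")


def collect_result_lines(log_lines):
    """Return (line, count) pairs for every line the grep would match.

    count is None when no integer could be extracted from the line.
    """
    found = []
    for line in log_lines:
        if RESULT_LINE.search(line):
            match = RESULT_COUNT.search(line)
            count = int(match.group(1)) if match else None
            found.append((line.rstrip("\n"), count))
    return found
```

It accepts any iterable of lines, so an open log file can be passed in directly.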

AntoJvlt commented 2 years ago

Great, this is exactly what I had planned for the next development step. I was going to open an issue here later to talk about it.

I think it will be pretty straightforward, since we both had the same approach in mind. In addition, we should create an API for this tool so that we can easily query the statistics. For example, we could have an endpoint that returns the statistics of each week during the last 2 months for a specific rule. This should be pretty easy if we store a new statistics entry for each rule on each run of the tool.
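To illustrate the kind of query such an endpoint would answer, here is a rough sketch in plain Python. It assumes one stored entry per rule and run, shaped as `(run_date, rule_name, error_count)`, treats "2 months" as 8 weeks, and reports the most recent count of each ISO week. The function name and the entry shape are assumptions, not part of the tool.

```python
from datetime import date, timedelta


def weekly_stats(entries, rule, today, weeks=8):
    """Return [(iso_year, iso_week, error_count), ...] for one rule.

    entries is an iterable of (run_date, rule_name, error_count) tuples.
    Only runs from the last `weeks` weeks are kept, and for each ISO week
    the count of the most recent run in that week is reported.
    """
    cutoff = today - timedelta(weeks=weeks)
    latest = {}  # (iso_year, iso_week) -> (run_date, error_count)
    for run_date, rule_name, error_count in entries:
        if rule_name != rule or not (cutoff < run_date <= today):
            continue
        iso = run_date.isocalendar()
        key = (iso[0], iso[1])
        if key not in latest or run_date > latest[key][0]:
            latest[key] = (run_date, error_count)
    return [(year, week, latest[(year, week)][1]) for (year, week) in sorted(latest)]
```

An API handler would only need to load the entries for the requested rule and serialise the returned list.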

The API will also be needed for the future false positive reports that might be made from the front end.

lonvia commented 2 years ago

Tiny obstacle: the data analyser currently runs on a different server (stormfly) than the QA display (nominatim.org). I wouldn't want to run an active API on stormfly (currently it simply serves static tiles from the filesystem, nothing more). Any statistics API would have to be on nominatim.org.

So maybe instead of saving the data in the PostgreSQL database, use a separate SQLite DB which we can then publish as a static file on stormfly again. Then nominatim.org can download the file and use it for any API implementation.
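A minimal sketch of the SQLite variant, using Python's standard `sqlite3` module. The table layout, the file path and both function names are hypothetical. SQLite has no JSONB column type in this sketch, so `extra_data` is a plain TEXT column that could hold JSON.

```python
import sqlite3

# Hypothetical schema, mirroring the columns discussed earlier in the thread.
SCHEMA = """
CREATE TABLE IF NOT EXISTS qa_statistics (
    run_date    TEXT NOT NULL,
    check_name  TEXT NOT NULL,
    error_count INTEGER NOT NULL,
    extra_data  TEXT
)
"""


def record_run(db_path, run_date, results):
    """Append one row per QA check to the SQLite file at db_path.

    results maps the check name to the number of errors found in this run.
    """
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(SCHEMA)
            conn.executemany(
                "INSERT INTO qa_statistics VALUES (?, ?, ?, ?)",
                [(run_date, name, count, None) for name, count in results.items()],
            )
    finally:
        conn.close()


def read_stats(db_path, check_name):
    """Return [(run_date, error_count), ...] for one check, oldest first."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT run_date, error_count FROM qa_statistics "
            "WHERE check_name = ? ORDER BY run_date",
            (check_name,),
        ).fetchall()
    finally:
        conn.close()
```

The analyser side would call `record_run` at the end of each run and leave the file where the static file server can pick it up; the downloading side would only ever need `read_stats` or similar read-only queries.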

AntoJvlt commented 2 years ago

Is it really not possible to expose the data through an API on the server that generates it? That would be much easier and less complex.

Otherwise, are you sure an SQLite DB is a good solution for this? It seems well suited to a lightweight, app-local database, but I am not sure how we could handle additional features such as false positive reports with it.