tenders-exposed / elvis-ember

**el.vis** - a tool for visualising public (EU) tenders big data
https://talk.tenders.exposed
GNU Affero General Public License v3.0

handle obvious(?) typos in source data #464

Open hanecak opened 5 years ago

hanecak commented 5 years ago

I'm looking at a "record contract" of a "top company" in Slovakia:

https://tenders.exposed/network/1b535cfa-7220-4545-a581-d24217d7e0c6/bidders/343fc122-fa68-4ff0-9b62-1b48e14aa0cc/813f913a-4bb4-49a4-bd51-e64b1f9fa613

The amount is over one BILLION, which is obviously not the case, since for example the "second best" Slovak supplier has a total of 367 MILLION.

Sadly, the error is in the source data; see the screenshot from ted.europa.eu:

screenshot from 2018-10-17 18-05-37

I know, this one is hard. It would be nice to detect such obvious outliers, exclude them from the totals, and raise them with "upstream" (TED? the contracting authority?).
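To make the "detect such obvious outliers" idea concrete, here is a minimal sketch, not what el.vis actually does. It assumes a plain list of contract amounts in a single currency and flags values whose log10 is far from the median, using the modified z-score (median and MAD). The function name `flag_outliers` and the 3.5 cut-off are illustrative assumptions; the cut-off is a common rule of thumb and has not been tuned on tender data.

```python
import math
from statistics import median


def flag_outliers(amounts, threshold=3.5):
    """Return indices of amounts that look like outliers.

    Works on log10(amount) because contract values span several orders
    of magnitude. Non-positive amounts are skipped (assumption: they are
    missing or invalid and handled elsewhere).
    """
    logs = [(i, math.log10(a)) for i, a in enumerate(amounts) if a > 0]
    if not logs:
        return []
    med = median(x for _, x in logs)
    # Median absolute deviation: a spread estimate that one huge value
    # cannot distort the way it distorts a standard deviation.
    mad = median(abs(x - med) for _, x in logs)
    if mad == 0:
        return []  # all values (nearly) identical, nothing to compare against
    return [i for i, x in logs if 0.6745 * abs(x - med) / mad > threshold]


# Hypothetical amounts: five ordinary contracts and one "billion" typo.
sample = [120_000, 250_000, 400_000, 90_000, 310_000, 1_200_000_000]
print(flag_outliers(sample))  # -> [5]
```

Flagged records would still need a human look before being excluded from totals, since a genuinely large contract can also score high.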

screenshot from 2018-10-17 18-05-34

Side note: Once a correction is made "upstream", #426 will come in handy too.

hanecak commented 5 years ago

Side note: You can check up on Slovak companies (at least by hand, basic checks) on FinStat.sk. For example, the company in question has a yearly turnover of roughly 700-800k€, see https://finstat.sk/35681811. That data is from the Tax Office, so I guess they were not able to hide that 1 billion. :)

nightsh commented 5 years ago

Would it be useful to generate some relevant links to make checking by hand a bit easier? Also, to (hopefully) nudge people a little towards fact-checking :)

zufanka commented 5 years ago

Yeah, this is a tough one. We are using data from opentender.eu. Tender data is notoriously dirty. But maybe I can manually check the really BIG amounts in the data, like 1B and more, and tell the opentender people to fix them in their data too.
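A rough sketch of how such a manual-review shortlist could be pulled out, assuming each record is a dict with a numeric `amount` field in one currency. The field name, the `id` values and the function `shortlist_for_review` are hypothetical and not taken from the opentender.eu schema.

```python
REVIEW_THRESHOLD = 1_000_000_000  # "1B and more"


def shortlist_for_review(records, threshold=REVIEW_THRESHOLD):
    """Return records whose amount is at or above the threshold, largest first.

    Records without an amount are ignored rather than treated as zero.
    """
    big = [r for r in records if r.get("amount") is not None and r["amount"] >= threshold]
    return sorted(big, key=lambda r: r["amount"], reverse=True)


# Hypothetical records, for illustration only.
records = [
    {"id": "a", "amount": 367_000_000},
    {"id": "b", "amount": 1_200_000_000},
    {"id": "c", "amount": None},
    {"id": "d", "amount": 5_000_000_000},
]
for r in shortlist_for_review(records):
    print(r["id"], r["amount"])
```

The resulting list is what would be checked by hand against TED and then reported to the opentender people.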

hanecak commented 5 years ago

@nightsh The links so far are IMHO satisfactory ("view on ted.europa.eu" and "view on opentender.eu").

The only addition might be a direct link to look up the company on FinStat.sk (via ICO), to quickly compare the volume of contracts with the company's actual turnover. But that would be relevant only for companies registered in Slovakia.

So yes, for now I understand that "flagging" and "escalation" (as suggested by @zufanka) are the main sensible approach.
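A minimal sketch of generating such a link. It assumes the URL pattern `https://finstat.sk/<ICO>`, inferred only from the FinStat link posted earlier in this thread, and that a Slovak ICO is an 8-digit number. The function name `finstat_url` is hypothetical.

```python
def finstat_url(ico):
    """Build a FinStat.sk look-up link from a Slovak company ID (ICO).

    Returns None when the value does not look like an 8-digit ICO, so the
    UI can simply omit the link (for example for non-Slovak bidders).
    """
    digits = str(ico).strip()
    if not (digits.isascii() and digits.isdigit() and len(digits) == 8):
        return None
    # Assumed URL pattern, based on the link shared in this thread.
    return f"https://finstat.sk/{digits}"


print(finstat_url("35681811"))  # -> https://finstat.sk/35681811
```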