thewca / worldcubeassociation.org

All of the code that runs on worldcubeassociation.org
https://www.worldcubeassociation.org/
GNU General Public License v3.0
320 stars 175 forks source link

Data cleanliness #400

Open jfly opened 8 years ago

jfly commented 8 years ago

Right now, many database records don't actually pass our validations. I added a rake db:data:validate script in https://github.com/cubing/worldcubeassociation.org/commit/168d45d50b7cbba6d33d84c2cd0e90fdbb4bf2d0 to check all our records to see if they're valid (as of February 22nd, 2016, there are 1612 invalid records in our database). Here's a set of unique validation errors in our database (as of February 22nd, 2016):

An exception occurred: invalid date
An exception occurred: The single-table inheritance mechanism failed to locate the subclass: 'report'. This error is raised because the column 'type' is reserved for storing the class in case of inheritance. Please rename this column if you didn't intend it to be used for storing the inheritance class or overwrite CompetitionsMedia.inheritance_column to use another column for that information.
#<Competition errors: ["Competition nickname is invalid"]>
#<Competition errors: ["Competition nickname is invalid", "WCA delegate(s)  are not all delegates"]>
#<Competition errors: ["Eventspecs invalid event ids: 333bts"]>
#<Competition errors: ["Eventspecs invalid event ids: 333bts,360"]>
#<Competition errors: ["Eventspecs invalid event ids: 333ts,333bts"]>
#<Competition errors: ["Eventspecs invalid event ids: 360"]>
#<Competition errors: ["Eventspecs invalid event ids: mirbl"]>
#<Competition errors: ["Eventspecs invalid event ids: un3sbf"]>
#<Competition errors: ["Name is invalid"]>
#<Competition errors: ["Name is invalid", "Competition nickname is invalid"]>
#<Competition errors: ["Name is invalid", "Competition nickname is invalid", "WCA delegate(s)  are not all delegates"]>
#<Competition errors: ["Name is invalid", "WCA delegate(s)  are not all delegates"]>
#<Competition errors: ["Venue address can't be blank"]>
#<Competition errors: ["Venue address can't be blank", "Eventspecs invalid event ids: snake,222oh"]>
#<Competition errors: ["Venue address can't be blank", "WCA delegate(s)  are not all delegates"]>
#<Competition errors: ["Venue address can't be blank", "WCA delegate(s)  are not all delegates", "Eventspecs invalid event ids: 333r3,333bts"]>
#<Competition errors: ["Venue address can't be blank", "WCA delegate(s)  are not all delegates", "Eventspecs invalid event ids: snake"]>
#<Competition errors: ["Venue address can't be blank", "Website can't be blank"]>
#<Competition errors: ["Venue address can't be blank", "Website can't be blank", "WCA delegate(s)  are not all delegates"]>
#<Competition errors: ["Venue can't be blank"]>
#<Competition errors: ["Venue can't be blank", "Venue address can't be blank"]>
#<Competition errors: ["Venue can't be blank", "Venue address can't be blank", "WCA delegate(s)  are not all delegates"]>
#<Competition errors: ["Venue can't be blank", "Venue address can't be blank", "Website can't be blank"]>
#<Competition errors: ["Venue can't be blank", "Venue address can't be blank", "Website can't be blank", "WCA delegate(s)  are not all delegates"]>
#<Competition errors: ["Venue can't be blank", "WCA delegate(s)  are not all delegates"]>
#<Competition errors: ["Venue can't be blank", "Website can't be blank", "WCA delegate(s)  are not all delegates"]>
#<Competition errors: ["WCA delegate(s)  are not all delegates"]>
#<Competition errors: ["WCA delegate(s)  are not all delegates", "Eventspecs invalid event ids: 333bts"]>
#<Competition errors: ["WCA delegate(s)  are not all delegates", "Eventspecs invalid event ids: 333bts,333si"]>
#<Competition errors: ["WCA delegate(s)  are not all delegates", "Eventspecs invalid event ids: 333bts,360"]>
#<Competition errors: ["WCA delegate(s)  are not all delegates", "Eventspecs invalid event ids: 333ni"]>
#<Competition errors: ["WCA delegate(s)  are not all delegates", "Eventspecs invalid event ids: 333ni,333bts"]>
#<Competition errors: ["WCA delegate(s)  are not all delegates", "Eventspecs invalid event ids: 333ni,snake"]>
#<Competition errors: ["WCA delegate(s)  are not all delegates", "Eventspecs invalid event ids: mirbl"]>
#<Competition errors: ["Website can't be blank"]>
#<Competition errors: ["Website can't be blank", "WCA delegate(s)  are not all delegates"]>
#<Competition errors: ["Website is invalid"]>
#<Competition errors: ["Website is invalid", "Venue address can't be blank", "WCA delegate(s)  are not all delegates"]>
#<CompetitionOrganizer errors: ["Organizer can't be blank"]>
#<Poll errors: ["Deadline you can only change the deadline"]>
#<Post errors: ["Title has already been taken"]>
#<Registration errors: ["Competition Competition registration is closed"]>
#<Registration errors: ["Competition Competition registration is closed", "Events invalid event ids: 333bf"]>
#<Registration errors: ["Competition Competition registration is closed", "Events invalid event ids: 333bf,sq1,clock,333mbf,magic"]>
#<Registration errors: ["Competition Competition registration is closed", "Events invalid event ids: 333fm"]>
#<Registration errors: ["Competition Competition registration is closed", "Events invalid event ids: pyram"]>
#<Registration errors: ["Competition Competition registration is closed", "Events invalid event ids: sq1"]>
#<Registration errors: ["Competition Competition registration is closed", "Events invalid event ids: sq1,clock,magic"]>
#<Registration errors: ["Competition Competition registration is closed", "Events invalid event ids: sq1,magic"]>
#<Registration errors: ["Events invalid event ids: 222"]>
#<Registration errors: ["Events invalid event ids: 222,333ft"]>
#<Registration errors: ["Events invalid event ids: 333bf,333oh"]>
#<Registration errors: ["Events invalid event ids: 333fm"]>
#<Registration errors: ["Events invalid event ids: 333fm,333ft"]>
#<Registration errors: ["Events invalid event ids: 333fm,333mbf"]>
#<Registration errors: ["Events invalid event ids: 333fm,444bf,333mbf"]>
#<Registration errors: ["Events invalid event ids: 333fm,666,333mbf"]>
#<Registration errors: ["Events invalid event ids: 333fm,666,777"]>
#<Registration errors: ["Events invalid event ids: 333fm,666,777,333mbf"]>
#<Registration errors: ["Events invalid event ids: 333fm,666,777,444bf,333mbf"]>
#<Registration errors: ["Events invalid event ids: 333fm,666,777,444bf,555bf,333mbf"]>
#<Registration errors: ["Events invalid event ids: 333ft"]>
#<Registration errors: ["Events invalid event ids: 333mbf"]>
#<Registration errors: ["Events invalid event ids: 333mbf,mmagic"]>
#<Registration errors: ["Events invalid event ids: 333oh"]>
#<Registration errors: ["Events invalid event ids: 444bf"]>
#<Registration errors: ["Events invalid event ids: 444bf,333mbf"]>
#<Registration errors: ["Events invalid event ids: 555bf"]>
#<Registration errors: ["Events invalid event ids: 666,777"]>
#<Registration errors: ["Events invalid event ids: 666,777,444bf,333mbf"]>
#<Registration errors: ["Events invalid event ids: 666,777,444bf,555bf,333mbf"]>
#<Registration errors: ["Events invalid event ids: magic"]>
#<Registration errors: ["Events invalid event ids: mmagic"]>
#<Registration errors: ["Events must register for at least one event"]>
#<User errors: ["Birthdate must be in the past"]>
#<User errors: ["Full name can't be blank"]>
#<User errors: ["Unconfirmed WCA ID already assigned to a different user"]>
#<Vote errors: ["Poll poll is closed"]>

Also see #165.

jonatanklosko commented 8 years ago

Just experienced #<Registration errors: ["Events must register for at least one event"]> while using <registration>.update!(<so on>) in a migration. Moreover there is a registration whose competition is nil...

jfly commented 5 years ago

Discussed in email thread "Peculiar data in "events" tab for Toulon Open 2017": when we were launching the edit events page, we played around with it a bit for some competitions (there was no public view events page yet), and left some competitions in a state that does not reflect their final results. @SAuroux put together this query to identify mismatches:

SELECT competitionId, r_events, ce_events 
FROM (SELECT competitionId, count(distinct eventId) as r_events FROM Results GROUP BY competitionId) as r
INNER JOIN (SELECT competition_id, count(distinct event_id) as ce_events FROM rounds as ro INNER JOIN competition_events as ce on ce.id = ro.competition_event_id GROUP BY competition_id) as ce
ON r.competitionId = ce.competition_id
HAVING r_events <> ce_events

I think this query is correct, but "it only compares the number of events, and not the number of rounds of each event". @viroulep, could we leverage any of the results validation logic you've been working on to do a more complete check of the database?

viroulep commented 5 years ago

Yes at some point running the individual validators will make it easier to do a complete check of the database (to the best of our knowledge at least).