jefmsmit / gdshowsdb

All Grateful Dead show information in a relational database.
MIT License
67 stars 18 forks source link

Several years have incorrect "segued" data #111

Open tbkblues opened 3 years ago

tbkblues commented 3 years ago

Several years have incorrect "segued" data. For example, 1980 has no ":segued: true" data. There are many Scarlet/Fires that year (see, for example, 11/30/1980): :songs:

...and many other segued songs which are marked as "false" which should be "true".

jefmsmit commented 3 years ago

@tbkblues thanks for the issue. Seems like you large undertaking to correct all incorrect segued data. Care to make a pull request for the ones you see as wrong? Perhaps broken up by year and well known song suite?

tbkblues commented 3 years ago

I can write a program to modify the well-known ones (read: Scarlet/Fire, China/Rider), then audit the changes since there were occasionally some Scarlet/Touch/Fires and early Chinas without Rider as well as some early Riders without China, and submit a PR. But, as you say, it's a large undertaking so I might do it over a period of time. I assume I'll create a branch so it can be merged properly?

jefmsmit commented 3 years ago

I think it may be easier than that. If you load this data into a database (its intended use), I'm sure you could cook up a SQL query and fine all Scarlet Begonias without a segue of true

jefmsmit commented 3 years ago

Basically this: select songs.* from songs, song_refs where songs.song_ref_uuid = song_refs.uuid and song_refs.name = 'Scarlet Begonias' and segued = false;

jefmsmit commented 3 years ago

Better yet:

select shows.uuid, shows.year, shows.month, shows.day 
from shows, show_sets, songs, song_refs 
where
  shows.uuid = show_sets.show_uuid and
  songs.show_set_uuid = show_sets.uuid and
  songs.song_ref_uuid = song_refs.uuid and 
  song_refs.name = 'Scarlet Begonias' 
  and segued = false;
tbkblues commented 3 years ago

Yes--I can fix just my version since I already have a re-runable program to read the YAML files and load them to a MySQL database, but to fix them for everyone, I'd need to edit the YAML files and create a PR here. I don't have a way (yet) to read the database and create YAML files, though I could create such a thing.

jefmsmit commented 3 years ago

The query above will basically just give you a list of shows to go edit. That may be easier that trying to update the database and then round trip it back into yaml.

jefmsmit commented 3 years ago

@tbkblues I am going through every show that has a Scarlet Begonias with segue of false and checking the segues in general. This may take a few days. Then I can repeat the process with others.

tbkblues commented 3 years ago

Wow—thank you! I had planned do some of this as well, so if you want to leave a few years undone I can do those years.

jefmsmit commented 3 years ago

Take a look at: #124

tbkblues commented 3 years ago

The segues look great--though I haven't verified them all. On what may be a different note, my import is blowing up on the second show entry for 08/21/1968, which has no data in the YAML file. According to Deadbase II (which has all the show details through 1987), there was only one show on 08/21/1968, which you have in your data. I thought the data had a number after the date if there was more than one show on a given date. For example 1/13/67 had two shows: "1967/01/13/0:" for the first show, "1967/01/13/1:" for the second show. My import is reading those extra numbers and inserting the data to the database accordingly. Also, what are you using for your source info? Deadbase II could be wrong. Thanks again for all the work here!

tbkblues commented 3 years ago

A few other issues I found in the YAML data:

jefmsmit commented 3 years ago

The uuid values for songs are for specific occurrences of those songs. The table then has a reference and foreign key back to the song_refs table. It is truly relational data. In the YAML I use the song_ref name for some amount of readability.

jefmsmit commented 3 years ago

@tbkblues, over the weekend I merged into master all of the correct segue data for any show with Scarlet Begonias in it that didn't have a segue. I changed a lot of stuff. There is still a lot to do. Drums for example is exposing about 1000 shows where the segue data is suspect.

tbkblues commented 3 years ago

The uuid values for songs are for specific occurrences of those songs. The table then has a reference and foreign key back to the song_refs table. It is truly relational data. In the YAML I use the song_ref name for some amount of readability.

I understand that the UUIDs associated with each song in the transactional data is a unique instance of that song, but I'm not sure where the reference table is. I see the "songs.refs.yaml" which seems to be a distinct list of all the songs, but that UUID in that file is no where else to be found in the data. Am I missing it somewhere?

jacobcharleskillian commented 3 years ago

STAR Schema might be appropriate for this application.

On Thu, Mar 4, 2021 at 4:11 PM tbkblues notifications@github.com wrote:

The uuid values for songs are for specific occurrences of those songs. The table then has a reference and foreign key back to the song_refs table. It is truly relational data. In the YAML I use the song_ref name for some amount of readability.

I understand that the UUIDs associated with each song in the transactional data is a unique instance of that song, but I'm not sure where the reference table is. I see the "songs.refs.yaml" which seems to be a distinct list of all the songs, but that UUID in that file is no where else to be found in the data. Am I missing it somewhere?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/jefmsmit/gdshowsdb/issues/111#issuecomment-790980921, or unsubscribe https://github.com/notifications/unsubscribe-auth/AJFIBKG2NVTPVFC2S5FKGV3TCAAQ5ANCNFSM4WH4BQVQ .

jefmsmit commented 3 years ago

The uuid values for songs are for specific occurrences of those songs. The table then has a reference and foreign key back to the song_refs table. It is truly relational data. In the YAML I use the song_ref name for some amount of readability.

I understand that the UUIDs associated with each song in the transactional data is a unique instance of that song, but I'm not sure where the reference table is. I see the "songs.refs.yaml" which seems to be a distinct list of all the songs, but that UUID in that file is no where else to be found in the data. Am I missing it somewhere?

The songs table has a foreign key back to the song_refs table as defined here: https://github.com/jefmsmit/gdshowsdb/blob/master/lib/gdshowsdb/db/migrations/005_create_song_table.rb#L7

tbkblues commented 3 years ago

The uuid values for songs are for specific occurrences of those songs. The table then has a reference and foreign key back to the song_refs table. It is truly relational data. In the YAML I use the song_ref name for some amount of readability.

I understand that the UUIDs associated with each song in the transactional data is a unique instance of that song, but I'm not sure where the reference table is. I see the "songs.refs.yaml" which seems to be a distinct list of all the songs, but that UUID in that file is no where else to be found in the data. Am I missing it somewhere?

The songs table has a foreign key back to the song_refs table as defined here: https://github.com/jefmsmit/gdshowsdb/blob/master/lib/gdshowsdb/db/migrations/005_create_song_table.rb#L7

Is there a songs.yaml that I'm not seeing? I just see the data by year, song_refs.yaml and shows-debug.yaml in https://github.com/jefmsmit/gdshowsdb/tree/master/data/gdshowsdb. The UUID in each :songs: section in each year's data file does not match the particular song's UUID in the song_refs.yaml. If the UUID in each year's :songs: section is an instance of the song, the only way to lookup a song is by the song title. I made my data relational based on the UUID in the song_refs.yaml file.

jefmsmit commented 3 years ago

The YAML does not correlate directly with the database structure. The YAML is set up to be as human readable as possible. The database is set up to be as relational and normalized as possible. If you want to run a SQL query that returns the song name you need to join the songs song_ref_uuid column to the song_refs uuid column.

tbkblues commented 3 years ago

I see. I'm using the *.yaml files as my source data to load into a MySQL database. Do you have the data stored somewhere else?

jefmsmit commented 3 years ago

How are you loading the yaml files? The Gem has the ability to populate a relational database for you. Out of the box the code should allow you to use SQLite. Take a look at this sample program that inits and loads the database. Then it uses the models provided in the Gem to work with the data.

https://github.com/jefmsmit/gdshowsdb/blob/master/sample.rb

tbkblues commented 3 years ago

Sorry, I'm not good with Ruby. I'm doing it in C# using a YAML parser and doing my own database work. Is the raw data contained in the one of the .rb files?

jefmsmit commented 3 years ago

The YAML files are the raw data. The Ruby code creates the database schema and loads the data into that schema. It may be worth your while to do just enough ruby to get the database loaded so you don't have to rewrite that. But if that doesn't work for you then the YAML is the raw data.

tbkblues commented 3 years ago

Sounds good. Thanks!