Open tbkblues opened 3 years ago
@tbkblues thanks for the issue. Seems like you large undertaking to correct all incorrect segued data. Care to make a pull request for the ones you see as wrong? Perhaps broken up by year and well known song suite?
I can write a program to modify the well-known ones (read: Scarlet/Fire, China/Rider), then audit the changes since there were occasionally some Scarlet/Touch/Fires and early Chinas without Rider as well as some early Riders without China, and submit a PR. But, as you say, it's a large undertaking so I might do it over a period of time. I assume I'll create a branch so it can be merged properly?
I think it may be easier than that. If you load this data into a database (its intended use), I'm sure you could cook up a SQL query and fine all Scarlet Begonias without a segue of true
Basically this:
select songs.* from songs, song_refs where songs.song_ref_uuid = song_refs.uuid and song_refs.name = 'Scarlet Begonias' and segued = false;
Better yet:
select shows.uuid, shows.year, shows.month, shows.day
from shows, show_sets, songs, song_refs
where
shows.uuid = show_sets.show_uuid and
songs.show_set_uuid = show_sets.uuid and
songs.song_ref_uuid = song_refs.uuid and
song_refs.name = 'Scarlet Begonias'
and segued = false;
Yes--I can fix just my version since I already have a re-runable program to read the YAML files and load them to a MySQL database, but to fix them for everyone, I'd need to edit the YAML files and create a PR here. I don't have a way (yet) to read the database and create YAML files, though I could create such a thing.
The query above will basically just give you a list of shows to go edit. That may be easier that trying to update the database and then round trip it back into yaml.
@tbkblues I am going through every show that has a Scarlet Begonias
with segue of false
and checking the segues in general. This may take a few days. Then I can repeat the process with others.
Wow—thank you! I had planned do some of this as well, so if you want to leave a few years undone I can do those years.
Take a look at: #124
The segues look great--though I haven't verified them all. On what may be a different note, my import is blowing up on the second show entry for 08/21/1968, which has no data in the YAML file. According to Deadbase II (which has all the show details through 1987), there was only one show on 08/21/1968, which you have in your data. I thought the data had a number after the date if there was more than one show on a given date. For example 1/13/67 had two shows: "1967/01/13/0:" for the first show, "1967/01/13/1:" for the second show. My import is reading those extra numbers and inserting the data to the database accordingly. Also, what are you using for your source info? Deadbase II could be wrong. Thanks again for all the work here!
A few other issues I found in the YAML data:
The uuid values for songs are for specific occurrences of those songs. The table then has a reference and foreign key back to the song_refs table. It is truly relational data. In the YAML I use the song_ref name for some amount of readability.
@tbkblues, over the weekend I merged into master all of the correct segue data for any show with Scarlet Begonias in it that didn't have a segue. I changed a lot of stuff. There is still a lot to do. Drums for example is exposing about 1000 shows where the segue data is suspect.
The uuid values for songs are for specific occurrences of those songs. The table then has a reference and foreign key back to the song_refs table. It is truly relational data. In the YAML I use the song_ref name for some amount of readability.
I understand that the UUIDs associated with each song in the transactional data is a unique instance of that song, but I'm not sure where the reference table is. I see the "songs.refs.yaml" which seems to be a distinct list of all the songs, but that UUID in that file is no where else to be found in the data. Am I missing it somewhere?
STAR Schema might be appropriate for this application.
On Thu, Mar 4, 2021 at 4:11 PM tbkblues notifications@github.com wrote:
The uuid values for songs are for specific occurrences of those songs. The table then has a reference and foreign key back to the song_refs table. It is truly relational data. In the YAML I use the song_ref name for some amount of readability.
I understand that the UUIDs associated with each song in the transactional data is a unique instance of that song, but I'm not sure where the reference table is. I see the "songs.refs.yaml" which seems to be a distinct list of all the songs, but that UUID in that file is no where else to be found in the data. Am I missing it somewhere?
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/jefmsmit/gdshowsdb/issues/111#issuecomment-790980921, or unsubscribe https://github.com/notifications/unsubscribe-auth/AJFIBKG2NVTPVFC2S5FKGV3TCAAQ5ANCNFSM4WH4BQVQ .
The uuid values for songs are for specific occurrences of those songs. The table then has a reference and foreign key back to the song_refs table. It is truly relational data. In the YAML I use the song_ref name for some amount of readability.
I understand that the UUIDs associated with each song in the transactional data is a unique instance of that song, but I'm not sure where the reference table is. I see the "songs.refs.yaml" which seems to be a distinct list of all the songs, but that UUID in that file is no where else to be found in the data. Am I missing it somewhere?
The songs table has a foreign key back to the song_refs table as defined here: https://github.com/jefmsmit/gdshowsdb/blob/master/lib/gdshowsdb/db/migrations/005_create_song_table.rb#L7
The uuid values for songs are for specific occurrences of those songs. The table then has a reference and foreign key back to the song_refs table. It is truly relational data. In the YAML I use the song_ref name for some amount of readability.
I understand that the UUIDs associated with each song in the transactional data is a unique instance of that song, but I'm not sure where the reference table is. I see the "songs.refs.yaml" which seems to be a distinct list of all the songs, but that UUID in that file is no where else to be found in the data. Am I missing it somewhere?
The songs table has a foreign key back to the song_refs table as defined here: https://github.com/jefmsmit/gdshowsdb/blob/master/lib/gdshowsdb/db/migrations/005_create_song_table.rb#L7
Is there a songs.yaml that I'm not seeing? I just see the data by year, song_refs.yaml and shows-debug.yaml in https://github.com/jefmsmit/gdshowsdb/tree/master/data/gdshowsdb. The UUID in each :songs: section in each year's data file does not match the particular song's UUID in the song_refs.yaml. If the UUID in each year's :songs: section is an instance of the song, the only way to lookup a song is by the song title. I made my data relational based on the UUID in the song_refs.yaml file.
The YAML does not correlate directly with the database structure. The YAML is set up to be as human readable as possible. The database is set up to be as relational and normalized as possible. If you want to run a SQL query that returns the song name you need to join the songs
song_ref_uuid
column to the song_refs
uuid
column.
I see. I'm using the *.yaml files as my source data to load into a MySQL database. Do you have the data stored somewhere else?
How are you loading the yaml files? The Gem has the ability to populate a relational database for you. Out of the box the code should allow you to use SQLite. Take a look at this sample program that inits and loads the database. Then it uses the models provided in the Gem to work with the data.
Sorry, I'm not good with Ruby. I'm doing it in C# using a YAML parser and doing my own database work. Is the raw data contained in the one of the .rb files?
The YAML files are the raw data. The Ruby code creates the database schema and loads the data into that schema. It may be worth your while to do just enough ruby to get the database loaded so you don't have to rewrite that. But if that doesn't work for you then the YAML is the raw data.
Sounds good. Thanks!
Several years have incorrect "segued" data. For example, 1980 has no ":segued: true" data. There are many Scarlet/Fires that year (see, for example, 11/30/1980): :songs:
...and many other segued songs which are marked as "false" which should be "true".