jonjohnsontc / songwriter-graph

A recommender system designed to compare how songwriters write in relation to one another

Cleaning + Matching Composition Records to Song Records #5

Open jonjohnsontc opened 5 years ago

jonjohnsontc commented 5 years ago

I have roughly 260k composition records from ASCAP that will need to be matched against the current recordings indexed within the recommender.

In order to do so, I believe the composition records will first need to be cleaned and stored across a couple of PostgreSQL tables. Any subsequent song credits retrieved will need to be cleaned and stored as well.

Afterwards, the fun of actually matching these records begins. My best guess at how to do this is to match on the song_title and artist fields from both tables. I expect a number of records will need to be matched manually, which hopefully won't amount to more than 5% of the current track set (~1100).
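
As a rough illustration of the matching step described above, here is a minimal sketch that joins the two record sets on a normalized (song_title, artist) key and sets aside whatever fails to match for manual review. The record layout (dicts with `id`, `song_title`, and `artist`) and the function names are assumptions made for this example, not the project's actual schema or code.

```python
from collections import defaultdict


def make_key(title, artist):
    """Build a case- and whitespace-insensitive join key (hypothetical helper)."""
    return (" ".join(title.lower().split()), " ".join(artist.lower().split()))


def match_records(compositions, tracks):
    """Exact-match compositions to tracks on (song_title, artist).

    Both arguments are lists of dicts with 'id', 'song_title' and 'artist'
    keys (an assumed layout). Returns (matched, unmatched), where matched is
    a list of (composition_id, track_id) pairs and unmatched is a list of
    composition ids left over for manual matching.
    """
    # Index tracks by key so each composition is a single dictionary lookup.
    index = defaultdict(list)
    for track in tracks:
        index[make_key(track["song_title"], track["artist"])].append(track["id"])

    matched, unmatched = [], []
    for comp in compositions:
        track_ids = index.get(make_key(comp["song_title"], comp["artist"]))
        if track_ids:
            matched.extend((comp["id"], track_id) for track_id in track_ids)
        else:
            unmatched.append(comp["id"])
    return matched, unmatched
```

Dividing `len(unmatched)` by the number of compositions gives a quick check on whether the manual-matching share stays under the hoped-for 5%.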

jonjohnsontc commented 5 years ago

I've since been able to match roughly 150k compositions with recordings, which lets me make recommendations for roughly 10k songwriters. We're currently using very simple regex cleaning techniques (removing special characters, etc.) to accomplish much of the matching between tracks and compositions. More suggestions on how best to match tracks and compositions are welcome.
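
One possible direction, sketched under assumptions: keep the regex cleaning, then fall back to a fuzzy string comparison for titles that still differ slightly after cleaning. The patterns below (dropping bracketed suffixes, "feat." credits, accents, and punctuation) are illustrative guesses rather than the rules currently used in the project, and the 0.9 threshold is an arbitrary starting point that would need tuning. The fuzzy step uses `difflib.SequenceMatcher` from the Python standard library.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Illustrative patterns only; not the cleaning rules used in this repo.
_BRACKETED = re.compile(r"\s*[\(\[].*?[\)\]]")  # "(Remastered)", "[Live]"
_FEATURING = re.compile(r"\s+(?:feat\.?|ft\.?|featuring)\s+.*$", re.IGNORECASE)
_APOSTROPHE = re.compile(r"['\u2019]")          # straight and curly apostrophes
_NON_ALNUM = re.compile(r"[^a-z0-9 ]+")


def clean(text):
    """Normalize a title or artist string for comparison."""
    # Strip accents so that "Beyoncé" and "Beyonce" compare equal.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = _BRACKETED.sub("", text)
    text = _FEATURING.sub("", text)
    text = _APOSTROPHE.sub("", text.lower())
    text = _NON_ALNUM.sub(" ", text)
    return " ".join(text.split())  # collapse repeated whitespace


def similarity(a, b):
    """Similarity ratio in [0, 1] between two cleaned strings."""
    return SequenceMatcher(None, clean(a), clean(b)).ratio()


def best_match(title, candidates, threshold=0.9):
    """Return the most similar candidate, or None if none reaches the threshold."""
    scored = [(similarity(title, candidate), candidate) for candidate in candidates]
    score, candidate = max(scored, default=(0.0, None))
    return candidate if score >= threshold else None
```

Comparing every composition against every track would be slow at 260k records, so a fallback like this is probably best limited to candidates that already share an artist or a title prefix.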