jonjohnsontc opened 6 years ago
Update: I've since been able to match roughly 150k compositions with recordings, which lets me make recommendations for roughly 10k songwriters. We're currently using very simple regex cleaning techniques (removing special characters, etc.) to do much of the matching between tracks and compositions. That said, more suggestions on how best to match tracks and compositions are welcome.
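For reference, regex-based cleaning of this kind might look roughly like the sketch below. It is an illustration, not the project's actual code. The function name `normalize_title` and the specific rules are assumptions: stripping accents, mapping `&` to "and", and dropping bracketed qualifiers such as "(Remastered 2011)". The last rule in particular could merge titles that really are distinct.

```python
import re
import unicodedata


def normalize_title(text: str) -> str:
    """Reduce a title or artist string to a key for comparison (hypothetical rules)."""
    # Decompose accented characters (e.g. "é" -> "e" + combining accent), then drop the accents.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower()
    # Assumption: bracketed qualifiers like "(Remastered 2011)" or "[Live]" are noise.
    text = re.sub(r"[\(\[].*?[\)\]]", " ", text)
    # Treat "&" and "and" as equivalent.
    text = text.replace("&", " and ")
    # Remove every remaining character that is not a letter, digit, underscore or whitespace.
    text = re.sub(r"[^\w\s]", "", text)
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()
```

With these rules, `"Don't Stop Believin' (Remastered 2011)"` and `"DON’T STOP BELIEVIN’"` (curly apostrophes) both reduce to `"dont stop believin"`.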
I have roughly 260k composition records from ASCAP that need to be matched against the recordings currently indexed in the recommender.
To do so, I believe the composition records will first need to be cleaned and stored across a couple of PostgreSQL tables. Any song credits retrieved later will need to be cleaned and stored the same way.
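One plausible split is one row per composition plus a separate table of writers, because a single work usually has several writers. The sketch below uses Python's built-in `sqlite3` in place of PostgreSQL so it runs without a server. All table names, column names, and sample rows are hypothetical and do not come from ASCAP's actual export format. A cleaned `title_key` column is stored next to the original title so matching queries can hit an index.

```python
import re
import sqlite3

# Hypothetical raw rows: one row per (work, writer) pair, as a flat export might list them.
RAW_ROWS = [
    {"work_id": "W1", "title": "Example Song!", "writer": "WRITER A"},
    {"work_id": "W1", "title": "Example Song!", "writer": "WRITER B"},
    {"work_id": "W2", "title": "Another  Song", "writer": "WRITER B"},
]

SCHEMA = """
CREATE TABLE compositions (
    work_id   TEXT PRIMARY KEY,
    title     TEXT NOT NULL,   -- title as supplied
    title_key TEXT NOT NULL    -- cleaned title used for matching
);
CREATE TABLE composition_writers (
    work_id TEXT NOT NULL REFERENCES compositions(work_id),
    writer  TEXT NOT NULL,
    PRIMARY KEY (work_id, writer)
);
CREATE INDEX idx_compositions_title_key ON compositions(title_key);
"""


def clean_key(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def load(conn: sqlite3.Connection, raw_rows) -> None:
    """Create both tables and split the flat rows across them."""
    conn.executescript(SCHEMA)
    for row in raw_rows:
        # A work listed once per writer is stored only once in compositions.
        conn.execute(
            "INSERT OR IGNORE INTO compositions VALUES (?, ?, ?)",
            (row["work_id"], row["title"], clean_key(row["title"])),
        )
        conn.execute(
            "INSERT OR IGNORE INTO composition_writers VALUES (?, ?)",
            (row["work_id"], row["writer"].strip()),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, RAW_ROWS)
```

Porting this to PostgreSQL would involve replacing `INSERT OR IGNORE` with `INSERT ... ON CONFLICT DO NOTHING`. With psycopg2, the `?` placeholders also become `%s`.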
Afterwards, the fun of actually matching these records begins. My best guess is to match on the `song_title` and `artist` fields from both tables. I expect a number of records will need to be matched manually, which hopefully won't amount to more than 5% of the current track set (~1100 tracks).
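The sketch below shows matching on `song_title` and `artist`, with unresolved tracks collected for manual review. It first tries an exact match on cleaned keys. If that fails, it falls back to fuzzy title matching with `difflib.SequenceMatcher`. That fallback is a swapped-in technique, not something described above. The 0.9 threshold is an arbitrary assumption, and the dict field names are hypothetical. Compositions are grouped by artist so that fuzzy comparisons only run within one artist's catalogue, not against all ~260k records.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher


def key(text: str) -> str:
    """Lowercase, strip punctuation, collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def match_tracks(tracks, compositions, threshold=0.9):
    """Match tracks to compositions on (song_title, artist).

    Both inputs are lists of dicts with "id", "song_title" and "artist" keys.
    Returns (matches, needs_review): a track-id -> composition-id dict and a
    list of track ids left for manual matching.
    """
    # Group compositions by cleaned artist so fuzzy title comparisons stay small.
    by_artist = defaultdict(list)
    for comp in compositions:
        by_artist[key(comp["artist"])].append((key(comp["song_title"]), comp["id"]))

    matches, needs_review = {}, []
    for track in tracks:
        title = key(track["song_title"])
        candidates = by_artist.get(key(track["artist"]), [])
        # 1) Exact match on the cleaned title within the same artist.
        exact = [cid for t, cid in candidates if t == title]
        if len(exact) == 1:
            matches[track["id"]] = exact[0]
            continue
        if len(exact) > 1:
            # Several compositions share this title and artist: ambiguous.
            needs_review.append(track["id"])
            continue
        # 2) Fuzzy fallback: highest title similarity within the same artist.
        scored = [(SequenceMatcher(None, title, t).ratio(), cid) for t, cid in candidates]
        best_score, best_id = max(scored, default=(0.0, None))
        if best_score >= threshold:
            matches[track["id"]] = best_id
        else:
            needs_review.append(track["id"])
    return matches, needs_review
```

The size of `needs_review` relative to the track set gives a direct check against the 5% estimate. You could raise or lower `threshold` to trade false matches for manual work.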