pyvideo / data

Python related videos and metadata powering PyVideo.
https://pyvideo.org
Creative Commons Zero v1.0 Universal

Usage of dataset schedule/conference.json #526

Open Daniel-at-github opened 6 years ago

Daniel-at-github commented 6 years ago

As @jonemo pointed out in https://github.com/pyvideo/data/issues/494#issuecomment-390267316, conference websites built with Symposion publish schedule datasets. Example:

Does anyone have an idea how to join this with the conferences in the https://github.com/pyvideo/data dataset?

Daniel-at-github commented 6 years ago

The same in #522 with https://www.pyohio.org/2018/schedule/conference.json

Daniel-at-github commented 6 years ago

Relevant code showing where conference.json is produced: https://github.com/pinax/symposion/search?q=conference.json&unscoped_q=conference.json

jonemo commented 6 years ago

My approach for the one conference where I used conference.json as a data source was to slugify the first n characters of each title using python-slugify (where n was on the order of 30). This gave me a talk identifier that was quite robust against false positives (mismatching YouTube titles to conference.json titles) while still matching almost all talks to their YouTube videos.
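A minimal sketch of that idea follows. It is not the code I actually used. To keep it runnable with the standard library alone, `simple_slug` is a rough, hypothetical stand-in for python-slugify's `slugify()`, and `talk_key` and `match_by_key` are names invented for illustration. It reads "slugify the first n characters" literally, truncating before slugifying; slugifying first and truncating afterwards would be an equally plausible variant.

```python
import re
import unicodedata


def simple_slug(text):
    """Rough stand-in for python-slugify's slugify(); not identical to it."""
    # Drop accents and other non-ASCII characters.
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Lowercase, collapse runs of non-alphanumerics into single hyphens.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def talk_key(title, n=30):
    """Identifier built from the first n characters of a title."""
    return simple_slug(title[:n])


def match_by_key(schedule_titles, video_titles, n=30):
    """Map each video title to the schedule title sharing its key, if any."""
    by_key = {talk_key(t, n): t for t in schedule_titles}
    matches = {}
    for video_title in video_titles:
        key = talk_key(video_title, n)
        if key in by_key:
            matches[video_title] = by_key[key]
    return matches
```

A smaller n tolerates more differences near the end of a title but raises the chance that two different talks collide on the same key, so the value is worth tuning per conference.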

An alternative might be to use a string similarity metric on the conference title; see this Stack Overflow question for a few ideas on how to quickly create a simple string similarity function.
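One way such a similarity function could look, using `difflib.SequenceMatcher` from the standard library. This is only a sketch and not necessarily what the linked question recommends; `title_similarity`, `best_match`, and the 0.8 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def title_similarity(a, b):
    """Case-insensitive similarity between two titles, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(title, candidates, threshold=0.8):
    """Return the most similar candidate, or None if none reaches threshold."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: title_similarity(title, c))
    if title_similarity(title, best) >= threshold:
        return best
    return None
```

The threshold trades missed matches against false positives, so it would need checking against a hand-verified sample of titles before being trusted.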