Open Daniel-at-github opened 6 years ago
The same in #522 with https://www.pyohio.org/2018/schedule/conference.json
Relevant code where conference.json is produced:
https://github.com/pinax/symposion/search?q=conference.json&unscoped_q=conference.json
My approach for the one conference where I used conference.json as a data source was to slugify the first n characters of the titles using python-slugify (where n was something on the order of 30). This gave me a talk identifier that was quite robust against false positives (mismatching YouTube titles to conference.json titles) while still matching almost all talks to their YouTube videos.
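A minimal sketch of that idea, not the exact code I used back then. `talk_key`, `match_talks` and `PREFIX_LENGTH` are names made up for illustration, and the small fallback `slugify` is only a rough stand-in so the snippet runs without python-slugify installed (it drops non-ASCII characters instead of transliterating them):

```python
import re

try:
    from slugify import slugify  # python-slugify
except ImportError:
    def slugify(text):
        """Rough stdlib stand-in: lowercase, runs of non-alphanumerics become hyphens."""
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

PREFIX_LENGTH = 30  # the "n" above; the exact value is an assumption


def talk_key(title, n=PREFIX_LENGTH):
    """Slugify the first n characters of a title to get a matching key."""
    return slugify(title[:n])


def match_talks(schedule_titles, video_titles, n=PREFIX_LENGTH):
    """Map each schedule title to the video title with the same key, or None."""
    videos_by_key = {talk_key(title, n): title for title in video_titles}
    return {title: videos_by_key.get(talk_key(title, n)) for title in schedule_titles}
```

Because only the title prefix is compared, a speaker name or event suffix appended to the YouTube title does not break the match. Two talks sharing the same first n characters would collide, so it is worth checking the keys for duplicates before relying on the result.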
An alternative might be to use a string similarity metric on the conference title; see this Stack Overflow question for a few ideas on how to quickly create a simple string similarity function.
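One possible way to do this with only the standard library is `difflib.SequenceMatcher`. This is a sketch under assumptions: it is not necessarily what the linked question suggests, and `title_similarity`, `best_match` and the 0.8 threshold are hypothetical choices that would need tuning against real data:

```python
from difflib import SequenceMatcher


def title_similarity(a, b):
    """Return a case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(title, candidates, threshold=0.8):
    """Return the most similar candidate, or None if none reaches the threshold."""
    if not candidates:
        return None
    best = max(candidates, key=lambda candidate: title_similarity(title, candidate))
    return best if title_similarity(title, best) >= threshold else None
```

A higher threshold trades missed matches for fewer false positives, which is the same trade-off as picking n in the slug approach.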
As @jonemo pointed out in https://github.com/pyvideo/data/issues/494#issuecomment-390267316, there are schedule datasets on Symposion conference websites. Example:
Does anyone have an idea how to join this with the conferences in the https://github.com/pyvideo/data dataset?