snejus / beetcamp

Bandcamp autotagger source for beets (http://beets.io)
GNU General Public License v2.0

Suggestion: parse JSONLD data instead of scraping data from HTML tags #2

Closed by gryphonmyers 3 years ago

gryphonmyers commented 3 years ago

In the head of the page, there is a JSON-LD script tag that contains (I think) all the metadata we need. Using this data instead of scraping it from HTML tags would likely be more resilient to changes in the Bandcamp markup.
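A minimal sketch of the idea, assuming the metadata sits in a `<script type="application/ld+json">` element. The function name `extract_jsonld`, the sample HTML, and the field names in it are hypothetical and only illustrate the approach; they are not taken from beetcamp or from a real Bandcamp page.

```python
import json
import re

# Matches the body of a <script type="application/ld+json"> element.
JSONLD_PATTERN = re.compile(
    r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)


def extract_jsonld(html: str) -> dict:
    """Return the first JSON-LD object embedded in the page, or {} if none."""
    match = JSONLD_PATTERN.search(html)
    if not match:
        return {}
    return json.loads(match.group(1))


# Hypothetical page snippet; real Bandcamp field names may differ.
sample_html = """
<html><head>
<script type="application/ld+json">
{"@type": "MusicAlbum", "name": "Example Album", "byArtist": {"name": "Example Artist"}}
</script>
</head><body></body></html>
"""

meta = extract_jsonld(sample_html)
print(meta["name"], "-", meta["byArtist"]["name"])
```

Because the parser reads one structured blob rather than individual tags, changes to the visible page layout should not affect it as long as the embedded JSON keeps its shape.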

snejus commented 3 years ago

Sorry - didn't see this issue until now.

Thanks for the suggestion! This is exactly what I'm doing in the WIP make-it-simple branch. Feel free to have a look: the bs4 dependency has been dropped entirely, since I now rely mostly on the metadata from the tag you mentioned, plus a few regex searches.

I'm just adding some more tests and making sure that all available metadata is returned. It should be complete soon.

snejus commented 3 years ago

Thanks again for your suggestion. It's now in master, and the package should be available on PyPI: pip install beetcamp.