johnwmillr / LyricsGenius

Download song lyrics and metadata from Genius.com 🎶🎤
http://www.johnwmillr.com/scraping-genius-lyrics/
MIT License
898 stars, 159 forks

Writer and Producer Credit info cleanup #128

Closed: sebastiankogler closed this issue 4 years ago

sebastiankogler commented 4 years ago

Hey there!

I'm working on a project to compile writer and producer credits. When I pull writer and producer credits like so:

[example]

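import lyricsgenius

# Genius must be instantiated with your Genius API client access token (placeholder below)
genius = lyricsgenius.Genius("YOUR_CLIENT_ACCESS_TOKEN")
credit = genius.search_song("Circles", "Post Malone")
print(credit.writer_artists)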

it returns a very long string that includes HTML, artist page links, media links, and the desired info all jumbled together.

It would be awesome if it only returned the names of the credited writers. The same thing happens when I access producer_artists.

I've included the code I used above and screenshots of its output below. I'm running macOS with lyricsgenius 1.8.2. I'm also a bit of a beginner with Python and JSON. I've done some rudimentary googling on ways to parse or otherwise break up the returned string and pull out the credited names, but I haven't had any luck yet. I suspect it has to do with how the elements are structured in the DOM on genius.com [screenshots attached], but I have no clue how to parse those elements or which element to select so that only the names are returned.

Thanks so much! Looking forward to your thoughts.

[Screenshots: Screen Shot 2020-01-08 at 09.23.27, 09.42.58, and 09.41.59]

johnwmillr commented 4 years ago

Hi Sebastian,

credit.writer_artists should return the JSON data from the Genius API, represented as a list of Python dictionaries (one dictionary per writer).
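As an illustration of why printing it looks like one long jumbled string: a JSON array of artist objects decodes into a Python list of dictionaries, and printing the whole list dumps every field (IDs, URLs, and so on), not just the names. The payload below is a hypothetical, trimmed-down example with placeholder names and URLs; real Genius entries carry more fields.

```python
import json

# Hypothetical, abbreviated payload shaped like a "writer_artists" array.
# Names and URLs are placeholders; real entries contain more fields.
raw = """
[
  {"id": 1, "name": "Writer A", "url": "https://example.com/artists/writer-a"},
  {"id": 2, "name": "Writer B", "url": "https://example.com/artists/writer-b"}
]
"""

writer_artists = json.loads(raw)  # JSON array -> Python list of dicts

print(type(writer_artists).__name__)     # list
print(type(writer_artists[0]).__name__)  # dict
print(writer_artists)  # every field of every entry, not just the names
```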

Try looking at the dictionary keys to see which fields are available:

# Inspect the first writer entry to see which fields are available
first_writer = credit.writer_artists[0]
print(first_writer.keys())
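For example, with a hypothetical entry standing in for credit.writer_artists[0] (the real keys come from the Genius API and may differ), keys() lists the available fields, and get() reads one without raising an error if it's missing:

```python
# Hypothetical stand-in for credit.writer_artists[0]; real entries have more keys.
first_writer = {
    "id": 123,
    "name": "Writer A",
    "url": "https://example.com/artists/writer-a",
    "image_url": "https://example.com/images/writer-a.png",
}

print(sorted(first_writer.keys()))  # ['id', 'image_url', 'name', 'url']
print(first_writer.get("name"))     # Writer A
print(first_writer.get("iq"))       # None: .get() returns None for a missing key
```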

This code should get you the name of each writer:

# Pull the 'name' field out of each writer entry
names = [writer['name'] for writer in credit.writer_artists]
print(names)
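The same comprehension should work for producers via producer_artists. Here is a small helper sketch that covers both attributes and tolerates songs without credits. It assumes a missing credit list comes back empty or falsy (older lyricsgenius versions may store an empty string), and it uses SimpleNamespace as a hypothetical stand-in for a Song object so it runs without network access. The function name credit_names is illustrative, not part of lyricsgenius.

```python
from types import SimpleNamespace

def credit_names(song, attr="writer_artists"):
    """Return the 'name' of each credited artist in song.<attr>.

    A missing or empty attribute is treated as "no credits" instead of failing.
    """
    entries = getattr(song, attr, None) or []
    return [entry["name"] for entry in entries if "name" in entry]

# Hypothetical stand-in for a lyricsgenius Song object (no network needed).
song = SimpleNamespace(
    writer_artists=[{"name": "Writer A"}, {"name": "Writer B"}],
    producer_artists=[{"name": "Producer C"}],
)

print(credit_names(song))                      # ['Writer A', 'Writer B']
print(credit_names(song, "producer_artists"))  # ['Producer C']
print(credit_names(SimpleNamespace()))         # []
```

With a real Song object, you would pass the result of search_song in place of the stand-in.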

Does that work for you? I don't have access to the package at the moment, so the syntax may not be exactly correct.

John

sebastiankogler commented 4 years ago

Hey John!

That works perfectly. Thank you so much.

johnwmillr commented 4 years ago

Great! Glad to hear it.