johnwmillr / LyricsGenius

Download song lyrics and metadata from Genius.com 🎶🎤
http://www.johnwmillr.com/scraping-genius-lyrics/
MIT License

Download Lyrics by genre #149

Closed D-Singh-S closed 3 years ago

D-Singh-S commented 4 years ago

Hi!

Hope you can help. What would be the best way to scrape lyrics by genre using the LyricsGenius wrapper? For instance, if I wanted to extract Pop lyrics, could I use the URL https://genius.com/tags/pop/all ?

Sorry, I'm just starting out coding. Would appreciate any help!

allerter commented 4 years ago

To do this, you need to make requests to https://genius.com/tags/pop/all?page=1. That request returns an HTML page containing an unordered list (a ul tag) whose items (li tags) are songs. Each item gives you the song's name and its URL, which you can use to get the lyrics and the song info. The tag pages are paginated like many other parts of Genius, so you can walk through the results page by page until there are none left. At the end of the results on each page there is a div tag whose class attribute is pagination:

<div class="pagination"><a href="/tags/pop/all?page=1" class="prev_page" rel="prev start">&laquo; Previous</a> <a href="/tags/pop/all?page=1" rel="prev start">1</a> <span class="current">2</span> <a href="/tags/pop/all?page=3" rel="next">3</a> <a href="/tags/pop/all?page=4">4</a> <a href="/tags/pop/all?page=5">5</a> <a href="/tags/pop/all?page=6">6</a> <a href="/tags/pop/all?page=7">7</a> <a href="/tags/pop/all?page=8">8</a> <a href="/tags/pop/all?page=9">9</a> <span class="gap">&hellip;</span> <a href="/tags/pop/all?page=57768">57768</a> <a href="/tags/pop/all?page=57769">57769</a> <a href="/tags/pop/all?page=3" class="next_page" rel="next">Next &raquo;</a></div>

From the tag above you can read the number of the next page and then send further requests, all of which follow this template:

https://genius.com/tags/pop/all?page=PAGE_NUMBER
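As a rough sketch of that step, the snippet below reads the next page number out of a pagination div like the one above and builds the next URL from the template. It uses only Python's standard library (html.parser) so it runs without extra installs; Beautiful Soup would do the same job with less code. The names NextPageParser, next_page_number and tag_page_url are made up for this example, and it assumes the last page simply has no a tag with the next_page class, which I have not verified against the live site.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class NextPageParser(HTMLParser):
    """Remembers the href of the first <a> whose class list includes "next_page"."""

    def __init__(self):
        super().__init__()
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        # Only anchors matter, and only the first match is kept.
        if tag != "a" or self.next_href is not None:
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if "next_page" in classes:
            self.next_href = attr_map.get("href")


def next_page_number(html):
    """Return the next page number as an int, or None if no "next" link is found."""
    parser = NextPageParser()
    parser.feed(html)
    if parser.next_href is None:
        return None
    pages = parse_qs(urlparse(parser.next_href).query).get("page")
    return int(pages[0]) if pages else None


def tag_page_url(tag, page):
    """Fill in the URL template from the thread for a given tag and page number."""
    return f"https://genius.com/tags/{tag}/all?page={page}"


# Shortened copy of the pagination div quoted above.
sample = (
    '<div class="pagination">'
    '<a href="/tags/pop/all?page=1" class="prev_page" rel="prev start">&laquo; Previous</a> '
    '<span class="current">2</span> '
    '<a href="/tags/pop/all?page=3" rel="next">3</a> '
    '<a href="/tags/pop/all?page=3" class="next_page" rel="next">Next &raquo;</a>'
    '</div>'
)

next_page = next_page_number(sample)
print(next_page)
print(tag_page_url("pop", next_page))
```

In a real scraper you would fetch each page, call next_page_number on the response body, and stop when it returns None. Adding a short pause between requests is a sensible courtesy to the site.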

To do what I described above, you'll need a web-scraping library. Beautiful Soup is one such library, and LyricsGenius itself uses it to scrape the lyrics from the song page.
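For illustration, here is a minimal Beautiful Soup sketch of pulling song names and URLs out of a list of li elements. The sample HTML and the extract_songs helper are invented for this example: the real tag page markup (class names, nesting) is not shown in this thread, so the "ul li a" selector is an assumption you would need to adjust after inspecting the actual page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the song list on a tag page.
sample_html = """
<ul>
  <li><a href="https://genius.com/example-artist-example-song-lyrics">Example Song</a></li>
  <li><a href="https://genius.com/another-artist-another-song-lyrics">Another Song</a></li>
</ul>
"""


def extract_songs(html):
    """Return a list of (title, url) pairs for every link inside a list item."""
    soup = BeautifulSoup(html, "html.parser")
    songs = []
    # Assumed selector: any anchor with an href nested in a <li> of a <ul>.
    for link in soup.select("ul li a[href]"):
        songs.append((link.get_text(strip=True), link["href"]))
    return songs


songs = extract_songs(sample_html)
for title, url in songs:
    print(title, url)
```

Each URL collected this way could then be used to fetch the lyrics and song info for that song.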