johnwmillr / LyricsGenius

Download song lyrics and metadata from Genius.com 🎶🎤
http://www.johnwmillr.com/scraping-genius-lyrics/
MIT License

Perform non-song filtering before choosing possible result #108

Closed: airdrummingfool closed this issue 3 years ago

airdrummingfool commented 5 years ago

First of all, thanks for this awesome project! I'm using it via the Home Assistant component genius-lyrics.

Is your feature request related to a problem? Please describe.
When using genius.search_song(), the desired song is sometimes a few hits down the search results, while the first couple of hits are "New Music Friday"-type meta lists. Anecdotally, this happens a lot when the song features a secondary artist (i.e. the artist is listed as Main Artist (feat. Artist 2)).

I can exclude "New Music Friday" using excluded_terms, but this filter is only applied after a single candidate has already been taken from the list of hits. So, while I don't get a bad result back (excluded_terms filters it out correctly), I get an empty result instead of LyricsGenius continuing through the hits to the first non-rejected one.

Describe the solution you'd like
It would be awesome if _get_item_from_search_response checked each hit against the rejection list before returning, so that if the first hit is rejected, the second one is considered, and so on. This would likely only apply when type_ == "song" and self.skip_non_songs is True.

Alternatively, search_song could use a different method to unpack the response, which could do the filtering in place (basically, a copy of _get_item_from_search_response specialized for songs/lyrics).
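A minimal sketch of the requested behavior, assuming the response shape shown in the example below (sections → hits → result). The helper name first_acceptable_hit is hypothetical and not part of LyricsGenius. It also assumes each excluded term is matched case-insensitively as a regular expression against the hit's title, which may differ from how the library actually applies excluded_terms.

```python
import re
from typing import Any, Iterable, Optional


def first_acceptable_hit(
    response: dict[str, Any],
    excluded_terms: Iterable[str],
    type_: str = "song",
) -> Optional[dict[str, Any]]:
    """Hypothetical helper: return the first hit result of `type_` whose
    title matches none of the excluded terms, or None if all are rejected."""
    # Assumption: excluded terms are treated as case-insensitive regexes.
    patterns = [re.compile(term, re.IGNORECASE) for term in excluded_terms]
    for section in response.get("sections", []):
        for hit in section.get("hits", []):
            if hit.get("type") != type_:
                continue
            title = hit["result"].get("title", "")
            # Skip a rejected hit and keep looking instead of giving up.
            if any(p.search(title) for p in patterns):
                continue
            return hit["result"]
    return None


# Trimmed-down version of the search response from this issue.
response = {
    "sections": [{
        "type": "top_hit",
        "hits": [
            {"type": "song", "result": {
                "title": "New Music Friday 08/31/18",
                "url": "https://genius.com/Spotify-new-music-friday-08-31-18-annotated"}},
            {"type": "song", "result": {
                "title": "New Music Friday Christian 07/26/19",
                "url": "https://genius.com/Spotify-new-music-friday-christian-07-26-19-annotated"}},
            {"type": "song", "result": {
                "title": "Echo",
                "url": "https://genius.com/Elevation-worship-echo-lyrics"}},
        ],
    }]
}

best = first_acceptable_hit(response, ["New Music Friday"])
print(best["url"])  # https://genius.com/Elevation-worship-echo-lyrics
```

Because a rejected hit is skipped rather than ending the search, the sample response yields the third hit ("Echo"); None comes back only when every hit is rejected.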

Describe alternatives you've considered
I've noticed that some (many?) of these meta-pages are tagged with meta or Non-Music. If there were a way to exclude by tags, result titles might not need to be filtered. Unfortunately, I don't see

Additional context
Example:

genius.search_song('Echo (feat. Tauren Wells)', 'Elevation Worship, Tauren Wells')

Results (snipped for legibility). Note that the third result is the desired one:

{
    "sections": [{
        "hits": [{
            "result": {
                "_type": "song",
                "url": "https://genius.com/Spotify-new-music-friday-08-31-18-annotated",
                "full_title": "New Music Friday 08/31/18 by\u00a0Spotify",
                "title": "New Music Friday 08/31/18"
            },
            "type": "song"
        }, {
            "result": {
                "_type": "song",
                "url": "https://genius.com/Spotify-new-music-friday-christian-07-26-19-annotated",
                "full_title": "New Music Friday Christian 07/26/19 by\u00a0Spotify",
                "title": "New Music Friday Christian 07/26/19"
            },
            "type": "song"
        }, {
            "result": {
                "_type": "song",
                "url": "https://genius.com/Elevation-worship-echo-lyrics",
                "full_title": "Echo by\u00a0Elevation\u00a0Worship (Ft.\u00a0Tauren\u00a0Wells)",
                "title": "Echo"
            },
            "type": "song"
        }],
        "type": "top_hit"
    }]
}

I'd be happy to work up a proof-of-concept PR if you are okay with this idea.

johnwmillr commented 5 years ago

I like this idea! Sorry for the delayed response. I've been frustrated by several issues like this, all related to handling cases where the top result isn't the user's target. I'd be happy to receive a PR from you if you're willing to work up a proof of concept.

John

robert-alfaro commented 4 years ago

Firstly, thanks @johnwmillr for an awesome library, I've enjoyed making an HA component with it! Secondly, nice to see you here @airdrummingfool! I was recently updating the genius-lyrics HA component and have been stumbling across various song results that are definitely not music lyrics. I've browsed the code and realized the same thing: all the bad results have tags associated with them, such as Non-Music (as the primary tag). It looks like the Genius API does not provide this metadata in the response, right?

@airdrummingfool regarding your issue with "New Music Fridays", is that because you use Spotify? I am not a Spotify user (I use Google Music in various ways: cast, gpmdp, Google Home). If those are "ads", they could be filtered out in the HA component, meaning it wouldn't search for lyrics for those items at all. I imagine something similar applies to Pandora. In other words, depending on the media player type, some non-songs should not be considered.

airdrummingfool commented 4 years ago

@johnwmillr sorry for not getting you a PR yet - things have gotten busy for me, though I do one day hope to work on this more.

Hi @robert-alfaro! Short answer: the "New Music Fridays" response may be related to my use of Spotify, but it is not at all related to ads. Longer answer: when certain valid song info is searched on Genius, the first result is an entry on a "New Music Friday" playlist. This happens on the Genius side and isn't affected by the platform you're listening on (also, I have Spotify Premium, so there are zero ads), though it does depend on exactly how the artist and title are listed. You can see in the example in my original post that I'm searching for a real song by a real artist, which happens to be listed on a Spotify playlist.

That being said, it is possible that Spotify lists the artist/song title slightly differently than Genius (note that the song in question "features" another artist, which can be written out multiple ways), and since the search is matching a Spotify playlist, this might be more likely to occur when listening to Spotify (since the search parameters from Spotify and the playlist entry in Genius would match exactly). For example, my example song's title in Spotify includes (feat. Tauren Wells), while the Genius entry does not. Here's a screenshot of the search results for Echo (feat. Tauren Wells) Elevation Worship, Tauren Wells: [screenshot not preserved]

robert-alfaro commented 4 years ago

Ah gotcha, that makes sense. Regarding featured artists, I've experienced the same with some tracks that have "(Explicit)" in their names. As a first stab, my updates to the HA component strip exclusion terms like (feat and (explicit from the media title to improve the chances of querying the correct song.
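Here is a rough sketch of that kind of title cleanup, written as a hypothetical helper (clean_title is not part of LyricsGenius or the HA component). It assumes the noise always appears as a parenthesized or bracketed group that starts with feat, ft, featuring, or explicit, and it removes the whole group rather than only the leading term.

```python
import re

# Hypothetical pattern: a "(...)" or "[...]" group that starts with
# feat/ft/featuring/explicit (case-insensitive), plus leading whitespace.
_NOISE = re.compile(
    r"\s*[(\[](?:feat|ft|featuring|explicit)\b[^)\]]*[)\]]",
    re.IGNORECASE,
)


def clean_title(title: str) -> str:
    """Strip featured-artist and explicit markers from a media title."""
    return _NOISE.sub("", title).strip()


print(clean_title("Echo (feat. Tauren Wells)"))  # Echo
print(clean_title("Some Song (Explicit)"))       # Some Song
```

Removing the whole group means a title like Echo (feat. Tauren Wells) is searched as plain Echo, which is how Genius lists the song in the example above. Unrelated parentheticals such as (Live) are left alone.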