johnwmillr / LyricsGenius

Download song lyrics and metadata from Genius.com 🎶🎤
http://www.johnwmillr.com/scraping-genius-lyrics/
MIT License
898 stars, 159 forks

search_artist returns wrong artist #147

Closed by kaiffeetasse 3 years ago

kaiffeetasse commented 4 years ago

Describe the bug: When searching for "Céline", the search_artist method returns "Celine Dion", although the search on genius.com shows the correct artist under "top result" (see: https://genius.com/search?q=C%C3%A9line).

Expected behavior: The method should return https://genius.com/artists/Celine instead of https://genius.com/artists/Celine-dion.

To Reproduce:

    artist = genius.search_artist("Céline")

Output:

    Searching for songs by Céline...
    Changing artist name to 'Céline Dion'

Version info: all versions

allerter commented 4 years ago

See the Solution section below for the fix. When LyricsGenius looks up an artist, it queries the Genius web search, which returns a dictionary of sections with types such as top_hit and artist. As the name suggests, this is the same search an end user performs on genius.com. The top result is usually the correct one, but LyricsGenius then compares the artist name you provided directly against the artist names in the results. If one matches exactly, that result is returned; if none does, the first result is returned (usually, but not always, the correct one).

In your case the search term is "Céline" but the artist's name is "CÉLINE", so the exact comparison fails and the first result ("Céline Dion") is returned. Why is the first result Céline Dion when the top hit in the browser is CÉLINE? Because LyricsGenius sorts the result sections with the sorted function and reverse=True, which moves the artist section ahead of the top-hit section. Céline Dion is then the first hit, and since the matching failed, it is the one returned.
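The behavior described above can be reproduced with a small sketch. The response dictionary below is a made-up, heavily simplified stand-in for the real search payload, and pick_hit is a hypothetical helper that only mirrors the sort-then-match logic quoted in this thread; it is not the library's actual function.

```python
# Hypothetical, simplified search response; the real payload has many more fields.
response = {
    "sections": [
        {"type": "top_hit", "hits": [
            {"type": "artist", "result": {"name": "CÉLINE"}},
        ]},
        {"type": "song", "hits": [
            {"type": "song", "result": {"name": "Some Song"}},
        ]},
        {"type": "artist", "hits": [
            {"type": "artist", "result": {"name": "Céline Dion"}},
            {"type": "artist", "result": {"name": "CÉLINE"}},
        ]},
    ]
}


def pick_hit(response, search_term, type_="artist", result_type="name", reverse=True):
    """Illustrative helper (not LyricsGenius API): sort sections, then match exactly."""
    # With reverse=True, sections whose type equals type_ (key True) come first.
    sections = sorted(response["sections"],
                      key=lambda sect: sect["type"] == type_,
                      reverse=reverse)
    hits = [hit for section in sections
            for hit in section["hits"] if hit["type"] == type_]
    for hit in hits:
        # Exact, case-sensitive comparison: "Céline" != "CÉLINE".
        if hit["result"][result_type] == search_term:
            return hit["result"]
    # No exact match: fall back to the first hit after sorting.
    return hits[0]["result"] if hits else None


first = pick_hit(response, "Céline")   # falls back to the first artist-section hit
exact = pick_hit(response, "CÉLINE")   # exact match succeeds
print(first["name"], "|", exact["name"])
```

In this toy data, searching for "Céline" falls through to the fallback and picks the first entry of the artist section, while passing reverse=False leaves the top-hit section first, so the fallback would pick CÉLINE instead.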

Solution

You can either change your search term to "CÉLINE" or edit the code. To edit the code, change the following lines in api.py from:

        sections = sorted(response['sections'],
                          key=lambda sect: sect['type'] == type_,
                          reverse=True)

        hits = [hit for section in sections for hit in section['hits'] if hit['type'] == type_]
        for hit in hits:
            if hit['result'][result_type] == search_term:

to:

        sections = sorted(response['sections'],
                          key=lambda sect: sect['type'] == type_,
                          reverse=False)

        hits = [hit for section in sections for hit in section['hits'] if hit['type'] == type_]
        for hit in hits:
            if self._clean_str(hit['result'][result_type]) == self._clean_str(search_term):
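
To illustrate why comparing cleaned strings helps, here is a minimal sketch of a normalizing comparison. The clean_str function below is a hypothetical stand-in written for this example; the exact behavior of the library's own _clean_str may differ.

```python
import string
import unicodedata


def clean_str(s):
    """Hypothetical normalizer: an assumption, not the library's _clean_str."""
    # Normalize Unicode so composed and decomposed accents compare equal.
    s = unicodedata.normalize("NFC", s)
    # Drop ASCII punctuation, trim whitespace, and ignore case.
    s = s.translate(str.maketrans("", "", string.punctuation))
    return s.strip().casefold()


raw_equal = "CÉLINE" == "Céline"                           # exact comparison
cleaned_equal = clean_str("CÉLINE") == clean_str("Céline")  # normalized comparison
print(raw_equal, cleaned_equal)
```

With a comparison like this, "Céline" matches "CÉLINE" but still does not match "Céline Dion", so the loop can return the intended artist instead of falling back to the first hit.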