johnwmillr / LyricsGenius

Download song lyrics and metadata from Genius.com 🎶🎤
http://www.johnwmillr.com/scraping-genius-lyrics/
MIT License

`data = ' '.join(song.lyrics for song in self.songs)` fails #157

Closed: Matoutou27 closed this issue 3 years ago

Matoutou27 commented 3 years ago

My code:

    Artistes_names = []

    def getsong(Artiste_name):
        artist = genius.search_artist(Artiste_name, max_songs=2, sort="title")
        artist.to_text()

    for artistes in Artistes_names:
        getsong(artistes)

This returns the following error:

    data = ' '.join(song.lyrics for song in self.songs)
    TypeError: sequence item 0: expected str instance, NoneType found

Expected behavior

As I understand it, this should create a text file, but it fails to join the lyrics. It gives the same error when I try to save with `save_lyrics(extension='txt')`.

To Reproduce

Windows 10, Python 3.8.5, latest version of LyricsGenius

Additional context

It works maybe 7% of the time, seemingly at random. I tried different artists, songs, etc., and there is no consistency.
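For context, the error itself comes from `str.join`, which only accepts strings: if even one song's `lyrics` attribute is `None` (for example, because its lyrics could not be scraped), the join inside `to_text()` raises this `TypeError`. The sketch below reproduces it in plain Python, using a hypothetical `FakeSong` class as a stand-in for the library's song objects, and shows a defensive join (the helper name `join_lyrics` is also hypothetical) that simply skips songs without lyrics:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeSong:
    """Hypothetical stand-in for a lyricsgenius song object."""
    title: str
    lyrics: Optional[str]


songs = [FakeSong("Song A", None), FakeSong("Song B", "la la la")]

# Same expression as in artist.py: fails because the first item is None.
try:
    data = ' '.join(song.lyrics for song in songs)
except TypeError as exc:
    print(exc)  # sequence item 0: expected str instance, NoneType found


def join_lyrics(songs: List[FakeSong]) -> str:
    """Join only the songs whose lyrics were actually found."""
    return ' '.join(song.lyrics for song in songs if song.lyrics is not None)


print(join_lyrics(songs))  # la la la
```

Skipping `None` hides the symptom rather than explaining why the lyrics were missing, so it is mainly useful once the underlying cause has been ruled out.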

allerter commented 3 years ago

I tried your code, and everything went fine for me (a total of 30 songs from 3 artists). Are you sure you're using version 2.0.0? Please check using `pip show lyricsgenius`. If you are indeed using version 2.0.0, please provide a list of the artists and songs the package failed on. I think this issue might have to do with unreleased songs.
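Note that `pip show lyricsgenius` reports the version installed in the current environment, but Python may import a different copy: the directory of the script being run comes first on `sys.path`, so a `lyricsgenius` folder sitting next to the script shadows the installed package. A small stdlib-only sketch (the helper name `describe_module` is hypothetical) that reports both the installed version and the file that would actually be imported:

```python
import importlib.util
from importlib import metadata
from typing import Optional, Tuple


def describe_module(name: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (installed distribution version, path of the module Python would import).

    Either element is None if it cannot be determined.
    """
    try:
        version = metadata.version(name)  # same information `pip show` uses
    except metadata.PackageNotFoundError:
        version = None

    spec = importlib.util.find_spec(name)  # follows the normal sys.path order
    origin = spec.origin if spec is not None else None
    return version, origin


if __name__ == "__main__":
    version, origin = describe_module("lyricsgenius")
    print("installed version:", version)
    print("imported from:    ", origin)
```

Run it from the same directory as the failing script; if "imported from" points somewhere other than the environment's `site-packages`, the installed version is not the one being used.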

Matoutou27 commented 3 years ago

Thanks for the fast answer,

I am indeed using version 2.0.0

Here they are: "Andy Shauf", "Népal", "Le Dé", "Georgio", "Zamdane", "Kekra", "Vald"

But it doesn't seem to depend on the artist. It works only sporadically, which makes it really frustrating.

allerter commented 3 years ago

Aside from the issue I found with unreleased songs, your problem might come from the artist not being found in some cases. Always check that the artist isn't `None`:

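    import lyricsgenius

    # Placeholder: replace with your own Genius API client access token
    genius = lyricsgenius.Genius("YOUR_ACCESS_TOKEN")

    Artistes_names = ["Andy Shauf", "Népal", "Le Dé", "Georgio", "Zamdane", "Kekra", "Vald"]

    def getsong(Artiste_name):
        artist = genius.search_artist(Artiste_name, max_songs=2, sort="title")
        if artist:  # search_artist returns None when no artist is found
            artist.to_text()

    for artistes in Artistes_names:
        getsong(artistes)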

Also, make sure to set `genius.verbose = True` to see what happened when the operation fails. For example, if you get `No results found for "Some Artist"`, the artist wasn't found, and therefore `None` is returned. The same can happen when searching for songs, so use `if song:` when searching for songs as well. If the artist is found and the message you get is `Couldn't find the lyrics section.`, then the problem is that the lyrics couldn't be found. In that case, it would help me a lot if you could provide some data, since I'm not getting any errors myself. Follow the instructions of this comment to do so (the methods mentioned there are in the `api.py` file of the package; you can find the package's location using `pip show lyricsgenius`).
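Following that advice, one option is to record failures instead of crashing, which also produces the list of failing artists and songs requested earlier. This is a sketch under assumptions: the function name `find_failures` and its `search` parameter are hypothetical (a parameter so the sketch runs offline, without an API token), and it assumes, as the failing line `' '.join(song.lyrics for song in self.songs)` suggests, that a found artist has a `songs` list whose items have `title` and `lyrics` attributes. In real use you might pass `lambda name: genius.search_artist(name, max_songs=2, sort="title")` as `search`.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, Iterable, List, Optional


def find_failures(
    names: Iterable[str],
    search: Callable[[str], Optional[Any]],
) -> Dict[str, List[Any]]:
    """Search each artist and record what failed instead of raising."""
    report: Dict[str, List[Any]] = {"missing_artists": [], "songs_without_lyrics": []}
    for name in names:
        artist = search(name)
        if artist is None:  # e.g. 'No results found for "..."'
            report["missing_artists"].append(name)
            continue
        for song in artist.songs:
            if song.lyrics is None:  # e.g. "Couldn't find the lyrics section."
                report["songs_without_lyrics"].append((name, song.title))
    return report


# Offline demo: SimpleNamespace objects stand in for real search results.
fake_results = {
    "Artist A": SimpleNamespace(songs=[
        SimpleNamespace(title="Track 1", lyrics="some lyrics"),
        SimpleNamespace(title="Track 2", lyrics=None),
    ]),
}
print(find_failures(["Artist A", "Unknown Artist"], fake_results.get))
```

Collecting failures this way keeps one bad song from stopping the whole batch, and the resulting report is exactly the kind of data that helps narrow down the problem.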

Matoutou27 commented 3 years ago

I am not getting any `No results found for "Some Artist"` or `Couldn't find the lyrics section.` messages in my console.

I read the instructions you gave in issue #148 and added the print statement in the right place, but I don't really understand what you mean by "send the code of your `_scrape_song_lyrics_from_url()` function and the output of that print statement here" (sorry if it's obvious).

Thanks again for the help. I really appreciate it.

Here is the raw console output I get with the exact code you gave me in your last comment, after adding the print statement in `_scrape_song_lyrics_from_url()`:

    PS D:\École\projet Agatha\lyrics> & C:/Users/mathi/AppData/Local/Microsoft/WindowsApps/python.exe "d:/École/projet Agatha/lyrics/investigation.py"
    Searching for songs by Andy Shauf...

    Song 1: "Alexander All Alone"
    Song 2: "All the Same"
    Song 3: "Angela"
    Song 4: "Beautiful"
    Song 5: "Begin Again"

    Reached user-specified song limit (5).
    Done. Found 5 songs.
    Traceback (most recent call last):
      File "d:/École/projet Agatha/lyrics/investigation.py", line 17, in <module>
        getsong(artistes)
      File "d:/École/projet Agatha/lyrics/investigation.py", line 14, in getsong
        artist.to_text()
      File "d:\École\projet Agatha\lyrics\lyricsgenius\artist.py", line 121, in to_text
        data = ' '.join(song.lyrics for song in self.songs)
    TypeError: sequence item 0: expected str instance, NoneType found

allerter commented 3 years ago

I'm happy to help you with this. And you're right; that sentence is a bit ambiguous. Just replace your `_scrape_song_lyrics_from_url()` method with the one below:

    def _scrape_song_lyrics_from_url(self, url):
        """Uses BeautifulSoup to scrape song info off of a Genius song URL

        Args:
            url (:obj:`str`, optional): URL for the web page to scrape lyrics from.

        Returns:
            :obj:`str` \\|‌ :obj:`None`: If it can find the lyrics, otherwise `None`

        Note:
            This method removes the song headers based on the value of the
            :attr:`remove_section_headers` attribute.

        """
        # Relies on re, requests, and BeautifulSoup (bs4) being imported in api.py
        page = requests.get(url)
        if page.status_code == 404:
            if self.verbose:
                print("Song URL returned 404.")
            return None

        # Scrape the song lyrics from the HTML
        html = BeautifulSoup(page.text, "html.parser")
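        # Debugging aid: save the raw HTML of the page to a file named after
        # the URL, keeping only letters, digits, spaces, dots, and underscores
        url = "".join(c for c in url if c.isalnum()
                      or c in (" ", ".", "_")).rstrip()
        filename = 'song data - {}.txt'.format(url)
        with open(filename, 'w', encoding='utf8') as f:
            f.write(str(html))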
        # Determine the class of the div
        old_div = html.find("div", class_="lyrics")
        new_div = html.find("div", class_=re.compile("Lyrics__Root"))
        unreleased = ("Lyrics for this song have yet to be released."
                      " Please check back once the song has been released.")
        if old_div:
            lyrics = old_div.get_text()
        elif new_div:
            lyrics = new_div.get_text('\n').replace('\n[', '\n\n[')
        elif unreleased in html.find_all(text=True):
            if self.verbose:
                print(unreleased)
            lyrics = unreleased
        else:
            if self.verbose:
                print("Couldn't find the lyrics section.")
            return None

        if self.remove_section_headers:  # Remove [Verse], [Bridge], etc.
            lyrics = re.sub(r'(\[.*?\])*', '', lyrics)
            lyrics = re.sub('\n{2}', '\n', lyrics)  # Gaps between verses
        return lyrics.strip("\n")

This saves the song's HTML page to a text file with a name like `song data - ...` every time it searches for a song. So search for an artist (like the one in the raw output), and when searching for that artist fails, send the saved files here. You can attach them using the "Attach files by dragging..." area below.
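Before attaching the files, it may help to get a rough idea of which branch of the method above each saved page would take. The sketch below is only a text-based approximation of the BeautifulSoup checks (regular expressions over the saved HTML, not a real HTML parse), and the names `classify_page` and `scan_saved_pages` are hypothetical. It looks for the same markers in the same order: a `div` whose class list contains `lyrics`, a `div` whose class contains `Lyrics__Root`, or the unreleased-song message.

```python
import re
from pathlib import Path
from typing import Dict

UNRELEASED = ("Lyrics for this song have yet to be released."
              " Please check back once the song has been released.")

# Rough regex approximations of html.find("div", class_="lyrics") and
# html.find("div", class_=re.compile("Lyrics__Root")).
OLD_DIV = re.compile(r'<div[^>]*\bclass="(?:[^"]*\s)?lyrics(?:\s[^"]*)?"')
NEW_DIV = re.compile(r'<div[^>]*\bclass="[^"]*Lyrics__Root')


def classify_page(html_text: str) -> str:
    """Return which lyrics branch a saved page would most likely hit."""
    if OLD_DIV.search(html_text):
        return "old_div"
    if NEW_DIV.search(html_text):
        return "new_div"
    if UNRELEASED in html_text:
        return "unreleased"
    return "not_found"  # corresponds to "Couldn't find the lyrics section."


def scan_saved_pages(folder: str = ".") -> Dict[str, str]:
    """Classify every 'song data - *.txt' file saved by the debug method."""
    return {
        path.name: classify_page(path.read_text(encoding="utf8"))
        for path in sorted(Path(folder).glob("song data - *.txt"))
    }


if __name__ == "__main__":
    for name, result in scan_saved_pages().items():
        print(f"{result:>10}  {name}")
```

Pages reported as `not_found` are the interesting ones to attach, since those are the cases where the scraper returns `None` for the lyrics.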

Matoutou27 commented 3 years ago

I found the problem when trying the code you gave me! My console was telling me that I had the right version of LyricsGenius, but for some reason my code was using an older copy in another directory.

Sorry for the inconvenience :(

Thanks for your work, you guys rock!

allerter commented 3 years ago

That's okay, I'm glad it's solved. I'm closing this issue now. Feel free to open another one if you run into any other problems.