tripod31 / foo_lyricsource

lyricsource component for foo_uie_lyrics3

Extra text after lyrics displayed #2

Closed: devmem closed this issue 8 years ago

devmem commented 8 years ago

Great plugin - thanks - it's very quick too :)

Unfortunately, I seem to get extra text at the end of the lyrics; it looks like the end isn't being stripped correctly. For example, this is what I get after the lyrics of Motorhead's "Eat The Rich":

 Submit CorrectionsVisit www.azlyrics.com for these lyrics.  A-Z Lyrics  M  MOTORHEAD Lyrics  "Rock 'n' Roll" (1987)Rock 'n' Roll
Eat The Rich
Blackheart
Stone Deaf In The USA
The Wolf
Traitor
Dogs
All For You
Boogeyman
Search Request Lyrics Submit Lyrics Soundtracks Music Videos Facebook Links Advertise Here Privacy Policy DMCA Policy Contact Us Powered by MOTORHEAD lyrics are property and copyright of their owners. "Eat The Rich" lyrics provided for educational purposes and personal use only.
curdate=new Date(); document.write("<strong>Copyright &copy; 2000-"+curdate.getFullYear()+" AZLyrics.com<\/strong>"); cf_page_artist = ArtistName;cf_page_song = SongName;cf_page_genre = "rock";right &copy; 2000-"+curdate.getFullYear()+" AZLyrics.com<\/strong>"); var _gaq = _gaq || []; _gaq.push(['_setAccount', 'UA-4309237-1']); _gaq.push(['_trackPageview']); (function() { var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true; ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s); })(); window.jQuery || document.write('<script src="http://images.azlyrics.com/local/jquery.min.js"><\/script>')]) $(function () { if ($('#CssFailCheck').is(':visible') === true) { $('<link rel="stylesheet" type="text/css" href="http://www.azlyrics.com/bs/css/bootstrap.min.css"><link rel="stylesheet" href="http://www.azlyrics.com/bsaz.css">').appendTo('head'); } }); ://www') + '.google-anal
tripod31 commented 8 years ago

Hi. Could you try this branch? https://github.com/tripod31/foo_lyricsource/tree/extcmd Currently I can't access the site, so I can't confirm that the "www.azlyrics.com" source works. Also, please try the "External Command" source if you can, though it requires installing Python 3 and is pretty difficult to set up.

devmem commented 8 years ago

That's much better... I'm now left with all the lyrics, a couple of blank lines, and "Submit Corrections", as if it were the last line of the lyrics. Everything else has gone.

devmem commented 8 years ago

Looking at the last commit, I can see you've changed the markers used to find the start and end of the scraped text: scraping now starts at a comment containing "Usage" and stops at a comment containing "googleoff". In the source of a couple of azlyrics pages, the following appears after the last line of the lyrics but before the googleoff comment:

</div>

<br><br>

<form id="addsong" style="visible:hidden; margin:0;" action="../../add.php" method="post">
<input type="hidden" name="what" value="add_song">
<input type="hidden" name="artist" value="AC/DC">
</form>

<form action="../../add.php" method="post" id="corlyr">
<input type="hidden" name="what" value="correct_lyrics">
<input type="hidden" name="song_id" value="244605">
</form>

<div class="smt noprint">
<a class="btn btn-share" href="#" onclick="document.getElementById('corlyr').submit();return false;"><span class="glyphicon glyphicon-pencil"></span> Submit Corrections</a>
</div>

<!--googleoff: index-->

The </div> always appears to be the first line after the lyrics. It also appears to be the first </div> after the Usage comment, so it might make a better end marker.
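To illustrate why the googleoff marker leaves "Submit Corrections" behind while the first </div> does not, here is a minimal Python sketch. It is not the component's actual code: the sample page is a hypothetical, cut-down version of the markup quoted above (placeholder comment text and lyric lines), and the helper names (start_after_usage, end_at_googleoff, end_at_first_div_close) are made up for illustration.

```python
import re
from typing import Callable

# Hypothetical page fragment modeled on the azlyrics markup quoted in this
# thread. The comment text and lyric lines are placeholders, not real content.
PAGE = """<div>
<!-- Usage notice (placeholder text) -->
First lyric line<br>
Second lyric line<br>
</div>

<br><br>

<div class="smt noprint">
<a class="btn btn-share" href="#"><span class="glyphicon glyphicon-pencil"></span> Submit Corrections</a>
</div>

<!--googleoff: index-->
<p>Footer and scripts would follow here.</p>
"""

COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)
GOOGLEOFF_RE = re.compile(r"<!--\s*googleoff")
TAG_RE = re.compile(r"<[^>]+>")


def start_after_usage(page: str) -> int:
    """Return the index just past the first comment that mentions 'Usage'."""
    for m in COMMENT_RE.finditer(page):
        if "Usage" in m.group(1):
            return m.end()
    raise ValueError("no 'Usage' comment found")


def end_at_googleoff(page: str, start: int) -> int:
    """Old end marker: the googleoff comment (falls back to end of page)."""
    m = GOOGLEOFF_RE.search(page, start)
    return m.start() if m else len(page)


def end_at_first_div_close(page: str, start: int) -> int:
    """New end marker: the first </div> after the start position."""
    idx = page.find("</div>", start)
    return idx if idx != -1 else len(page)


def extract(page: str, find_end: Callable[[str, int], int]) -> list[str]:
    """Slice between the markers, drop tags, and keep the non-blank lines."""
    start = start_after_usage(page)
    text = TAG_RE.sub("", page[start:find_end(page, start)])
    return [line.strip() for line in text.splitlines() if line.strip()]


print(extract(PAGE, end_at_googleoff))
# ['First lyric line', 'Second lyric line', 'Submit Corrections']
print(extract(PAGE, end_at_first_div_close))
# ['First lyric line', 'Second lyric line']
```

With googleoff as the end marker, the "Submit Corrections" link sits inside the slice, so it survives tag stripping as an extra "lyric" line; stopping at the first </div> cuts the slice before that block.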

tripod31 commented 8 years ago

I committed the fix, following your suggestion.

devmem commented 8 years ago

The fix seems to work. I'm not sure exactly what your code does: does it only look for div inside <>, or does it match any line containing the string div (for example, "division")? If it's the former, do you want me to close this issue now or wait until the fix is merged into the master branch?

tripod31 commented 8 years ago

It treats </div> as the end of the lyrics. I merged the branch into master. I can now access azlyrics via a proxy, so I can confirm it works. Thanks for reporting the issue and for your help.
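The question about matching only the tag versus any text containing "div" can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not the component's code: the lyric fragment is hypothetical, and the case- and whitespace-tolerant regex is a design choice added here (the thread only says that </div> is the end marker).

```python
import re

# Hypothetical lyrics fragment in which an ordinary word contains "div".
FRAGMENT = "Line about division<br>\nAnother line<br>\n</div>\nfooter"

# Tolerates variants such as "</DIV >" (an assumed design choice; the thread
# only states that the closing tag </div> is treated as the end of the lyrics).
CLOSE_DIV_RE = re.compile(r"</\s*div\s*>", re.IGNORECASE)


def cut_at_substring(text: str) -> str:
    """Naive: stop at any 'div', including one inside a word like 'division'."""
    idx = text.find("div")
    return text[:idx] if idx != -1 else text


def cut_at_tag(text: str) -> str:
    """Stop only at a real closing </div> tag."""
    m = CLOSE_DIV_RE.search(text)
    return text[:m.start()] if m else text


print(repr(cut_at_substring(FRAGMENT)))  # 'Line about '
print(repr(cut_at_tag(FRAGMENT)))        # 'Line about division<br>\nAnother line<br>\n'
```

The substring search truncates the lyrics mid-line at "division", while the tag match keeps every lyric line and stops only at the closing tag.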

devmem commented 8 years ago

Brilliant - thanks :)