tvgrabbers / tvgrabnlpy

Deze versie is deprecated zie: tvgrabpyAPI
https://github.com/tvgrabbers/tvgrabpyAPI
GNU General Public License v2.0
27 stars 8 forks source link

Crash; possible deadlock #26

Closed JanCeuleers closed 9 years ago

JanCeuleers commented 9 years ago

I've only begun using this fork/continuation of the grabber a few weeks ago. I've successfully used version 2.1.3 for a couple of weeks. As of a few days ago no data was being grabbed and I've had to kill the grabber as it was using lots of CPU. Now the grabber isn't doing much of anything at all: it produces no output and also uses hardly any CPU.

According to strace the grabber seems to be indefinitely waiting for a lock to be released. This remains the case even after a reboot.

JanCeuleers commented 9 years ago

strace output available here:

https://www.dropbox.com/s/vddd8sufbotxq8b/tv_grab_nl_py.strace.out?dl=0

Command used to create that file:

( strace -f tv_grab_nl_py.github/tv_grab_nl.py --config-file tv_grab_nl_py.conf --output tv_grab_nl_py.out --cache tv_grab_nl_py/cache --compat ) &> tv_grab_nl_py.strace.out

The version of tv_grab_nl_py used to create this trace was 2.1.5

JanCeuleers commented 9 years ago

Deleting the cache causes the grabber to spring back to life. I submit that the grabber should recover from this situation autonomously, or at least warn the operator about the problem.

winfried commented 9 years ago

I experience similar problems, but haven't had time yet to properly analyze them.

hikavdh commented 9 years ago

Can you give me your config and logfiles? You can put them here https://drive.google.com/open?id=0B1laGJ_yJDl-OFQwMFhXcjVHbjg&authuser=0 Your output doesn't tell me much, but it seems to be missing some libraries and other files. "readlink("/usr/bin/python2", "python2.7", 4096) = 9 readlink("/usr/bin/python2.7", 0x7ffeea41fd90, 4096) = -1 EINVAL (Invalid argument) stat("/usr/bin/lib/python2.7/os.py", 0x7ffeea41ec40) = -1 ENOENT (No such file or directory)" Do you have python 2.7 properly installed? What is your distribution?

JanCeuleers commented 9 years ago

Config file is here: https://www.dropbox.com/s/a6av50qh481gns2/tv_grab_nl_py.conf?dl=0

There's no logfile.

As I said my problem was resolved when I deleted the cache. So what seems to have happened is that when I previously killed the grabber it left the cache file in a corrupted state which subsequently caused a deadlock.

My distribution is Ubuntu 12.04:

root@zotac:~# dpkg -l | grep python2 ii libpython2.6 2.6.7-4ubuntu1.1 Shared Python runtime library (version 2.6) ii libpython2.7 2.7.3-0ubuntu3.6 Shared Python runtime library (version 2.7) ii python2.6 2.6.7-4ubuntu1.1 An interactive high-level object-oriented language (version 2.6) ii python2.6-minimal 2.6.7-4ubuntu1.1 A minimal subset of the Python language (version 2.6) ii python2.7 2.7.3-0ubuntu3.6 Interactive high-level object-oriented language (version 2.7) ii python2.7-minimal 2.7.3-0ubuntu3.6 Minimal subset of the Python language (version 2.7)

hikavdh commented 9 years ago

If no log file, even an empty one, is created, it doesn't come past the first python initialization. Do you get any output to the screen? Can you run other python scripts? What happens if you start python2 in interactive mode? To me it looks like that at present your python2 installation is corrupted. This doesn't say anything about your initial problem.

hikavdh commented 9 years ago

One other thing. I see you have both 2.6 and 2.7 installed. With 2.7 present you don't need 2.6 and they might not interact nice. 2 and 3 are well separated.

winfried commented 9 years ago

Not 100% sure I experience exactly the same problem, but it looks quite a lot like it. Today it ended up hanging again with 150% CPU load in top. The run started at about 5:30 this morning, now 13 hours later it is still running. The last (non-cache) run took only about 7 hours. The output file is 0 bytes big and last written to when the run started. The cache and the log files are written to much later, but still idle for many hours: -rw-r--r-- 1 mythtv mythtv 11364750 Apr 14 12:17 program_cache -rw-r--r-- 1 mythtv mythtv 247125 Apr 14 07:23 tv_grab_nl_py.log

Log and conf can be found here: https://gist.github.com/winfried/5b316156d49e42f9dfeb

hikavdh commented 9 years ago

What I see in the log looks at first glance OK. I've been running that beta version now for some 10 days with a minor update, that doesn't contain code. That the log gets actually written later is due to disk cashing. It's possible there is a final part still in the disk cash. If you kill it (pkill python2, but be careful that you don't have other python2 code running ) that last part will get written to disk and possibly that contains the error. If so it should also have been written to the screen. Looking at what is left in the queue, it should have taken maybe 5 or 10 min after the last log entry, so the cache geting written to 5 hours later is weird. After a pkill your cash-file could get damaged! The script uses a thread for every channel and every source and a final thread for managing the cache. If one of them crashes it should signal the other threads to close up, including writing everything left to cash. There are several flags managing this. So even pressing Ctrl-C can take some time before you see a reaction. I'm going to see what happens here when I run it with your configurations. Off cause with an empty cash. So it will take some time with 48 channels

hikavdh commented 9 years ago

Got an error on which it stays hung. "'unicode' object has no attribute 'group'" both to the screen and to the log. and later again while creating the xml output. Probably some illegal data in the web input I have to filter out. I'm comming back to it.

winfried commented 9 years ago

Thanks a lot for digging into it!

hikavdh commented 9 years ago

De Nada. It happens also with version 2.1.4. It definitely seems connected to some change on the detail pages on the tvgids.nl webside. The weird thing is that I until now haven't seen the error. I checked all logs from last week.

hikavdh commented 9 years ago

As my production system still doesn't give the error it seems the data creating the error is on pages I already have in cache and that those pages where changed from the original after I fetched them.

hikavdh commented 9 years ago

Haven't found the exact cause. But have put an exception catcher on the location. It now gives an error "Error processing the detailpage: " and continues. I'm putting it up for download now

hikavdh commented 9 years ago

OK, I think what is causing the problem. The page http://www.tvgids.nl/programma/18150767/ redirects to another page: http://www.tvgids.nl/Aandacht_voor_hersenonderzoek/artikel/78583/. So I have to catch the redirect.

hikavdh commented 9 years ago

It's not the redirect. It's a combination of a missing end tag </p> on the page, which was caught. But an error in my debugging code, which shouldn't have run, but for a forgotten statement, that should only let it run on my system, creating a new error while handling the error. It's now fixed.

winfried commented 9 years ago

Yesterday I did a run with the new v2.1.5-beta and a clean cache. That went well. Today I let it run with the cache file it created yesterday. I am sorry to say, but it hung in exact the same way as before. So I am afraid this is not fixed yet.

hikavdh commented 9 years ago

Weird. Dit you get a log?

hikavdh commented 9 years ago

Preferable both the old log from the successful run and the present failed one.

winfried commented 9 years ago

On 16-04-15 20:43, Hika van den Hoven wrote:

Hi,

Preferable both the old log from the successful run and the present failed one.

I Attached both logs.

Winfried

winfried commented 9 years ago

Attaching failed, I have put them both in a gist: https://gist.github.com/winfried/3cae6513e7f21b0233b1

hikavdh commented 9 years ago

I've tried to recreate by running with your config. Only with a different output and loglevel 7 to not get to much clutter. I ran last night which took hours but finished OK. I only see one failure which is handled OK. Just now I ran again and again OK. Just that one failure. So I don't know?
I added some extra prevention code. I'll put that up later today.

hikavdh commented 9 years ago

As promised: https://github.com/tvgrabbers/tvgrabnlpy/releases/tag/beta-2.1.5-p20150417 The previous found bug was introduced in the final test fase to version 2.1.0. I adapted the code to catch the detected anomalies on the site, but forgot to adapt the testcode, afterwards. All testcode is now explicitly isolated to only run with my debug flag set. Also any failure on a page grab (mostly due to invalid html tagging, like this description on tvgids.tv
<p>Wielrennen: liveverslag van Luik-Bastenaken-Luik. Commentaar: Michel Wuyts en José De Cauwer. Interviews: Renaat Schotte. - Motorcross: GP van Europa in Valkenswaard. Commentaar: Gert Gommé. - <a href="/op-tv/paarden<a href="/op-tv/sport" title="Sport op tv" class="inconspicuous">sport</a>" title="Paarden<a href="/op-tv/sport" title="Sport op tv" class="inconspicuous">sport</a> op tv" class="inconspicuous">Paarden<a href="/op-tv/sport" title="Sport op tv" class="inconspicuous">sport</a></a>: Jumping van Antwerpen.</p>
resulting in the browser to
Wielrennen: liveverslag van Luik-Bastenaken-Luik. Commentaar: Michel Wuyts en José De Cauwer. Interviews: Renaat Schotte. - Motorcross: GP van Europa in Valkenswaard. Commentaar: Gert Gommé. - sport" title="Paardensport op tv" class="inconspicuous">Paardensport: Jumping van Antwerpen or forgotten endtags.)
is now generically caught and dealt with, with a generic failure. This kind of errors cannot be corrected. This on top of specific detection, that can and will get corrected.
On the base pages, this kind of generic failure will set that source as finished letting the other sources continue and finish without that sources data.

JanCeuleers commented 9 years ago

I've had a new crash occurring for the first time yesterday, and which I have this time not been able to sidestep by deleting the cache. The error occurs fairly shortly after the grabber starts, and then it just sits there eating 100% of a CPU core. The error is:

2015-04-30 05:00:46 : An unexpected error has occured at line: 7192, 283: u'59' 2015-04-30 05:00:46 : tracing back to line: 7074, 6 2015-04-30 05:00:46 : tracing back to line: 7062, 90

Logfile is here: https://www.dropbox.com/s/06y5wjnhpu00o1a/tv_grab_nl_py.log Config file is still here: https://www.dropbox.com/s/a6av50qh481gns2/tv_grab_nl_py.conf

If this is a different issue I will be happy to open another ticket.

Note that following your earlier remark I have uninstalled Python 2.6 and so only have 2.7 installed. This machine runs Ubuntu 12.04. I vaguely remember there being an xmltv update recently (from the mythbuntu repository):

root@zotac:~# dpkg -l | grep xmltv ii libxmltv-perl 0.5.65-1+r4884p181~ubuntu12.04.1 Perl libraries related to the XMLTV file format for TV listings ii xmltv 0.5.65-1+r4884p181~ubuntu12.04.1 Functionality related to the XMLTV file format for TV listings ii xmltv-gui 0.5.65-1+r4884p181~ubuntu12.04.1 Graphical user interface related to the XMLTV file format ii xmltv-util 0.5.65-1+r4884p181~ubuntu12.04.1 Utilities related to the XMLTV file format for TV listings

I am running your latest beta (2.1.5).

hikavdh commented 9 years ago

Checking your log I see you are running the version from april 12. This doesn't have the fix for this problem. The latest beta is from april 17 containing and contains the fix. I probably will set it stable this weekend. (the p-string at the end of the version gives the date. In beta versions the date can be changed without changing the version)

Hika

winfried commented 9 years ago

Well, I still seeing the same problem with the latest beta, but I haven't had time to dig deeper into it. The pattern is that without a cache it usually runs ok, but when there is a cache it usually fails.

JanCeuleers commented 9 years ago

I hadn't spotted that. This version does indeed work. False alarm. Interesting though that, although I opened this ticket to begin with, I seem not to have hit the problem that you fixed in this most recent version prior to the day before yesterday. So the problem that prompted me to open this ticket must have been a different-one. I now can no longer reproduce that (ever since I deleted the cache), so don't keep this ticket open on my account. Thanks.

BertusG commented 9 years ago

Unfortunately my setup is also broken. I updated to the latest version v2.1.5-beta, deleted the cache and output file en generated a new config file. I get the following error:

2015-05-01 02:33:04 : An unexpected error has occured at line: 5913, 175: coercing to Unicode: need string or buffer, _sre.SRE_Match found 2015-05-01 02:33:04 : tracing back to line: 5805, 946 2015-05-01 02:33:04 : tracing back to line: 4261, 704

Config file: https://dl.dropboxusercontent.com/u/25570518/tv_grab_nl/tv_grab_nl_py.conf Log: https://dl.dropboxusercontent.com/u/25570518/tv_grab_nl/tv_grab_nl_py.log Script: https://dl.dropboxusercontent.com/u/25570518/tv_grab_nl/tv_grab_nl.py

hikavdh commented 9 years ago

Look at the top of your log: "2015-05-01 02:30:02 : The Netherlands (tv_grab_nl_py version 2.0.4-p20150302)" It looks like you were using version 2.0.4 that has this bug!

hikavdh commented 9 years ago

I just now notice two other problems. teveblad.be structurally returning wrong date andstructural failures on the tvgids.nl detail pages. I'll look into that.

BertusG commented 9 years ago

Strange I overwrite the tv_grab_nl.py with the new version. Do I have to create a new symlink in /usr/bin also?

hikavdh commented 9 years ago

I don't know, but you can test on the commandline with --version or -V . There are new issues, but with this version it shouldn't hang. https://github.com/tvgrabbers/tvgrabnlpy/releases/tag/beta-2.1.5-p20150417

hikavdh commented 9 years ago

The problem with tvgids.nl is a restyling. I'm rewriting the code. Somewhere this weekend a 2.1.6-beta will stand ready. The teveblad.tv problem, if not temporarily, is more complex, so will take a little longer.

hikavdh commented 9 years ago

A first raw version https://github.com/tvgrabbers/tvgrabnlpy/releases/tag/beta-2.1.6-p20150502 is available. It only addresses the tvgids.nl issue and catches now 95%

BertusG commented 9 years ago

Great, your work is highly appreciated! I'll give the new beta a try.

BertusG commented 9 years ago

Did a full run with the 2.1.6, it runs ok but a lot of errors with tvgids.nl. I think the restyling issue. I uploaded my files for further reference if you like. https://dl.dropboxusercontent.com/u/25570518/tv_grab_nl/2.1.6/tv_grab_file.xmltv https://dl.dropboxusercontent.com/u/25570518/tv_grab_nl/2.1.6/tv_grab_nl_py.log https://dl.dropboxusercontent.com/u/25570518/tv_grab_nl/2.1.6/tv_grab_nl_py.conf

hikavdh commented 9 years ago

Thanks. I'm running the final test and it looks good. In one or two hours you can expect an updated version with fully working tvgids.nl detail grab. A teveblad.be fix will follow later.

Hika

hikavdh commented 9 years ago

A fix for the restyled tvgids.nl detail fetch is available at https://github.com/tvgrabbers/tvgrabnlpy/releases/tag/beta-2.1.6-p20150502-2 It is advised to keep tevebladdays set to 0 until an update is available. Configs made on or after april 28 do not contain teveblad id's

winfried commented 9 years ago

On 05/02/2015 10:51 PM, Hika van den Hoven wrote:

Hi,

A fix for the restyled tvgids.nl detail fetch is available at https://github.com/tvgrabbers/tvgrabnlpy/releases/tag/beta-2.6-p20150502-2 It is advised to keep tevebladdays set to 0 until an update is available. Configs made on or after april 28 do not contain teveblad id's

Thanks a lot I will test it the next days.

Winfried

hikavdh commented 9 years ago

A further update for the teveblad.be problems is now available at https://github.com/tvgrabbers/tvgrabnlpy/releases/tag/beta-2.1.6-p20150503. You will need to add the included file: teveblad_channels.html to your homedirectory. It will then get copied to the .xmltv directory. Any configuration created after this problem occurred (april 28/29), doesn't contain any info about the teveblad.be pages. You will have to delete them, recreating them with the --configure switch. The above mentioned file is thereby essential.

hikavdh commented 9 years ago

A small update: https://github.com/tvgrabbers/tvgrabnlpy/releases/tag/beta-2.1.6-p20150504

JanCeuleers commented 9 years ago

Would it be possible for the grabber to wget the teveblad_channels.html file from somewhere (github?) when needed rather than the user having to install it in what is rather a strange location (namely the home directory of the user that runs the grabber)?

hikavdh commented 9 years ago

Of cause that would be possible. I only hope this is temporarily and this was the fastest solution. It will be copied to the .xmltv directory and used from there, but as that might not jet exist!... If it's in .xmltv you anyway can delete it from your homedirectory. It looks like the site has changed ownership and if so, things will change soon again. Also the pages I used are still there but only empty. So everything points to imminent changes. Also it coinciding with the makeover on tvgids.nl could suggest they at least now are owned by the same company. I now first will work on fall back solutions, like trying both pages. I also see now and then some of the pages returning empty. For tvgids.nl I want to create a json grab as fall back. It contains slightly less info, but of its nature will be more stable.

hikavdh commented 9 years ago

Oh, and I found a great quite compete source of icons I want to implement. http://212.142.41.211/ChannelLogos/02/ and smaller at http://212.142.41.211/ChannelLogos/01/ I guess from upc/ziggo

Hika

JanCeuleers commented 9 years ago

Been using beta-2.1.6-p20150504 for a few days now; working fine. Thank you very much.