tvgrabbers / tvgrabnlpy

Deze versie is deprecated zie: tvgrabpyAPI
https://github.com/tvgrabbers/tvgrabpyAPI
GNU General Public License v2.0
27 stars 8 forks source link

Timeout or other cause for process to hang #34

Closed TD-er closed 9 years ago

TD-er commented 9 years ago

For the last few weeks, every other day the script hangs and has to be killed and restarted to finish the collection of channel data. I am only collecting for about 25 channels, so the daily update should only take about an hour.

Often only one restart of the script on the same day is enough to finish, but today it is already taking up to 3 restarts and is still (actively) busy. So perhaps it is waiting for a very long (indefinite?) time-out, or waiting to find something on the page which isn't there (e.g. error-page).

hikavdh commented 9 years ago

Which version and do you have logs and config? Before 2.1.9 a crash could cause a hang. That should be fixed. 2.1.10 also has some fixes in that respect. It hasn't given me any errors in the last two weeks and will go stable soon. I'm off in 15 min. and wont react till Sunday evening.

TD-er commented 9 years ago

I am running 2.1.9 from the 'stable' zipfile. File-details: 423216 Jun 28 14:30 tv_grab_nl.py

hikavdh commented 9 years ago

Can you try 2.1.10 and if your log doesn't show anything run it in a console so you can see any final problems.

TD-er commented 9 years ago

I have also tried the latest beta available and still experience this issue. Since I cannot attach a tar file, I uploaded it to here: http://xmltv.td-er.nl/Github/xmltv_config_TD-er.tgz

I hope this is enough information to see what is happening. (I cannot)

hikavdh commented 9 years ago

I ran your configuration without problem. It took 3 hours. Run it in a console. Due to I think caching the end of the log doesn't get written on a hang. Without an error message I can't do anything.

TD-er commented 9 years ago

Is this caching the same as this command-line setting:

 -S <number>, --cache-save-interval <number>
                        after how many fetches to save the cache <Default 1000>

If so, then I can set it to 1 for now. I always save the std-out and std-err to a file when running my cronjob, so I should get it in this file I guess.

hikavdh commented 9 years ago

No! That's about the data cache, and if you set that to 1 it will become very very slow, because the data cache (up to 10 MB) will get written on every detail fetch. I'm talking about the way linux handles data writes. It caches data until the file is closed or a full block of 4 or 8 k can be written. And on a hang the file is not closed, so on killing data disappears. It is possible you were to fast in killing. So just run tv_grab_nl_py in a terminal and see if any error is given. If your backend is headless, you can do it through ssh. Oh and you better set the log level in the config from 47 to 7. You then get only errors and statistics.

hikavdh commented 9 years ago

Ok, I made a small adjustment so the log gets flushed on a crash or serious error. https://github.com/tvgrabbers/tvgrabnlpy/releases/tag/beta-2.1.10-p20150721

TD-er commented 9 years ago

I ran this version twice, without errors, while this morning, with the previous version, it stalled again (but no useful log). So I will try again to reproduce the hanging with this version. At least it logs the flushing :)

...
2015-07-21 10:40:47 : Detail statistics for RTV Noord (channel 24 of 24)
2015-07-21 10:40:47 :    291 cache hits
2015-07-21 10:40:47 :      0 detail fetches from tvgids.nl
2015-07-21 10:40:47 :      0 detail fetches from tvgids.tv
2015-07-21 10:40:47 :      0 failures
2015-07-21 10:40:47 :    634 without detail info
2015-07-21 10:40:47 :      0 left in the tvgids.nl queue to process
2015-07-21 10:40:47 :      0 left in the tvgids.tv queue to process
2015-07-21 10:40:49 : Please wait!! While I save the Cache!!
TD-er commented 9 years ago

I renamed the cache-file and retried. Now the grab fails again (process is not killed yet) and the 'save the Cache' is not visible yet in the output. See http://xmltv.td-er.nl/Github/xmltv_config_TD-er_20150721.tgz

The last part of the log is:

...
2015-07-21 12:22:30 : Now fetching Veronica(xmltvid=.tvgids.nl34) from tvgids.tv
2015-07-21 12:22:30 :     (channel 22 of 22) for day 13 of 14.
2015-07-21 12:22:38 : Now fetching RTLZ(xmltvid=.tvgids.nlrtlz) from tvgids.tv
2015-07-21 12:22:38 :     (channel 17 of 22) for day 0 of 14.
2015-07-21 12:22:38 : Now merging Veronica (channel 14 of 24):
2015-07-21 12:22:38 :   553 programs from tvgidstv into 51 programs from tvgids.nl
2015-07-21 12:22:39 : Merg statistics for Veronica (channel 14 of 24) from tvgidstv into tvgids.nl
2015-07-21 12:22:39 :     51 programs in tvgids.nl for range: 20-Jul 22:15:00 - 25-Jul 00:05:00,
2015-07-21 12:22:39 :    450 programs in tvgidstv  for range: 21-Jul 03:30:00 - 03-Aug 16:25:00
2015-07-21 12:22:39 :    406 programs added outside common timerange
2015-07-21 12:22:39 :    103 details  added from group slots
2015-07-21 12:22:39 :     45 programs left in tvgids.nl to match
2015-07-21 12:22:39 :     46 programs left in tvgidstv to match
2015-07-21 12:22:39 :     41 programs matched on time and name
2015-07-21 12:22:40 :      4 programs added unmatched from info
2015-07-21 12:22:41 : Skip channel=RTLZ on tvgids.tv, day=0. No data!
2015-07-21 12:22:42 : Now fetching RTLZ(xmltvid=.tvgids.nlrtlz) from tvgids.tv
2015-07-21 12:22:42 :     (channel 17 of 22) for day 1 of 14.
2015-07-21 12:22:42 : Now fetching details for 554 programs on Veronica(xmltvid=34.tvgids.nl)
2015-07-21 12:22:42 :     (channel 14 of 24) for 14 days.
2015-07-21 12:22:44 : Skip channel=RTLZ on tvgids.tv, day=1. No data!
2015-07-21 12:22:47 : Now fetching RTLZ(xmltvid=.tvgids.nlrtlz) from tvgids.tv
2015-07-21 12:22:47 :     (channel 17 of 22) for day 2 of 14.
2015-07-21 12:22:50 : Skip channel=RTLZ on tvgids.tv, day=2. No data!
2015-07-21 12:22:52 : Now fetching RTLZ(xmltvid=.tvgids.nlrtlz) from tvgids.tv
2015-07-21 12:22:52 :     (channel 17 of 22) for day 3 of 14.
2015-07-21 12:22:58 : Skip channel=RTLZ on tvgids.tv, day=3. No data!
2015-07-21 12:23:02 : Now fetching RTLZ(xmltvid=.tvgids.nlrtlz) from tvgids.tv
2015-07-21 12:23:02 :     (channel 17 of 22) for day 4 of 14.
2015-07-21 12:23:07 : Skip channel=RTLZ on tvgids.tv, day=4. No data!
2015-07-21 12:23:08 : Now fetching RTLZ(xmltvid=.tvgids.nlrtlz) from tvgids.tv
2015-07-21 12:23:08 :     (channel 17 of 22) for day 5 of 14.
2015-07-21 12:23:10 : Skip channel=RTLZ on tvgids.tv, day=5. No data!
2015-07-21 12:23:10 : Now fetching RTLZ(xmltvid=.tvgids.nlrtlz) from tvgids.tv
2015-07-21 12:23:10 :     (channel 17 of 22) for day 6 of 14.
2015-07-21 12:23:12 : Skip channel=RTLZ on tvgids.tv, day=6. No data!
2015-07-21 12:23:12 : Now fetching RTLZ(xmltvid=.tvgids.nlrtlz) from tvgids.tv
2015-07-21 12:23:12 :     (channel 17 of 22) for day 7 of 14.
2015-07-21 12:23:15 : Skip channel=RTLZ on tvgids.tv, day=7. No data!
2015-07-21 12:23:16 : Now fetching RTLZ(xmltvid=.tvgids.nlrtlz) from tvgids.tv
2015-07-21 12:23:16 :     (channel 17 of 22) for day 8 of 14.
2015-07-21 12:23:18 : Skip channel=RTLZ on tvgids.tv, day=8. No data!
2015-07-21 12:23:21 : Now fetching RTLZ(xmltvid=.tvgids.nlrtlz) from tvgids.tv
2015-07-21 12:23:21 :     (channel 17 of 22) for day 9 of 14.
2015-07-21 12:23:27 : Skip channel=RTLZ on tvgids.tv, day=9. No data!
2015-07-21 12:23:30 : Now fetching RTLZ(xmltvid=.tvgids.nlrtlz) from tvgids.tv
2015-07-21 12:23:30 :     (channel 17 of 22) for day 10 of 14.
2015-07-21 12:23:33 : Skip channel=RTLZ on tvgids.tv, day=10. No data!
2015-07-21 12:23:35 : Now fetching RTLZ(xmltvid=.tvgids.nlrtlz) from tvgids.tv
2015-07-21 12:23:35 :     (channel 17 of 22) for day 11 of 14.
2015-07-21 12:23:41 : Skip channel=RTLZ on tvgids.tv, day=11. No data!
2015-07-21 12:23:43 : Now fetching RTLZ(xmltvid=.tvgids.nlrtlz) from tvgids.tv
2015-07-21 12:23:43 :     (channel 17 of 22) for day 12 of 14.
2015-07-21 12:23:46 : Skip channel=RTLZ on tvgids.tv, day=12. No data!
2015-07-21 12:23:49 : Now fetching RTLZ(xmltvid=.tvgids.nlrtlz) from tvgids.tv
2015-07-21 12:23:49 :     (channel 17 of 22) for day 13 of 14.
2015-07-21 12:23:53 : Skip channel=RTLZ on tvgids.tv, day=13. No data!
hikavdh commented 9 years ago

If you run with no cache it will take 4 or 5 hours at least to complete! So wait! I saw the RTLZ failure too. Obviously there is no program info on tvgids.tv. Check the site.

hikavdh commented 9 years ago

Obviously they removed it.

TD-er commented 9 years ago

The only thing changed in the last 9 hours is that the program_cache file was saved at 13:59. Nothing else.

hikavdh commented 9 years ago

And what is the last log entry? Above that was 12:23. And what is the size of the program_cache. After my test run it was almost 9,2 Mb.

TD-er commented 9 years ago
-rw-rw-r-- 1 gijs gijs 2189746 Jul 21 13:59 program_cache
-rw-rw-r-- 1 gijs gijs   79521 Jul 21 12:23 tv_grab_nl_py.log

and the last part of the log:

...
2015-07-21 12:23:41 : Skip channel=RTLZ on tvgids.tv, day=11. No data!
2015-07-21 12:23:43 : Now fetching RTLZ(xmltvid=.tvgids.nlrtlz) from tvgids.tv
2015-07-21 12:23:43 :     (channel 17 of 22) for day 12 of 14.
2015-07-21 12:23:46 : Skip channel=RTLZ on tvgids.tv, day=12. No data!
2015-07-21 12:23:49 : Now fetching RTLZ(xmltvid=.tvgids.nlrtlz) from tvgids.tv
2015-07-21 12:23:49 :     (channel 17 of 22) for day 13 of 14.
2015-07-21 12:23:53 : Skip channel=RTLZ on tvgids.tv, day=13. No data!
(END)
hikavdh commented 9 years ago

This is weird. It means no real error has been seen. But somehow one of the threads is blocked and only some 25% of the details have been fetched. Try removing rtlz, It doesn't exist anymore on tvgids.tv. Maybe that is the culprit, blocking the tvgids.tv thread, so no details are fetched from there either. Let me know if removing rtlz helps. Then I know where to start looking.

TD-er commented 9 years ago

I removed it yesterday and this morning the fetch completed successful. See the logs here: http://xmltv.td-er.nl/Github/xmltv_config_TD-er_20150722.tgz As you can see, the program_cache was still only about 3.3 MB. Maybe I should run again with a clean program cache and see what happens?

Maybe worth noting that I only run with 4 slowdays?

hikavdh commented 9 years ago

That explains the present small size. You generally then do not fetch details from tvgids.tv. and 4/14 of little over 9 is about 3.

TD-er commented 9 years ago

I recently changed the number of slowdays to 4, when I was experiencing issues. But that was not the real (only?) issue, apparently.

Anyway, the fetching of TV-gids data was again successful this morning, in one run. So it looks like the non-existing entry of RTLZ may indeed have caused these issues.

TD-er commented 9 years ago

It is still not 'fixed', this morning the fetch failed again. I saved my logs here: http://xmltv.td-er.nl/Github/xmltv_config_TD-er_20150724.tgz

The program parameters are:

python2 /home/gijs/xmltv/tvgrabnlpy-beta-2.1.10-p20150721/tv_grab_nl.py --output /home/gijs/xmltv/tv-20150724.xmltv --compat --slowdays 4
hikavdh commented 9 years ago

I don't know if that is the cause, but I notice failures on npo.nl on all the logs which hung. This comes from the day switch. Run your script later after 5 and see if it still gives problems.

TD-er commented 9 years ago

This morning, it started at 5:10 am and finished in one run. Let's see a few more runs before drawing conclusions :) But it looks promising.

hikavdh commented 9 years ago

Try this fix, but then on the old time: https://github.com/tvgrabbers/tvgrabnlpy/releases/tag/beta-2.1.10-p20150726

TD-er commented 9 years ago

The last run this morning (started at 5:10 am, build july 21) finished without issues. For tomorrow, build tvgrabnlpy-beta-2.1.10-p20150726 is scheduled to run at 4:10 am.

TD-er commented 9 years ago

The last two runs with the new version (0726) ran fine and started at 4:10 am:

-rw-rw-r-- 1 gijs gijs  375255 Jul 28 04:34 tv-20150728.xmltv.gz
-rw-rw-r-- 1 gijs gijs  383043 Jul 29 04:32 tv-20150729.xmltv.gz
hikavdh commented 9 years ago

Great! I don't know exactly what went wrong, but I found some inconsistencies in the tvgids.tv thread on the handling of complete empty channels that also were in the npo.nl thread (I used tvgids.tv partially as a template for that) and it seems to work. Probably those channels kept waiting for the data. You now might want to move back to after 5. Most sites start the day switch at 4, but that takes some time, so you can get empty pages till about 5. Between 12 and 4 the script fails because of wrong dates. I'm thinking about maybe catching that, but that will be in 2.2. Hika

TD-er commented 9 years ago

When using --slowdays 14, the process hangs again and doesn't seem to complete. I also see in the log that ARD is not available on one of the sites used.

 No data for channel:ARD on tvgids.tv

So perhaps this is only an issue when using more than 4 slowdays?

I've uploaded the used .xmltv directory here: http://xmltv.td-er.nl/Github/xmltv_config_TD-er_20150801.tgz

In the mean time, I will try to run without ARD, to see whether that is the cause.

hikavdh commented 9 years ago

That was my assumption and I thought with the latest beta I had fixed that. The reason there is no data is the time. At about 4 they start the day switch and obviously you caught that channel in the middle of the switch.

With slowdays = 4 or lower normally no detail data is fetched from tvgids.tv. But the base pages are still fetched and that failure message should still come. Actually I saw it when testing. So I possibly have to look at the detail fetch. But they are not datetime related?

To explain: The 5 sources and all the channels run independently. The sources do all the base fetches and signal the channels when they are ready. Then the channels do the merging and when they have all the base basepages merged, they make a list of details to fetch. The two sources then process those lists and send the result back to the channels. When a channel has all details back (as success or failure) they prepare their output and quit. When all channels are ready the detail sources quit. When all sources have quit the main process sends out the output and starts cleaning up. It there somehow goes wrong. The two detail sources only quit when all channels say they are ready and the channels only finish if they have all requests confirmed. So if a confirmation somehow gets lost, you have a hang. I first thought it was in the basepages, but it now looks like it's in the detailpages.

I think I will set a timer on the detail sources to quit after 30 min idle and to signal all remaining channels to finish up.

In the mean time you can also set slowdays only for one channel: If you look at the ARD line:

ARD;4;9;ard;;ard;2;39131/ard-nl.jpg

The first filled in item after 1(name=ARD) and 2(group=4) is the ID, so '9'

[Channel 9]
slowdays = 4

To speed things up, you can also do that for the less interesting channels. I have those on slowdays = 1 or even to fast = True

Hika

hikavdh commented 9 years ago

Try this one: https://github.com/tvgrabbers/tvgrabnlpy/releases/tag/beta-2.1.10-p20150801 If you see:

Channel %s seems to be waiting for %s lost detail requests from %s.
Setting it to zero

or

source: %s died.
So we stop waiting for the pending details for channel

in the log. Please let me know.

Hika

hikavdh commented 9 years ago

Oh, and I forget. I checked; tvgids.tv only has 4 days for the German channels. And that while teveblad.be has a week?

TD-er commented 9 years ago

Yesterday's version of the grabber finished without issues.

The words 'zero' and 'died' did not occur in the logs yet.

hikavdh commented 9 years ago

It seems tvgids.tv now only has two days left for a lot of channels. And I did have a hang the last two nights. I hadn't updated jet so I have no log. I'm running it now.

hikavdh commented 9 years ago

I think now the hang is solved!

hikavdh commented 9 years ago

The last few days we encountered a freeze that was coursed by bad thread interaction. Possibly this also coursed the earlier freezes. It was an unpredictable event. We now think we have solved that ocurence in https://github.com/tvgrabbers/tvgrabnlpy/releases/tag/beta-2.1.10-p20150811

Hika