tvgrabbers / tvgrabnlpy

This version is deprecated, see: tvgrabpyAPI
https://github.com/tvgrabbers/tvgrabpyAPI
GNU General Public License v2.0

Incorrect data for channel 2BE #31

Closed JanCeuleers closed 9 years ago

JanCeuleers commented 9 years ago

I noticed that data for Belgian channel 2BE is extremely unreliable (i.e. inaccurate). I've made several recordings where the recorded program is different from the one in its metadata (in other words: the program data was incorrect to begin with). Another example is coming up this evening: on June 17th 2015 2BE will be broadcasting a movie called New Police Story at 20:35. Both 2BE's own website and teveblad.be agree on this. However, Myth thinks the movie will be Source Code. I'm using Version: 2.1.6-p20150510-beta.

hikavdh commented 9 years ago

Data for 2BE is retrieved from teveblad.be, tvgids.nl and tvgids.tv. Obviously tvgids.nl and/or tvgids.tv have wrong data. The weird thing is that, as it is a Flemish channel, teveblad.be should win the contest. I will look into that. You can check whether prime_source is correctly set to 3:

# 2BE
[Channel 59]
prime_source = 3

The quick fix is to remove tvgids.nl and tvgids.tv from the channel's sources. So replace this line in your config:

2BE;2;59;2be;;2be;2;1064516/logo-2be.jpg

with

2BE;2;;;;2be;2;1064516/logo-2be.jpg

There is one catch: the xmltvid will change from 59 to 2be. So also adjust any detail configuration for the channel and your MythTV line-up.

# 2BE
[Channel 2be]
prime_source = 3

hikavdh commented 9 years ago

Oh, and two other things. You should move to 2.1.7 or 2.1.8. They use the old single-channel pages from teveblad.be, which contain slightly more complete episode info. And removing those sources will make the detail data a little less complete, but I'm not sure by how much.

hikavdh commented 9 years ago

And last but not least: if you know of good alternative sources, I'm always interested!

hikavdh commented 9 years ago

I have just tried to get 2BE for three days. It looks like I get New Police Story. So look at your prime_source setting!

  <programme start="20150617203500 +0200" stop="20150617225500 +0200" channel="59">
    <title lang="nl">New Police Story</title>
    <title lang="ch">Xin jing cha gu shi</title>
    <desc lang="nl">Actie: Politie-inspecteur Chan Kwok-Wing en zijn team worden in de val gelokt door een sadistische bende, die er een spel van maakt zo veel mogelijk agenten om het leven te brengen. Chan is de enige van zijn eenheid die de confrontatie overleeft...</desc>
    <credits>
      <director>Benny Chan</director>
      <actor>Jackie Chan</actor>
      <actor>Nicholas Tse</actor>
      <actor>Mak Bau</actor>
      <actor>Tak</actor>
      <actor>Winnie Leung</actor>
      <writer>Alan Yuen</writer>
      <composer>Tommy Wai</composer>
      <composer>Ngai Cheung</composer>
      <composer>Chi</composer>
      <composer>Kinson Tsang</composer>
      <composer>George Lee Yiu</composer>
    </credits>
    <date>2004</date>
    <category>Film</category>
    <country>CH</country>
    <rating system="kijkwijzer">
      <value>16+</value>
      <icon src="http://tvgidsassets.nl/img/kijkwijzer/16_transp.png"/>
    </rating>
    <rating system="kijkwijzer">
      <value>Geweld</value>
      <icon src="http://tvgidsassets.nl/img/kijkwijzer/geweld_transp.png"/>
    </rating>
    <rating system="kijkwijzer">
      <value>Angst</value>
      <icon src="http://tvgidsassets.nl/img/kijkwijzer/angst_transp.png"/>
    </rating>
    <rating system="kijkwijzer">
      <value>Grof</value>
      <icon src="http://tvgidsassets.nl/img/kijkwijzer/grof_transp.png"/>
    </rating>
  </programme>

hikavdh commented 9 years ago

You should also consider that they may have changed the programming at the last moment. I always do a limited grab at 17:00 to catch such changes. See the new flexible grab script grab_epg.sh added in 2.1.6 stable.

JanCeuleers commented 9 years ago

Thanks for all of this. A few points of feedback:

  1. I have installed the latest version I could find (2.1.8 20150615) and re-ran my regular nightly run and this has not fixed the issue.
  2. prime_source is indeed set to 3 for channel 59
  3. I actually noticed yesterday that the 2BE schedule for yesterday evening and this evening were the same. So in Myth the Source Code movie was also on yesterday evening (but I couldn't be bothered to report it at that time).
  4. I also don't quite understand: not only is teveblad correct, so are tvgids.nl and tvgids.tv. I therefore don't see how prime_source could affect this. It seems to me that the wrong data somehow got into the cache (either because 2BE changed their mind as to which movie to broadcast tonight or because of an issue in the grabber) and this isn't being corrected upon subsequent grabber runs.

Unless you have better advice my next intended step is to dump the cache and try again.

hikavdh commented 9 years ago

Although it is possible that it is a cache problem, it is unlikely.
To explain: timings are never taken from the cache. In short, the procedure is to fetch all the overview pages and merge them. If no overlap is found between programs from the different sources, programs not from the prime source are thrown away and only the prime source programs are kept. Where start/stop times differ, the prime source timings are used as well.
Then the cache is examined, but the timings from the fresh fetch are kept. Fresh data that is not in the cache is kept too. The key used for the lookup is the source's ID for the detail pages (except for rtl.nl, where the program constructs an id from the start time, but that one is not used for the cache), so unless tvgids.nl reuses the id on a change ... ? I could add a check on the program title, but if they reuse the id when a program moves to another time/day it wouldn't matter. The assumption is that an id links to a program, not a timeslot. Then, if nothing is found in the cache, details are fetched, from tvgids.nl if possible, otherwise from tvgids.tv.
So how do things end up on the wrong day? The only possibility I can think of offhand is that the date/time on your computer is a day off. Can you upload the XML output? If you let mythfilldatabase arrange the fetch, look at the fetch script to see how to preserve the XML file that is imported into MythTV.
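
The procedure described above can be sketched roughly as follows. This is a simplified illustration, not the grabber's actual code: the function names `overlaps` and `merge_sources`, the dictionary keys (`start`, `stop`, `detail_id`) and the overlap test are all assumptions made for the example. In particular, letting cached details override everything except the timings is an assumption about how the cache step behaves.

```python
def overlaps(a, b):
    """True when two programmes share any airtime."""
    return a["start"] < b["stop"] and b["start"] < a["stop"]


def merge_sources(prime, others, cache):
    """Hypothetical sketch of the merge: the prime source decides what survives.

    prime  -- programmes from the prime source (list of dicts)
    others -- programmes from all other sources (list of dicts)
    cache  -- cached details, keyed by the source's detail-page ID
    """
    merged = []
    for prog in prime:
        result = dict(prog)  # timings always come from the fresh prime-source fetch
        for other in others:
            if overlaps(prog, other):
                # Take over fields the prime source lacks, never its timings.
                for key, value in other.items():
                    if key not in ("start", "stop"):
                        result.setdefault(key, value)
        # Programmes from other sources that overlap nothing are never visited
        # with a match, so they are simply dropped.
        cached = cache.get(result.get("detail_id"), {})
        for key, value in cached.items():
            if key not in ("start", "stop"):
                # Assumption: cached details win, which is how a stale
                # cached field could end up in the output.
                result[key] = value
        merged.append(result)
    return merged
```

With this shape, fresh fields that are absent from the cache survive untouched, while any field present in the cache replaces the fresh value unless it is a start or stop time.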

JanCeuleers commented 9 years ago

Thanks Hika. It's not practical for me to have myth call your grabber directly because I also need to use other grabbers for other channels. The XML file is here: https://www.dropbox.com/s/qtyat269lyj2qxi/tv_grab_nl_py.out?dl=0

JanCeuleers commented 9 years ago

Oh, and I've confirmed that the date is correct on the backend (which runs the grabber). Which is to be expected because otherwise myth would not be able to record anything correctly.

hikavdh commented 9 years ago

I have taken a quick look, and on second thought it looks like it could be cache corruption. The only odd thing is that everything coming from the cache seems correct; only the title is wrong. If you look at your output, tonight's title may be Source Code, but all the details are from New Police Story. And when you look a day later, your output gives New Police Story, but with details from a Lara Croft Tomb Raider movie. I suggest the following:

  1. rename your cache file to program_cache.bak
  2. run only 2be for two or three days and examine the output. If the output is still mixed up, we have to look further.
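
To examine the output, a short script can list what each programme entry claims to be, so a title that does not match its description stands out. This is a minimal sketch, assuming the output is a regular XMLTV file; the function name `list_programmes` and the default channel id `59` are only illustrative.

```python
import xml.etree.ElementTree as ET


def list_programmes(root, channel="59"):
    """Return (start, title, start of description) for one channel."""
    rows = []
    for prog in root.iter("programme"):
        if prog.get("channel") != channel:
            continue
        # findtext returns the text of the first matching child element.
        title = prog.findtext("title", default="")
        desc = prog.findtext("desc", default="")
        rows.append((prog.get("start"), title, desc[:60]))
    return rows
```

It could be used as `list_programmes(ET.parse("tv_grab_nl_py.out").getroot())`, printing each row and checking by eye whether the title and the description belong to the same programme.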

hikavdh commented 9 years ago

Also, can you post the config you're using? I see you're using compat mode. Maybe that or something else is causing it.

hikavdh commented 9 years ago

I looked at the code; the title is coming from the cache!

hikavdh commented 9 years ago

OK, if the test run produces the right info, you can do the following, so you do not have to fetch everything in the cache again:
In the program, around line 7950 in class Channel_Config(Thread):, there is a function def use_cache(self, tdict, cached):. Between the comment line:

        # Make sure we do not overwrite fresh info with cashed info

and the line:

        if tdict['description'] > cached['description']:

add:

        if tdict['name'] != cached['name']:
            cached['name'] = tdict['name']

Be sure the number of leading spaces is the same, or you'll get a syntax error. Now, if the title in the cache doesn't match, the fresh one will be used.
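
In isolation, the effect of the two added lines can be shown with the standalone sketch below. The function name `prefer_fresh_title` is hypothetical and the sample dictionaries are made up for illustration; the rest of the real `use_cache` logic is not reproduced here.

```python
def prefer_fresh_title(tdict, cached):
    """Hypothetical helper: let a freshly fetched title replace a stale cached one."""
    if tdict['name'] != cached['name']:
        cached['name'] = tdict['name']
    return cached


# Illustrative data only: a fresh fetch and a cache entry with a wrong title.
fresh = {'name': 'New Police Story'}
stale = {'name': 'Source Code', 'description': 'Actie: Politie-inspecteur Chan'}
fixed = prefer_fresh_title(fresh, stale)
```

After the call, the cached entry carries the fresh title while its other details are left as they were.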

JanCeuleers commented 9 years ago

I confirm that the test run produces correct output. Config file available here: https://www.dropbox.com/s/a6av50qh481gns2/tv_grab_nl_py.conf?dl=0

I think we've now established that the cache has been corrupted. I think I'd therefore rather get rid of it than to try and salvage some of it (because there may be other forms of corruption as well).

In case it's of any use to you I've uploaded the cache file to Dropbox as well: https://www.dropbox.com/s/ma7v6i9g2hykfyu/tv_grab_nl_py.cache.corrupted?dl=0

hikavdh commented 9 years ago

Good! The question, of course, remains: how? But we will probably never find out. The most probable cause is a crash somewhere between two weeks ago and when you first noticed it, although that would normally either go all right or produce an unreadable cache file, not a data swap.

hikavdh commented 9 years ago

Oh, and by the way: in the meantime there is a 2.1.9 beta with a fix for the tvgids.tv detail failures! With your current version you won't get details beyond the tvgids.nl 4-day limit. See actuele info