tjoskar / script.episodeHunter

Kodi (aka XBMC) plugin for Episodehunter
GNU General Public License v2.0

TV shows sync crashing with error: invalid literal for int() with base 10: '' #17

Closed: ghost closed this issue 9 years ago

ghost commented 9 years ago

Hello,

When syncing TV shows to episodehunter.tv using v1.1.3 (installed in OSMC via Settings > Add-ons), the sync crashes with the aforementioned error. (By the way, on my RPi B it already takes a long time just to read in the list of episodes ;))

System: Raspberry Pi Model B (512 MB) with OSMC v2015.07-1 (running Kodi 15.1-RC1 x32 for Raspberry Pi)

Relevant log parts:

18:32:50 88523.476562 T:1952359456  NOTICE: EpisodeHunter: {u'limits': {u'start': 0, u'total': 0, u'end': 0}}
18:34:12 88605.539062 T:1952359456  NOTICE: EpisodeHunter: Traceback (most recent call last):
                                              File "/home/osmc/.kodi/addons/script.episodehunter/default.py", line 36, in menu
                                                sync.Series(connection).sync()
                                              File "/home/osmc/.kodi/addons/script.episodehunter/resources/lib/sync/sync_series.py", line 30, in sync
                                                self.get_series_to_sync_upstream()
                                              File "/home/osmc/.kodi/addons/script.episodehunter/resources/lib/sync/sync_series.py", line 71, in get_series_to_sync_upstream
                                                if not self.is_marked_as_watched_on_eh(show.tvdb_id, e.season, e.episode) and e.plays > 0
                                              File "/home/osmc/.kodi/addons/script.episodehunter/resources/lib/sync/sync_series.py", line 99, in is_marked_as_watched_on_eh
                                                series_id = int(series_id)
                                            ValueError: invalid literal for int() with base 10: ''
18:34:12 88605.539062 T:1952359456  NOTICE: EpisodeHunter: Error: invalid literal for int() with base 10: ''

I found a 'fix' (I'm not really fluent in Python, sorry): change the calls to is_marked_as_watched_on_eh in sync_series.py. In both get_series_to_sync_downstream and get_series_to_sync_upstream I changed if self.is_marked_as_watched_on_eh(show.tvdb_id, e.season, e.episode) and ... to if show.tvdb_id != "" and e.season != "" and e.episode != "" and self.is_marked_as_watched_on_eh(show.tvdb_id, e.season, e.episode) and ... After recompiling the module, the process works fine (although I'm still waiting for Kodi itself to set the watched flags, which is painfully slow). I also feared there might be a similar problem in sync_movies.py. Strike that, the movie module seems fine.
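The guard described above can be pulled into a small predicate so both the upstream and downstream loops share it. This is only a sketch: the Episode tuple and the function names are hypothetical stand-ins for the objects in sync_series.py, and None is treated as "missing" in addition to the empty string. Season 0 (specials) still passes.

```python
from collections import namedtuple

# Hypothetical stand-in for the episode objects iterated in sync_series.py.
Episode = namedtuple("Episode", "season episode plays")


def has_sync_ids(tvdb_id, season, episode):
    """Return False if any identifier needed for the EH lookup is '' or None."""
    return all(value not in ("", None) for value in (tvdb_id, season, episode))


def watched_episodes_to_check(tvdb_id, episodes):
    """Keep watched episodes only when every ID for the EH lookup is present."""
    return [e for e in episodes
            if e.plays > 0 and has_sync_ids(tvdb_id, e.season, e.episode)]
```

With this in place, is_marked_as_watched_on_eh is simply never called for a show whose ID is empty, so int('') is never reached.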

Additional information: It might have to do with TV shows unknown to the online platform, as I get warnings for a couple of shows reading "Failing to fetch seasons for TV show with id..." (I checked: they aren't present in the web interface either). UPDATE: It seems more likely that my database contains TV series without TheTVDB IDs. Still, this shouldn't stop the plugin from working.

I hope this helps. Keep up the good work, I really love what you've created here!

tjoskar commented 9 years ago

Hi,

Thanks for your input and your investigation into the matter, it is really appreciated!

It seems that Kodi's database can contain imdbnumber == '', which I do not check for; instead I try to convert it to an integer, which throws an exception. This is fixed now, however (if you are interested, you can see the changes here: https://github.com/tjoskar/script.episodeHunter/pull/18, particularly this commit: https://github.com/tjoskar/script.episodeHunter/commit/a6f3d807cfd40c5ef4426d3eba0a3ed07c01846b).
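An alternative to guarding every call site is to validate imdbnumber once, when the shows are loaded from Kodi, and set aside the shows that can't be synced. The sketch below is not the code from the linked commit. It assumes, as the traceback suggests, that imdbnumber holds a numeric TheTVDB ID for TV shows; split_by_imdbnumber and the tvdb_id key are hypothetical names.

```python
def split_by_imdbnumber(shows):
    """Return (usable, skipped): shows with a numeric imdbnumber, and titles of the rest.

    Each show is a dict as returned by VideoLibrary.GetTVShows. Usable shows
    get an extra 'tvdb_id' int so later code never calls int() on ''.
    """
    usable, skipped = [], []
    for show in shows:
        raw = (show.get("imdbnumber") or "").strip()
        if raw.isdigit():
            usable.append(dict(show, tvdb_id=int(raw)))
        else:
            # Empty or non-numeric ID: skip the show instead of crashing the sync.
            skipped.append(show.get("title", "?"))
    return usable, skipped
```

The skipped titles can then be logged as a warning, so the user knows which shows lack IDs while the rest of the library still syncs.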

I found some other bugs while I was working on this issue, so a big thanks for pointing this out.

> which by the way on my RPi B takes a long time already just to read in the list of episodes

Yeah, I know. The problem is that I read the whole Kodi database into memory, parse it, and send it to episodehunter.tv; the server then parses the data and sends a response when it's finished (all synchronously).

It could maybe be done something like this:

from multiprocessing import Pool

def sync(series):
    # The helpers below are placeholders; each works on a single show only.
    episodes = get_episodes_by_series(series)  # New db call
    episodes_status_on_eh = get_episodes_from_eh_by_series(series)  # New HTTP call
    # parse and compare
    set_episodes_as_watched_on_xbmc()  # New db call
    set_episodes_as_watched_on_eh()  # New HTTP call

pool = Pool(processes=5)  # at most five shows are synced at the same time
xbmc_series = get_series_from_xbmc()  # New db call
pool.map(sync, xbmc_series)
pool.close()
pool.join()

This will cause a lot of HTTP calls, but they can be made in parallel, and we don't need to hold all the data in memory. So we gain in memory and CPU usage and pay with extra HTTP calls, which is a fairly low cost.
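A runnable version of the per-show idea, as a sketch only: it swaps the process Pool for multiprocessing.pool.ThreadPool (available in Python 2.7 and 3), on the assumption that the work is mostly waiting on the Kodi database and the network, so threads are enough and don't duplicate memory per process. FAKE_KODI and FAKE_EH are hypothetical stand-ins for the "new db call" and "new HTTP call" above.

```python
from multiprocessing.pool import ThreadPool

# Hypothetical stand-ins: show id -> set of watched (season, episode) pairs.
FAKE_KODI = {1: {(1, 1), (1, 2)}, 2: {(1, 1)}}
FAKE_EH = {1: {(1, 1)}, 2: {(1, 1), (1, 2)}}


def sync_one_show(show_id):
    """Compare one show; its episode sets can be freed as soon as this returns."""
    watched_in_kodi = FAKE_KODI.get(show_id, set())  # would be a Kodi db call
    watched_on_eh = FAKE_EH.get(show_id, set())      # would be an HTTP call
    # Episodes watched in Kodi that still need to be sent to EH.
    return show_id, sorted(watched_in_kodi - watched_on_eh)


def sync_all(show_ids, workers=5):
    """Sync shows with at most `workers` threads running at once."""
    pool = ThreadPool(processes=workers)
    try:
        # map() blocks until every show is done and keeps the input order.
        return pool.map(sync_one_show, show_ids)
    finally:
        pool.close()
        pool.join()
```

For example, sync_all([1, 2]) returns [(1, [(1, 2)]), (2, [])]. Keeping `workers` small matters on a Raspberry Pi, where too many threads would compete for the same CPU and database.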

I'm currently working on the server API so that it handles all requests asynchronously and the client gets a response much faster.

> It might have to do with TV shows unknown to the online platform

If you try to mark an episode as watched that doesn't exist on episodehunter.tv, that show will be added to the database, and once it is, the episode will be marked as watched on your profile. That is the plan, anyway.

> ... this shouldn't stop the plugin from working

You are absolutely right!

tjoskar commented 9 years ago

@zemion, I made some major changes to the code, so I want to run some tests before I ship the new version.

I would appreciate it if you could download the new version and test it out. You can download it here: https://github.com/tjoskar/script.episodeHunter/releases/tag/v1.2.0.

It will still be kind of slow but I'm working on a solution.

ghost commented 9 years ago

@tjoskar, thanks for implementing the changes so fast! Again, I really appreciate the work you've done. I downloaded and installed v1.2.0 the moment I read your message, and installing via zip file worked fine for me. I uninstalled the old version first, but that left the "script.episodehunter" folder behind, and it was not overwritten because the new folder is called "script.episodeHunter-1.2.0". This seemed to be no problem until I tried to restart, which only worked after I renamed the folder. I assume this problem will not occur with a fresh install or an update.

I've also started a sync right away; it's running now, and I'm watching the log closely ;). As soon as my Pi is done scanning my movie library, I'll try to sync the movies with EH, but that might be tomorrow or the day after... Pi + Kodi isn't the best combination for huge libraries, I can honestly tell you.

As we are on the topic of "Pi is slow": I think the problem isn't the HTTP call, but reading the whole database into memory at once, even though only some movies/episodes are watched, and even fewer of them are "out of sync", meaning the state on EH and in Kodi differs. Maybe the RPC calls can be adjusted using filters similar to what is described here: http://kodi.wiki/view/JSON-RPC_API/Examples#Query_the_libraries, section "TV Shows": "filter": {"field": "playcount", "operator": "is", "value": "0"}, but with greaterthan instead of is, and at episode level (this should be in the file xbmc_helper.py). This should save some memory and time, as the response from the RPC call would already contain only watched episodes.

The other question is whether a thread pool (again, Python is not my strong side) will solve the problem of marking episodes as watched, as Kodi on my Pi already struggles to perform that task on its own (the interface and the web interface are equally slow). I also noticed that concurrent actions to mark an episode (or a season or a show, which takes a lot of time ;) lead to strange behavior, as the processes seem to cancel each other out. So I think a thread pool might be a bad idea, but I'm willing to test it anytime. Again, your work and instant responses are much appreciated. Thank you for making a simple user very happy! ;)

ghost commented 9 years ago

Additional remarks:
1) Gathering all the data for TV shows took 52 minutes (and found 52 watched episodes). The upload took 2 seconds. I couldn't test the download and marking as watched yet, but you get an idea of what I was talking about: I'd gladly trade a tenfold increase in upload time for a saving of the same order of magnitude in data gathering ;)
2) With the following RPC you could filter out all TV shows with no watched episodes whatsoever, saving a big chunk of memory to begin with:

result = execute_rpc(
    method='VideoLibrary.GetTVShows',
    params={
        # Only shows with at least one watched episode
        'filter': {'field': 'numwatched', 'operator': 'greaterthan', 'value': '0'},
        'properties': ['title', 'year', 'imdbnumber', 'playcount', 'season', 'watchedepisodes']
    },
    id=1
)

On the other hand this might break the sync in later steps, if on EH an episode is marked as watched in a show that was filtered out here.

tjoskar commented 9 years ago
> the new folder is called "script.episodeHunter-1.2.0". This seemed to be no problem until I tried to perform a restart, which only worked after I renamed the folder

Good to know. And no, there will not be a problem if you install or update it the preferred way.

> Pi + Kodi isn't the best combination for huge libraries

Tell me about it, I have the same setup.

> I think the problem isn't the HTTP call, but reading the whole database once into memory

You are absolutely right! My plan with a thread pool was to make each thread responsible for one TV show: a thread starts, gets the data from the local db, fetches data from EH, and when the show has been synced the thread can die and the garbage collector will free the memory. However, we can only run a few threads at a time, otherwise there will be no improvement, rather the opposite. Maybe this is a bad idea to begin with, but I think it's worth a try.

Another big improvement should be to remove the copy.deepcopy(self.xbmc_series) statement. Not only do I read the whole database into memory, I also make a copy of it, and I do it twice! (https://github.com/tjoskar/script.episodeHunter/blob/4d02741ff5e291ff9ec627b437e602ffd5435d9e/resources/lib/sync/sync_series.py#L61 and line 78) This is bad! You will never notice it on a normal media center with a few GB of RAM and a fairly good CPU, but on a Raspberry Pi you probably will.
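I haven't checked what the copies at lines 61 and 78 are used for; assuming they exist so that entries can be removed from a working copy while iterating, a generator that walks the already-loaded list once and yields only what still needs uploading avoids the copies entirely. This is a sketch: the dict layout (tvdb_id, episodes, season, episode, plays) and the function name are hypothetical.

```python
def episodes_to_upload(xbmc_series, is_marked_on_eh):
    """Yield (tvdb_id, season, episode) keys that still need uploading.

    Walks the already-loaded list once instead of deep-copying it, so no
    second (or third) copy of the library is held in memory.
    """
    for show in xbmc_series:
        for episode in show["episodes"]:
            key = (show["tvdb_id"], episode["season"], episode["episode"])
            if episode["plays"] > 0 and not is_marked_on_eh(*key):
                yield key
```

Because it is a generator, the caller can batch the keys into HTTP requests as they are produced, and xbmc_series itself is never modified.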

So as a first step I think it is a good idea to separate the two syncing steps (especially with the RPC you posted). Something like this:

marked_as_watched_on_eh = get_watched_shows_from_eh() 

unwatched_shows = get_unwatched_shows_from_xbmc() # The RPC you posted above, thanks for that  
# Compare and mark episodes as watched in xbmc/kodi 
unwatched_shows = None # Remove the reference and make GC take care of it 

watched_shows = get_watched_shows_from_xbmc() # Similar RPC  
# Compare and mark episodes as watched on episodehunter 

Furthermore, I have integrated a database (https://github.com/tjoskar/script.episodeHunter/blob/helix/resources/lib/database.py) into the add-on, so we could save some information there, like a timestamp of the last sync, and then only sync episodes that have been added after that timestamp (it looks like shows and episodes have a property named dateadded).
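Storing the last-sync timestamp only needs a one-row key/value table. The sketch below uses Python's built-in sqlite3 module and is not based on the add-on's database.py; the sync_state table and the helper names are hypothetical.

```python
import sqlite3


def open_state(path=":memory:"):
    """Open (or create) a small key/value table for sync bookkeeping."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS sync_state "
                 "(key TEXT PRIMARY KEY, value TEXT)")
    return conn


def get_last_sync(conn):
    """Return the stored timestamp string, or None before the first sync."""
    row = conn.execute(
        "SELECT value FROM sync_state WHERE key = 'last_sync'").fetchone()
    return row[0] if row else None


def set_last_sync(conn, timestamp):
    """Insert or overwrite the timestamp of the last successful sync."""
    conn.execute("INSERT OR REPLACE INTO sync_state (key, value) "
                 "VALUES ('last_sync', ?)", (timestamp,))
    conn.commit()
```

The timestamp should only be written after a sync has completed successfully; otherwise an interrupted sync would cause episodes to be skipped next time.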

I will take a look at this, maybe this week or over the weekend, but as you may know, episodehunter is just a side project. I have a regular job and I don't get any money from EH (rather the opposite: server costs, licenses, etc.). However, it's people like you who keep me going.

If you want you can always create a pull request (but don't feel obliged, you've already helped me enough with a great discussion and pointing out some bugs).

ghost commented 9 years ago

Tell me about side projects and server costs, I completely understand. If only the day had 25 hours... or 26... ;)

I'm really glad to hear about the progress you are making. I was thinking a little more about the RPC calls and came up with a version that is a LOT faster than what your program does now: it turns out you can call VideoLibrary.GetEpisodes without any tvshowid or season set, and Kodi delivers a comprehensive list of all episodes. Combine that with the filter, and you get a ready-to-use list of all watched episodes in the database. I tested it, and it returned the list in a reasonable time (approximately 13,000 episodes, 8,200 of them watched, running time ~1.5 min). The RPC I used looks like this:

{"jsonrpc":"2.0","id":1,"method":"VideoLibrary.GetEpisodes","params":{"filter":{"field":"playcount","operator":"greaterthan","value":"0"},"properties":["title","showtitle","tvshowid","playcount"]}}
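The same request can be built from Python and its response reduced to a compact set of keys, so only a few small tuples stay in memory. This is a sketch: the function names are hypothetical, and it adds season and episode to the requested properties so each episode can be identified. Inside Kodi, the request string would be passed to xbmc.executeJSONRPC, which returns the response as a JSON string.

```python
import json


def build_watched_episodes_request(request_id=1):
    """Build the GetEpisodes call quoted above, with season/episode added."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "VideoLibrary.GetEpisodes",
        "params": {
            "filter": {"field": "playcount", "operator": "greaterthan",
                       "value": "0"},
            "properties": ["title", "showtitle", "tvshowid", "playcount",
                           "season", "episode"],
        },
    })


def watched_keys(response_text):
    """Reduce a GetEpisodes response to a set of (tvshowid, season, episode)."""
    result = json.loads(response_text).get("result", {})
    # "episodes" may be missing when nothing matches the filter.
    return {(ep["tvshowid"], ep["season"], ep["episode"])
            for ep in result.get("episodes", [])}
```

Changing the operator to "is" with value "0" gives the unwatched episodes instead.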

The requested fields can be tweaked, of course. The same RPC can give you unwatched episodes etc. You will still need to gather additional information in case something needs to be uploaded (like currently unknown shows on EH), but in that case I assume multi-threading might actually make sense. Also, I noticed that working with lists in python doesn't seem to be as slow as getting data out of Kodi. BTW, is it possible to pull the watched episodes from EH while in parallel gathering information in Kodi?
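On running both at once: assuming the EH request doesn't depend on anything read from Kodi, a plain background thread can fetch from EH while the main thread queries Kodi, since the HTTP call spends most of its time waiting on the network. This is a sketch using only the standard threading module; start_background is a hypothetical helper, and the functions in the usage note are the placeholders from earlier in this thread.

```python
import threading


def start_background(fetch):
    """Run fetch() in a worker thread; return a function that waits for its result."""
    outcome = {}

    def run():
        try:
            outcome["value"] = fetch()
        except Exception as exc:  # re-raised in the caller by wait()
            outcome["error"] = exc

    worker = threading.Thread(target=run)
    worker.daemon = True  # don't keep Kodi alive if the add-on is stopped
    worker.start()

    def wait():
        worker.join()
        if "error" in outcome:
            raise outcome["error"]
        return outcome["value"]

    return wait
```

Usage, with placeholder fetchers: wait_for_eh = start_background(get_watched_shows_from_eh), then watched_in_kodi = get_watched_shows_from_xbmc(), then watched_on_eh = wait_for_eh(). The total time then becomes roughly the longer of the two calls rather than their sum.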

> Another big improvement should be to remove the copy.deepcopy(self.xbmc_series) statement. Not only do I read the whole database into memory, I take a copy of it and I do it twice!

I noticed both statements, and I absolutely agree that these are probably the most memory-hungry calls of all EH scripts. On the other hand, those deep copies wouldn't work well with different RPCs anyway, so I assume the speedup will be noticeable, especially if all three lists contain only the data needed for the task at hand. Maybe you can generate both lists (watched in Kodi, watched on EH) and use something similar to what's described at http://stackoverflow.com/a/6486513 to calculate only the differences, i.e. episodes marked as watched in Kodi but not on EH, and vice versa, which can then be used to set the status both on EH and in Kodi?
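Python's built-in set difference computes both directions directly, as long as both sides use the same hashable key. The sketch below assumes (tvdb_id, season, episode) tuples as that key; diff_watched is a hypothetical name.

```python
def diff_watched(watched_in_kodi, watched_on_eh):
    """Return (only_in_kodi, only_on_eh) for iterables of (tvdb_id, season, episode).

    only_in_kodi: upload these to EH as watched.
    only_on_eh:   mark these as watched in Kodi.
    """
    kodi, eh = set(watched_in_kodi), set(watched_on_eh)
    return kodi - eh, eh - kodi
```

Each set difference is linear in the size of the left-hand set, so even thousands of watched episodes are compared in a fraction of a second; the slow part remains getting the data out of Kodi.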

> so we could save some information there like a timestamp of the last sync and then only sync episodes that have been added after that timestamp (it looks like shows and episodes have a property named: dateadded)

Well, it would probably be better to watch for lastplayed at episode level, as there might be episodes that had already been added during the last sync but hadn't been watched yet. But this depends on what your database structure looks like, what exactly is stored in there, and how the communication with EH is established. In general, though, it's a great idea to track the work already done, so it isn't done twice.
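Filtering on lastplayed could look like the sketch below. It assumes Kodi reports lastplayed as a "YYYY-MM-DD HH:MM:SS" string and leaves it empty for episodes that were never played; both are assumptions to verify against a real response. played_since is a hypothetical name, and last_sync would come from wherever the add-on stores its last sync time.

```python
from datetime import datetime

# Assumption: Kodi's "lastplayed" format; "" (or missing) means never played.
KODI_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def played_since(episodes, last_sync):
    """Return the episodes whose lastplayed is later than last_sync (a datetime)."""
    recent = []
    for ep in episodes:
        stamp = ep.get("lastplayed", "")
        if stamp and datetime.strptime(stamp, KODI_TIME_FORMAT) > last_sync:
            recent.append(ep)
    return recent
```

Unlike dateadded, this also catches old episodes that were only watched after the previous sync.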

> If you want you can always create a pull request (but don't feel obliged, you've already helped me enough with a great discussion and pointing out some bugs).

I really would like to, but then again: Python is not my native language ;) And my time is unfortunately very limited at the moment, too, even if it doesn't look like that on github. But whenever you need an alpha/beta/gamma tester or I can help you out with small snippets, I'm glad to help. I also would be very happy if at some point I could create an application for movie management on my own, preferably with connection to Kodi, tinyMediaManager (which I use to circumvent the Kodi scrapers), EH (if you agree, of course) and so forth. Until then I will stick to the setup I have and report bugs ;) And as I know how much time it takes to bring a project to production status, I really would like to say again: Thanks for your great work, it's deeply admired!

P.S.: If you ever should be interested in internationalizing EH, I could gladly help you out with translating the interface (both online and plugin).

tjoskar commented 8 years ago

Hi @zemion! Sorry for the late response. I have been traveling and working on rewriting the server side of episodehunter.

I just want you to know that my next task is to fix these (performance) issues. I hope that you are still using the add-on and are still up for being a tester :)