xbmc / metadata.tvshows.themoviedb.org.python

themoviedb.org TV Show scraper in Python for Kodi 18 (Leia) or later.
GNU General Public License v3.0

Changing the language after scraping has no effect #98

Closed Animajosser closed 1 year ago

Animajosser commented 1 year ago

When the content language is changed after a TV show has already been scraped, rescraping "downloads" the metadata in the same language as before. In reality nothing is redownloaded; the scraper reuses the cached data, which is still in the old language. This probably also causes issues when other settings are changed.

Reproduce:

  1. Scrape tv show with default settings.
  2. Change language of the content to something else like fr-FR.
  3. Manage -> Remove the tv show.
  4. Update Library.

Result: The tv show content is still in en-US.

Expected: The cache should be invalidated and the metadata should be redownloaded in fr-FR.

Workaround: Remove the <tmdb id>.pickle file in temp/scrapers/metadata.tvshows.themoviedb.org.python.
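
If you want to script the workaround, here is a minimal sketch in plain Python. The KODI_TEMP path is an assumption (a default Linux-style install); it varies by platform and install, so adjust it to match your setup.

```python
# Minimal sketch of the workaround above. KODI_TEMP is an assumption and
# varies by platform/install; point it at your Kodi temp directory.
from pathlib import Path

KODI_TEMP = Path.home() / ".kodi" / "temp"
CACHE_DIR = KODI_TEMP / "scrapers" / "metadata.tvshows.themoviedb.org.python"

def clear_scraper_cache(cache_dir: Path = CACHE_DIR) -> int:
    """Delete every cached <tmdb id>.pickle file; return how many were removed."""
    removed = 0
    for cache_file in cache_dir.glob("*.pickle"):
        cache_file.unlink()
        removed += 1
    return removed

if __name__ == "__main__":
    print(f"Removed {clear_scraper_cache()} cached pickle file(s)")
```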

pkscout commented 1 year ago

The cache is set to expire in 15 minutes. If you attempt to scrape before the cache expires, the behavior you described is expected. The cache file needs to be held long enough to ensure the scrape finishes, so any shorter cache length can cause problems with larger shows.

Please also note that once set up, a source has its own settings. You must change the setting in the source, not in the addon. Because each source has its own settings, there is no reasonable way to track when changes are made to the source settings, so there is no way to know whether to try to delete the cache file or not.
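
To make the timing concrete, here is a rough sketch of a 15-minute, file-based cache check of the kind described above. The function and variable names are hypothetical and not taken from the scraper's actual code.

```python
# Illustrative only: an mtime-based freshness check similar in spirit to the
# 15-minute expiry described above. Names are hypothetical.
import pickle
import time
from pathlib import Path

CACHE_TTL_SECONDS = 15 * 60  # entries older than this are treated as expired

def load_if_fresh(cache_file: Path):
    """Return the cached payload if the file exists and is still fresh, else None."""
    if not cache_file.is_file():
        return None
    age = time.time() - cache_file.stat().st_mtime
    if age > CACHE_TTL_SECONDS:
        return None  # expired: the caller should refetch from the TMDb API
    with cache_file.open("rb") as handle:
        return pickle.load(handle)
```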

cunlem commented 1 year ago

So this is not considered a valid bug to fix? Can't the whole cache just be purged before the scrape starts? And if not, could the configuration be stored together with the cached data and compared against the current configuration before the cache is used?

pkscout commented 1 year ago

Because of the way Kodi does scraping, there is no single scrape. Kodi calls the scraper at four different times, and only a small piece of data is passed each time. You can't just purge the cache before the scrape, because doing so would purge it four times and cause everything to fail. The alternative is to not cache anything, which would require a complete rewrite of the scraper, duplicate API calls four times over, and likely increase scrape times fourfold.

I can look at storing the configuration, but there are three different possible entry points where that comparison would have to be done, and that would complicate things significantly. This is an edge case where you try to scrape the same show very quickly back to back, and it has a relatively simple workaround, so given the current priorities it is not a priority to address. The code is available and we do accept PRs for review, so if someone wants to do the work before I have time to look at it again, I can certainly review and potentially approve the changes.

Animajosser commented 1 year ago

I had actually thought about just accepting it, but now that I see some conversation about it I think I could find a way. It should be possible to change the code in only one place (or refactor so that only one check is required). I also think that storing the settings could cause problems with updates adding options and such. However, compiling the settings into a string, generating a hash of it, and storing that hash in the cache should be foolproof. I'll see if I can find the time to make a PR.
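
Something along these lines, purely as a sketch of the idea; all names are hypothetical and not the scraper's actual API.

```python
# Sketch of the proposal above: serialize the relevant settings, hash them,
# and store the hash alongside the cached payload so a settings change
# invalidates the cache. Names are hypothetical.
import hashlib
import json
import pickle
from pathlib import Path

def settings_fingerprint(settings: dict) -> str:
    """Stable hash of the settings that affect scraped metadata (e.g. language)."""
    canonical = json.dumps(settings, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def write_cache(cache_file: Path, payload, settings: dict) -> None:
    """Store the payload together with the fingerprint of the current settings."""
    with cache_file.open("wb") as handle:
        pickle.dump({"fingerprint": settings_fingerprint(settings),
                     "payload": payload}, handle)

def read_cache(cache_file: Path, settings: dict):
    """Return the cached payload only if it was written with the same settings."""
    if not cache_file.is_file():
        return None
    with cache_file.open("rb") as handle:
        entry = pickle.load(handle)
    if entry.get("fingerprint") != settings_fingerprint(settings):
        return None  # settings changed since caching: treat as a cache miss
    return entry.get("payload")
```

Because the fingerprint covers the whole serialized settings dict, an option added by a later update simply produces a new hash, so the stale entry is ignored and refetched rather than misread.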

cunlem commented 1 year ago

Thank you for your replies.

@pkscout

This is an edge case where you try to scrape the same show very quickly back to back, and it has a relatively simple workaround

It's not really that uncommon to realize you misconfigured something after you scrape.

@Animajosser

I also think that storing the settings could cause problems with updates adding options and such

Sure, but the only possible consequence is that the value from the cache gets ignored, so I think we're good.