jensb89 / Netflix-to-Trakt-Import

Synchronize/Import the viewing history from Netflix to trakt.tv

time data error from ViewingActivity #30

Open KalleIstKool opened 1 year ago

KalleIstKool commented 1 year ago

I am trying to import my data from ViewingActivity.csv, the file you get when you request your data from Netflix after you have already cancelled. I couldn't find another .csv file in the Netflix export that contains the complete viewing activity.

It is formatted like this:

USERNAME,2022-10-17 23:42:00,00:43:26,,DAHMER: Monster: The Jeffrey Dahmer Story: The Good Boy Box (Episode 4),,Netflix Windows App - Cadmium Windows Mobile,00:41:19,Not latest view,DE (Germany)
USERNAME,2022-10-17 23:19:14,00:22:36,,DAHMER: Monster: The Jeffrey Dahmer Story: Doin' A Dahmer (Episode 3),,Netflix Windows App - Cadmium Windows Mobile,00:50:12,00:50:12,DE (Germany)

The program fails like this:

C:\Users\x\Desktop\watchhistory\Netflix-to-Trakt-Import-master>python netflix2trakt.py
Traceback (most recent call last):
  File "C:\Users\x\Desktop\watchhistory\Netflix-to-Trakt-Import-master\NetflixTvShow.py", line 158, in addWatchedDate
    time = datetime.datetime.strptime(watchedDate + " 20:15", config.CSV_DATETIME_FORMAT + " %H:%M")
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64qbz5n2kfra8p0\lib_strptime.py", line 568, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64qbz5n2kfra8p0\lib_strptime.py", line 352, in _strptime
    raise ValueError("unconverted data remains: %s" %
ValueError: unconverted data remains: :34 20:15

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\x\Desktop\watchhistory\Netflix-to-Trakt-Import-master\netflix2trakt.py", line 40, in <module>
    netflixHistory.addEntry(entry, watchedAt)
  File "C:\Users\x\Desktop\watchhistory\Netflix-to-Trakt-Import-master\NetflixTvShow.py", line 95, in addEntry
    self.addMovieEntry(entryTitle, entryDate)
  File "C:\Users\x\Desktop\watchhistory\Netflix-to-Trakt-Import-master\NetflixTvShow.py", line 123, in addMovieEntry
    movie.addWatchedDate(watchedDate)
  File "C:\Users\x\Desktop\watchhistory\Netflix-to-Trakt-Import-master\NetflixTvShow.py", line 162, in addWatchedDate
    time = datetime.datetime.strptime(watchedDate + " 20:15", "%m.%d.%y %H:%M")
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64qbz5n2kfra8p0\lib_strptime.py", line 568, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2800.0_x64qbz5n2kfra8p0\lib_strptime.py", line 349, in _strptime
    raise ValueError("time data %r does not match format %r" %
ValueError: time data '2023.02.14.21.31.34 20:15' does not match format '%m.%d.%y %H:%M'

I've tried "%Y-%m-%d" and vice versa, with and without a capital Y, and with dots instead of -.
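For reference, the first error can be reproduced outside the script. The following is only a sketch: it assumes the raw "Start Time" value from the export reaches `strptime` unchanged and that the configured format is "%Y-%m-%d". The helper `parse_start_time` is hypothetical and not part of the repository.

```python
from datetime import datetime

raw = "2022-10-17 23:42:00"  # "Start Time" value taken from the sample row above

# Mimic the failing call: append the default time and parse with a date-only format.
try:
    datetime.strptime(raw + " 20:15", "%Y-%m-%d" + " %H:%M")
    error = None
except ValueError as exc:
    error = str(exc)


def parse_start_time(value):
    """Hypothetical helper: parse the full timestamp, including seconds."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")


parsed = parse_start_time(raw)
print(error)                      # the "unconverted data remains" message
print(parsed.date().isoformat())  # date part only
```

The date-only format consumes the timestamp up to the minutes, so the seconds and the appended default time are left over. This is why changing only the order or the separators of the date format does not help.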

jensb89 commented 1 year ago

Just a quick answer as I'm currently only on mobile. It seems your csv has more columns than the normal one you get when you can still log in (active account). Yours has username and device information.

The script currently does not support that. Some regular expressions need to be adapted for it. I guess one could implement an option for that kind of csv file as well. It should not be super hard to do, maybe I'll find some time at the weekend.

Pleasant-exe commented 1 year ago

I got this error too. I really hope it gets fixed, as I'm slowly syncing and it's painful.

Pleasant-exe commented 1 year ago

> Just a quick answer as I'm currently only on mobile. It seems the csv from you has more entries than the normal one you get when you can still login (not inactive). Yours has username and device information.
>
> The script currently does not support that. Some regex expressions need to be adapted for that. I guess one could implement a option for that kind of csv files as well. Should not be super hard to do, maybe I find some time at the weekend

Could you please? :p

jensb89 commented 1 year ago

If someone can provide me with a sample csv from an export of an inactive account that contains a few entries (both series and movies would be good), then I can try to implement a version for that. I currently don't have an inactive account.

KalleIstKool commented 1 year ago

Ok, I just randomly took some movies and TV shows from throughout my history. Hope it helps:

USERNAME,2017-06-06 21:09:06,00:26:47,,F is for Family: Season 2: Night Shift (Episode 4),,Mobile,00:25:38,00:25:38,DE (Germany)
USERNAME,2017-06-06 20:35:38,00:26:19,,F is for Family: Season 2: The Liar's Club (Episode 3),,Mobile,00:25:47,00:25:47,DE (Germany)
USERNAME,2017-06-05 22:08:13,00:25:25,,F is for Family: Season 2: A Girl Named Sue (Episode 2),,Netflix Windows App - Cadmium Windows Mobile,00:25:25,00:25:25,DE (Germany)
USERNAME,2017-06-05 21:56:10,00:03:48,,F is for Family: Season 2: Heavy Sledding (Episode 1),,Netflix Windows App - Cadmium Windows Mobile,00:25:37,00:25:37,DE (Germany)
USERNAME,2017-06-05 10:43:24,00:22:02,,F is for Family: Season 2: Heavy Sledding (Episode 1),,Netflix Windows App - Cadmium Windows Mobile,00:21:51,Not latest view,DE (Germany)
USERNAME,2017-07-30 21:45:41,02:14:59,,Man of Steel,,Samsung 2014 MStar DTV,02:15:05,02:15:05,DE (Germany)
USERNAME,2017-10-27 13:07:38,00:00:04,,Transformers: Revenge of the Fallen,,Samsung 2014 MStar DTV,00:00:04,00:00:04,DE (Germany)
USERNAME,2017-10-27 10:32:01,02:22:13,Autoplayed: user action: Unspecified; ,Transformers: Revenge of the Fallen,,Samsung 2014 MStar DTV,02:23:17,Not latest view,DE (Germany)
USERNAME,2017-10-26 17:04:03,00:01:05,,Transformers: Revenge of the Fallen,,Samsung 2014 MStar DTV,00:01:05,Not latest view,DE (Germany)

jensb89 commented 1 year ago

Thanks @KalleIstKool, I'll try to implement a version for that as soon as possible.

jensb89 commented 12 months ago

Follow-up question: Does the file have a header (like Username, date, ...) with a description of the fields? That would be helpful for parsing the data :)

KalleIstKool commented 12 months ago

Thank you so much for your help :) It actually has a header:

Profile Name,Start Time,Duration,Attributes,Title,Supplemental Video Type,Device Type,Bookmark,Latest Bookmark,Country
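With this header known, the export type could be detected from the first line of the file instead of being configured by hand. The sketch below is one possible approach, not code from the repository: `is_inactive_account_export` and `read_rows` are hypothetical names, and the column names are exactly those quoted in the header above.

```python
import csv
import io

# Header of the inactive-account export, as quoted in this thread.
INACTIVE_HEADER = [
    "Profile Name", "Start Time", "Duration", "Attributes", "Title",
    "Supplemental Video Type", "Device Type", "Bookmark",
    "Latest Bookmark", "Country",
]


def is_inactive_account_export(header_row):
    """Hypothetical check: True if the header matches the ten-column export."""
    return [col.strip() for col in header_row] == INACTIVE_HEADER


def read_rows(csv_file):
    """Yield (title, start_time) pairs, using the file's own header as column names."""
    reader = csv.DictReader(csv_file)
    if not is_inactive_account_export(reader.fieldnames or []):
        raise ValueError("unexpected header: {}".format(reader.fieldnames))
    for row in reader:
        yield row["Title"], row["Start Time"]


# Small in-memory sample built from the header and one row posted above.
sample = io.StringIO(
    ",".join(INACTIVE_HEADER) + "\n"
    "USERNAME,2017-07-30 21:45:41,02:14:59,,Man of Steel,,"
    "Samsung 2014 MStar DTV,02:15:05,02:15:05,DE (Germany)\n"
)
rows = list(read_rows(sample))
print(rows)
```

Reading the header through `csv.DictReader` without `fieldnames` means the header row is consumed automatically and does not need to be skipped by hand.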

jensb89 commented 12 months ago

Ok, so here is a sample code that would read the data from the file:

import csv

with open("histTestInactive.csv", mode="r", encoding="utf-8", newline="") as csvFile:
    # The inactive-account export has ten columns. Name them explicitly here
    # instead of relying on the header row in the file.
    csvReader = csv.DictReader(csvFile, fieldnames=("Profile", "Date", "Duration", "Attributes", "Title", "SupVideoType", "Device", "DurationPlayed", "DurationLastView", "Country"))

    # Skip the header row (it is read as data because fieldnames is given)
    next(csvReader, None)

    for row in csvReader:
        entry = row["Title"]
        # Keep only the date part (YYYY-MM-DD) and drop the time of day
        watchedAt = row["Date"][0:10]

        print("{}:{}".format(watchedAt, entry))

For testing, replace the code in netflix2trakt.py that starts at the csvReader line (line 34) with the code above.

A few things to note:

  1. The movie "Transformers: Revenge of the Fallen" appears three times in the log you provided. From the duration values, some of these plays lasted only a few seconds up to about a minute: it was played for 01:05 min on one day, and the rest of the movie was watched the next day. Should we filter the short plays out? Otherwise the movie would appear in Trakt 2-3 times (plays on the same day are already filtered out). So the question is what the lower threshold should be for a play to count: 2 minutes? And what if half is watched on one day and half on the next? That case would need to be handled here as well.

  2. This csv contains date and time. The usual Netflix export only has a date, and my code adds "20:15" as the default time. In this sample code I stripped the time info so it works with the current script. It would be better, though, to adjust the script to support both versions, or at least to detect the date-time format and strip the time from there. Maybe a config option would be useful (is_inactive_account_export = True). Passing the time info through directly should also be considered (after solving the issue with the short plays).
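The threshold idea from point 1 could look roughly like the sketch below. It is only an illustration: the 2-minute value is the figure floated above and not a decided setting, `parse_duration` and `is_real_view` are hypothetical helpers, and the dictionary keys follow the field names used in the sample code.

```python
from datetime import timedelta

MIN_PLAY_DURATION = timedelta(minutes=2)  # assumed threshold, open for discussion


def parse_duration(text):
    """Convert an "HH:MM:SS" string from the Duration column to a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def is_real_view(row, minimum=MIN_PLAY_DURATION):
    """Hypothetical filter: True if the play lasted at least `minimum`."""
    return parse_duration(row["Duration"]) >= minimum


# The three Transformers rows from the sample posted above.
title = "Transformers: Revenge of the Fallen"
rows = [
    {"Date": "2017-10-27 13:07:38", "Duration": "00:00:04", "Title": title},
    {"Date": "2017-10-27 10:32:01", "Duration": "02:22:13", "Title": title},
    {"Date": "2017-10-26 17:04:03", "Duration": "00:01:05", "Title": title},
]

kept = [row for row in rows if is_real_view(row)]
print([row["Date"] for row in kept])
```

With this threshold only the long play on 2017-10-27 survives. The filter does not handle a movie split evenly across two days, which remains an open question.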

However, the above code should already work for most use cases, here is the output from the script when you replace the code in netflix2trakt (set CSV_DATETIME_FORMAT = "%Y-%m-%d" in the config):

Found Tmdb ID (by name) for F is for Family : Season 2: Night Shift (Episode 4) (TMDB ID 1320582 - Night Shift Episode 4)
Found Tmdb ID (by name) for F is for Family : Season 2: The Liar's Club (Episode 3) (TMDB ID 1320581 - The Liar's Club Episode 3)
Found Tmdb ID (by name) for F is for Family : Season 2: A Girl Named Sue (Episode 2) (TMDB ID 1320580 - A Girl Named Sue Episode 2)
Found Tmdb ID (by name) for F is for Family : Season 2: Heavy Sledding (Episode 1) (TMDB ID 1320579 - Heavy Sledding Episode 1)
No Tmdb ID found for Transformers : Season 1: Revenge of the Fallen
Found Tmdb ID (by name) for DAHMER : Season 1: The Jeffrey Dahmer Story: The Good Boy Box (Episode 4) (TMDB ID 3959087 - The Good Boy Box Episode 4)
Found Tmdb ID (by name) for DAHMER : Season 1: The Jeffrey Dahmer Story: Doin' A Dahmer (Episode 3) (TMDB ID 3959086 - Doin' A Dahmer Episode 3)
Found movie Man of Steel : Man of Steel (49521)
Found movie Transformers: Revenge of the Fallen : Transformers: Revenge of the Fallen (8373)
Adding episodes to trakt: 4 episodes from F is for Family season 2
Adding episodes to trakt: 2 episodes from DAHMER season 1
Adding movie to trakt: Man of Steel
Adding movie to trakt: Transformers: Revenge of the Fallen
Adding movie to trakt: Transformers: Revenge of the Fallen

We can see that Transformers is added twice to Trakt because of the different dates. Maybe a filter that skips everything below 5 minutes would be useful. However, one could also start a movie on one day, watch half of it, and finish the rest the next day. The previous Netflix log would export just one row for that; here there would be two. These cases are harder to filter out and definitely require more code. Good ideas are welcome :)
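One possible idea for the split-across-days case is to merge sessions of the same title that start close together and count them as a single viewing. The sketch below is only a heuristic under stated assumptions: the 48-hour window and the 5-minute minimum are guesses, `merge_sessions` is a hypothetical function, and a genuine rewatch within the window would wrongly be merged.

```python
from collections import defaultdict
from datetime import datetime, timedelta

MERGE_WINDOW = timedelta(hours=48)  # assumed: sessions this close are one viewing
MIN_TOTAL = timedelta(minutes=5)    # assumed: drop viewings shorter than this in total


def parse_duration(text):
    """Convert an "HH:MM:SS" string to a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def merge_sessions(rows):
    """Return one (title, watched_at) pair per cluster of nearby sessions."""
    by_title = defaultdict(list)
    for row in rows:
        start = datetime.strptime(row["Date"], "%Y-%m-%d %H:%M:%S")
        by_title[row["Title"]].append((start, parse_duration(row["Duration"])))

    result = []
    for title, sessions in by_title.items():
        sessions.sort()  # oldest session first
        last_start, total = sessions[0]
        for start, duration in sessions[1:]:
            if start - last_start <= MERGE_WINDOW:
                # Same viewing: extend the cluster and add up the play time.
                last_start, total = start, total + duration
            else:
                if total >= MIN_TOTAL:
                    result.append((title, last_start))
                last_start, total = start, duration
        if total >= MIN_TOTAL:
            result.append((title, last_start))
    return result


title = "Transformers: Revenge of the Fallen"
rows = [
    {"Date": "2017-10-27 13:07:38", "Duration": "00:00:04", "Title": title},
    {"Date": "2017-10-27 10:32:01", "Duration": "02:22:13", "Title": title},
    {"Date": "2017-10-26 17:04:03", "Duration": "00:01:05", "Title": title},
]
merged = merge_sessions(rows)
print(merged)
```

For the three Transformers rows this yields a single entry, dated at the start of the last session in the cluster. Whether the first or the last session should provide the watched date is a design choice left open here.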

I will integrate the above code into the main repo at some point as well with a config option labeled as experimental. It should already work for a lot of the entries. Feel free to try it already and give some feedback :)

Pleasant-exe commented 12 months ago

ViewingActivity.csv Here is a copy of the file, cropped down ofc

jensb89 commented 12 months ago

@Pleasant-exe Thanks, the code from my previous comment should work for you as well. Your csv has the same problem of multiple starts of one episode. By omitting the time info, as done in my sample code, and setting 20:15 as the default, it should work for your file, because the different starts are on the same day and would be ignored.

@Pleasant-exe + @KalleIstKool : You can also try the script and set "TRAKT_API_DRY_RUN = True" in the config. This way you will see more of what would be added to Trakt without actually sending it.

Pleasant-exe commented 11 months ago

I noticed you pushed an update. Is this it?

Pleasant-exe commented 11 months ago

Hey, it lists all shows but ends with:

** Skipping Trakt sync **
* 0 episodes and 0 movies added to Trakt history

Pleasant-exe commented 11 months ago

I updated to the latest version with your code above and got the following

Traceback (most recent call last):
  File "/home/pleasant/Netflix-to-Trakt-Import/netflix2trakt.py", line 52, in <module>
    for show in netflixHistory.shows:
NameError: name 'netflixHistory' is not defined. Did you mean: 'NetflixTvHistory'?

Without the modification:

Traceback (most recent call last):
  File "/home/pleasant/Netflix-to-Trakt-Import/NetflixTvShow.py", line 215, in addWatchedDate
    time = datetime.datetime.strptime(watchedDate + " 20:15", config.CSV_DATETIME_FORMAT + " %H:%M")
  File "/usr/lib/python3.10/_strptime.py", line 568, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "/usr/lib/python3.10/_strptime.py", line 349, in _strptime
    raise ValueError("time data %r does not match format %r" %
ValueError: time data '07 20:15' does not match format '%d.%m.%Y %H:%M'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/pleasant/Netflix-to-Trakt-Import/netflix2trakt.py", line 48, in <module>
    netflixHistory.addEntry(entry, watchedAt)
  File "/home/pleasant/Netflix-to-Trakt-Import/NetflixTvShow.py", line 119, in addEntry
    self.addMovieEntry(entryTitle, entryDate)
  File "/home/pleasant/Netflix-to-Trakt-Import/NetflixTvShow.py", line 175, in addMovieEntry
    movie.addWatchedDate(watchedDate)
  File "/home/pleasant/Netflix-to-Trakt-Import/NetflixTvShow.py", line 219, in addWatchedDate
    time = datetime.datetime.strptime(watchedDate + " 20:15", "%m.%d.%y %H:%M")
  File "/usr/lib/python3.10/_strptime.py", line 568, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "/usr/lib/python3.10/_strptime.py", line 349, in _strptime
    raise ValueError("time data %r does not match format %r" %
ValueError: time data '07 20:15' does not match format '%m.%d.%y %H:%M'

My previous comment was from the older version, with the code you posted above.

jensb89 commented 11 months ago

The latest update was regarding a different issue.

The "Skipping Trakt sync" output indicates that you set TRAKT_API_DRY_RUN to True. This is the expected behavior then: it is meant for testing without sending the final data. If you set it to False, it should already work.

I will look into your errors with the new code and will link related commits here. I'm quite busy currently, so it might take a few days, but there will be an option in the config eventually :)