mwichary / twitter-export-image-fill

A script to download (backup locally) all the images accompanying your tweets
The Unlicense

New archive, new format. #11

Open Zefling opened 5 years ago

Zefling commented 5 years ago

The new archive contains all media (except YouTube videos) and the avatar, but no HTML interface. This project doesn't work with the new format.

I made a project to read the archive and recreate the interface: https://git.ikilote.net/angular/twitter-archive. I will look for a way to download the avatars and create an avatar.js.

kfogel commented 5 years ago

Hey, @mwichary. I also found (as of today) that Twitter is using a new archive format. For one thing, there's no index.html anywhere in the download. The new downloads include a README.txt that says:

This archive consists of machine-readable JSON files containing information associated with your account. We’ve included the information we believe is most relevant and useful to you, including your profile information, your Tweets, your DMs, your Moments, your media (images, videos and GIFs you’ve attached to Tweets, DMs, or Moments), a list of your followers, a list of accounts following you, your address book, Lists that you’ve created, are a member of, or are subscribed to, interest and demographic information that we have inferred about you, information about ads that you’ve seen or engaged with on Twitter, and more.

So maybe some stuff that used to not be included now is included?

Also, I'm not sure this is related to the new format, but I had to do this patch in order to get the script to even get to the point of looking for index.html. Not only does the native archive not have an img/avatars/ directory, it doesn't even have an img/ directory! Maybe it used to? Anyway, this change is probably a good idea either way:

--- twitter-export-image-fill.py
+++ twitter-export-image-fill.py
@@ -111,7 +111,7 @@ def load_tweet_index():

 def make_directory_if_needed(directory_path):
   if not os.path.isdir(directory_path):
-    os.mkdir(directory_path)
+    os.makedirs(directory_path)

 def is_retweet(tweet):
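
For context on why the one-line change matters: `os.mkdir` creates only the last path component and fails when a parent directory is missing, while `os.makedirs` also creates the intermediate directories. Below is a minimal sketch, assuming Python 3 and a throwaway temporary directory. The `img/avatars` path mirrors the one mentioned above; the `demo` function and the `exist_ok=True` call are illustrative additions, not part of the patch.

```python
import os
import tempfile

def make_directory_if_needed(directory_path):
    # Same logic as the patched helper: create the directory
    # together with any missing parent directories.
    if not os.path.isdir(directory_path):
        os.makedirs(directory_path)

def demo():
    # Hypothetical demonstration, run inside a temporary directory.
    with tempfile.TemporaryDirectory() as root:
        nested = os.path.join(root, "img", "avatars")

        # os.mkdir cannot create "avatars" because "img" does not exist yet.
        try:
            os.mkdir(nested)
            mkdir_failed = False
        except FileNotFoundError:
            mkdir_failed = True

        # The patched helper creates both "img" and "img/avatars".
        make_directory_if_needed(nested)
        created = os.path.isdir(nested)

        # Python 3 only: exist_ok=True makes a repeated call a no-op
        # instead of an error (assumption: not used by the script itself).
        os.makedirs(nested, exist_ok=True)

        return mkdir_failed, created

if __name__ == "__main__":
    print(demo())
```

The `isdir` check in the helper keeps the call safe to repeat on Python versions that do not support `exist_ok`.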

mwichary commented 5 years ago

@kfogel Thanks for letting me know. I’ll try to check out the new format.

Zefling commented 5 years ago

Twitter now offers two archives: one with HTML formatting but without any media, and the other with only JSON plus the media (pictures and videos). The link I gave is for the second format, which makes it possible to work out the data layout. Example with data from my account: http://twitter.ikilote.net/tweets

kfogel commented 5 years ago

Interesting observation from @Zefling. FWIW, Twitter only offered me the second kind of archive -- there are no .html files in it, but it does have media files. There was no point at which I chose this: it was just the default download offered to me, and there was no option for any other kind.
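
The difference discussed here (an old-style archive that ships an index.html versus a new-style one that does not) could be checked up front, before the script fails partway through. Below is a rough sketch, assuming Python 3. The names `has_html_index` and `describe_archive` are hypothetical, and the check looks only for the index.html that the script expects, not for any documented Twitter layout.

```python
import os
import tempfile

def has_html_index(archive_root):
    # True when the archive root contains the index.html that the
    # old-style (HTML) archive ships and the script looks for.
    return os.path.isfile(os.path.join(archive_root, "index.html"))

def describe_archive(archive_root):
    # Hypothetical helper: turn the check into a message for the user.
    if has_html_index(archive_root):
        return "index.html found: looks like the old HTML archive"
    return "no index.html: possibly the newer JSON-only archive"

if __name__ == "__main__":
    # Demonstration on an empty temporary directory, which stands in
    # for an archive without index.html.
    with tempfile.TemporaryDirectory() as root:
        print(describe_archive(root))
```

This is only a heuristic: a missing index.html can also mean the script was pointed at the wrong directory.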

mwichary commented 5 years ago

I remember that in the old UI, there was indeed one place to download the old archive (with an HTML menu, the one supported by this script), and another place to download the new archive (which appeared after I wrote my script, and which I assumed to be GDPR-related). It’s possible that the new UI only offers the new style. I’m typing from the road, so I cannot check at the moment. (And yes, thanks, @zefling! I didn’t originally notice your message.)

keithrbennett commented 1 year ago

Hi, I'm wondering if there may be any plans to update this script to run with the current format? Or if anyone knows of other tools for this?

Some things I noticed about the new (November 2022) format:

kfogel commented 1 year ago

...Or if anyone knows of other tools for this?

@keithrbennett You may want to also look at https://github.com/timhutton/twitter-archive-parser/, and at the long list of other tools given at the end of its README.md.