timhutton / twitter-archive-parser

Python code to parse a Twitter archive and output in various ways
GNU General Public License v3.0

Feature request: Retrieve ALT-text for images #20

Open beadsland opened 2 years ago

beadsland commented 2 years ago

This data appears to be omitted from the archive entirely.

Side note: Another Mastodoner has offered up an online tool to parse the .js file and grab the ALTs, but UI issues have thus far foiled attempts to use it.

timhutton commented 2 years ago

@beadsland I searched for ALT tags in my archive and I agree that they seem to be completely missing. Will keep this as a feature request.

cooljeanius commented 2 years ago

This link in the README worked for me: https://archive.alt-text.org/

beadsland commented 2 years ago

Yes, I have been working with the developer of alt-text.org on the UI and error conditions there.

It now seems to be working reliably with the extracted tweets.js or tweet-media.js generated by the recommended script. So dropping the output from there someplace that parser.py can get to ought to be sufficient to incorporate it into the output.md.
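
As a rough illustration of what "incorporating into the output.md" could look like, here is a minimal sketch that writes alt text into a Markdown image reference. The function name `markdown_image` and the example path are hypothetical and are not taken from parser.py.

```python
def markdown_image(path, alt_text=""):
    """Return a Markdown image reference that carries the given alt text.

    Hypothetical helper, not part of parser.py.
    """
    # Collapse newlines and repeated whitespace so the alt text stays on one line.
    alt = " ".join((alt_text or "").split())
    # Escape the backslash first, then the brackets that would end the alt text early.
    for ch in ("\\", "[", "]"):
        alt = alt.replace(ch, "\\" + ch)
    return f"![{alt}]({path})"


line = markdown_image("media/a.jpg", "A cat [sleeping]\non a sofa")
```

With no alt text available, the same call falls back to an empty description, `![](media/a.jpg)`.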

hkolbeck commented 2 years ago

Hi, I'm the author of https://archive.alt-text.org. Would you be willing to provide a way to fold the result of my tool in where possible? The output is a JSON file:

[
  {
    "tweet_id": "...",
    "media_key": "...",
    "media_url": "...",
    "alt_text": "..."
  },
  ...
]
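
A file in this shape could be folded in by indexing it on tweet ID first. The following is only a sketch: the four field names come from the format above, while the function names, the sample values, and the choice to key by `media_url` within each tweet are assumptions.

```python
import json
from collections import defaultdict


def index_alt_text(entries):
    """Group alt text by tweet ID, then by media URL (hypothetical helper)."""
    by_tweet = defaultdict(dict)
    for entry in entries:
        alt = entry.get("alt_text")
        if alt:  # skip media that has no description
            by_tweet[entry["tweet_id"]][entry["media_url"]] = alt
    return dict(by_tweet)


def load_alt_text(path):
    """Read the exported JSON file and index it."""
    with open(path, encoding="utf-8") as f:
        return index_alt_text(json.load(f))


# Made-up sample data in the format shown above.
sample = json.loads("""
[
  {"tweet_id": "1", "media_key": "k1", "media_url": "https://example.invalid/a.jpg", "alt_text": "A red kite"},
  {"tweet_id": "1", "media_key": "k2", "media_url": "https://example.invalid/b.jpg", "alt_text": ""},
  {"tweet_id": "2", "media_key": "k3", "media_url": "https://example.invalid/c.jpg", "alt_text": "A map"}
]
""")
index = index_alt_text(sample)
```

The parser could then look up `index.get(tweet_id, {})` while writing each tweet and attach any matching description to the image it emits.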

timhutton commented 2 years ago

Hello! Yes, perfect.

lenaschimmel commented 2 years ago

I just tested the Twitter Guest API implementation by @press-rouch and noticed that the result of get_tweet also contains the ALT-text for contained media at parse(likes.json)[tweet_id].extended_entities.media[index].ext_alt_text:

[Screenshot, 2022-11-19 14:36:35]
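
A minimal sketch of reading that field from one downloaded tweet, assuming the tweet is already a plain dict with the nesting described above. The helper name `extract_alt_texts` and the use of `media_url_https` as the media identifier are assumptions, not something confirmed in this thread.

```python
def extract_alt_texts(tweet):
    """Return (media URL, alt text) pairs for media that carry alt text.

    Hypothetical helper; assumes `tweet` is a dict parsed from the API response.
    """
    # Either level may be missing or null on tweets without media.
    entities = tweet.get("extended_entities") or {}
    media_items = entities.get("media") or []
    return [
        (media.get("media_url_https"), media["ext_alt_text"])
        for media in media_items
        if media.get("ext_alt_text")
    ]


# Made-up example tweet with one described and one undescribed image.
example_tweet = {
    "extended_entities": {
        "media": [
            {"media_url_https": "https://example.invalid/a.jpg", "ext_alt_text": "A red kite"},
            {"media_url_https": "https://example.invalid/b.jpg", "ext_alt_text": None},
        ]
    }
}
pairs = extract_alt_texts(example_tweet)
```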

hkolbeck commented 2 years ago

If at all possible, I think it would be useful to allow folding in the https://archive.alt-text.org result. I'm not sure how the Twitter Guest API works, but my reason for going with a site was to allow folks with no oauth or terminal knowledge to fetch their archives' alt text.

lenaschimmel commented 2 years ago

PR #97 is now merged into the downloadtweets branch. It provides basic (still evolving) functionality to re-download the tweets that contain media and thus might contain alt-text. The downloaded tweets are not used in any way yet. See that PR for the overall idea on how to proceed.

@hkolbeck I like that your website provides an alternative for users without terminal knowledge, etc. Luckily, our tool does not need oauth or anything like that, since it operates without login or connection to the account. I'm not sure if / how you would like to integrate the result from your website though, since that obviously would require terminal knowledge from the user again.

hkolbeck commented 2 years ago

@lenaschimmel Interesting, I didn't realize you were sharing your bearer token. I had been assuming that this would require folks to know how to get auth keys for the Twitter API. If this can fetch them natively then there's no real reason to fold in my results. Thanks!

cooljeanius commented 1 month ago

> @lenaschimmel Interesting, I didn't realize you were sharing your bearer token.

Wait, where was that part mentioned?