prinsss / twitter-web-exporter

Export tweets, bookmarks, lists and much more from Twitter(X) web app. (Tool for exporting tweets, bookmarks, favorites, and lists)
MIT License
819 stars, 57 forks

[Function Request] Export ALT ("ext_alt_text" in metadata) independently. #23

Closed: dian334 closed this issue 3 months ago

dian334 commented 4 months ago

ALT text is important data for me because some Twitter users use it like hidden tweets. But manually extracting "ext_alt_text" from the metadata takes a long time because of my limited computer skills. So I would like a function that exports "ext_alt_text" as an independent item, like "full_text" and "favorite_count".

prinsss commented 4 months ago

Thank you for your suggestion! Could you explain more about "as independent item"?

{
  "id": "1775524882375233554",
  "full_text": "Lorem ipsum",
  // Did you mean adding a new field like this?
  "ext_alt_text": "Dolor sit amet",
  "screen_name": "NASAWebb",
  "name": "NASA Webb Telescope"
}

I understand the importance of the ALT text and I'm happy to implement this. But an ALT text is associated with a media item, not the tweet itself. A tweet with 4 images attached could have 4 ALT texts.

Thus, I think it would be better to add it to the media object rather than to the top-level tweet object. Here is an example of how it could look:

{
  "id": "1775524882375233554",
  "created_at": "2024-04-03 22:04:54 +08:00",
  "full_text": "All of these white dots are stars.\n\n12 million light-years away, starburst galaxy M82 sprouts new stars 10 times faster than our Milky Way. Using its infrared vision, Webb peered through dust and gas to reveal never-before-seen detail at the heart of M82: https://t.co/F31oQVUCcH https://t.co/3On66QJ5B1",
  "media": [
    {
      "type": "photo",
      "url": "https://t.co/3On66QJ5B1",
      "thumbnail": "https://pbs.twimg.com/media/GKPueCAXEAAkIRF?format=jpg&name=thumb",
      "original": "https://pbs.twimg.com/media/GKPueCAXEAAkIRF?format=jpg&name=orig",
      // Start of the new field
      "ext_alt_text": "A section of Messier 82 as imaged by the Webb Telescope. An edge-on spiral starburst galaxy with a bright white, glowing core, set against the black background of space. A white band of the edge-on disk extends from lower left to upper right. Dark brown tendrils of dust are heavily threaded through this band. Many white points in various sizes (stars or star clusters) are scattered across the image, but are most heavily concentrated toward the center. Just outside the bright core and the brown tendrils of dust, the surrounding mass of stars appears to glow faintly blueish-purple."
      // End of the new field
    }
  ],
  "screen_name": "NASAWebb",
  "name": "NASA Webb Telescope",
  "profile_image_url": "https://pbs.twimg.com/profile_images/1767989888916299776/hFYvpxZM_normal.jpg",
  "in_reply_to": null,
  "retweeted_status": null,
  "quoted_status": null,
  "favorite_count": 3830,
  "retweet_count": 715,
  "bookmark_count": 148,
  "quote_count": 32,
  "reply_count": 77,
  "views_count": 438715,
  "favorited": false,
  "retweeted": false,
  "bookmarked": false,
  "url": "https://twitter.com/NASAWebb/status/1775524882375233554"
}

What do you think? I'm open to suggestions and feedback.
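
To illustrate why the field sits on each media item, here is a minimal sketch of how a consumer of the exported data could read the ALT texts from the shape proposed above. The interface names and the `collectAltTexts` helper are hypothetical and only mirror the fields in the JSON example. They are not taken from the project's source code.

```typescript
// Hypothetical types that mirror the JSON example above (not the project's real types).
interface ExportedMedia {
  type: string;
  url: string;
  thumbnail: string;
  original: string;
  // Assumed to be absent or null when the uploader did not provide ALT text.
  ext_alt_text?: string | null;
}

interface ExportedTweet {
  id: string;
  full_text: string;
  media: ExportedMedia[];
}

// Returns one entry per media item that has ALT text,
// so a tweet with 4 images can yield up to 4 entries.
function collectAltTexts(tweet: ExportedTweet): { url: string; alt: string }[] {
  return tweet.media
    .filter((m) => typeof m.ext_alt_text === "string" && m.ext_alt_text.length > 0)
    .map((m) => ({ url: m.original, alt: m.ext_alt_text as string }));
}
```

Keeping the ALT text next to the image URL means each description stays paired with its own image, which a single top-level field could not express for multi-image tweets.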

dian334 commented 4 months ago

Sorry for the ambiguous wording. By "independent" I meant that the ALT information should be included in the exported data even when the "Include all metadata" checkbox is off.

> But an ALT text is associated with a media item, not the tweet itself. A tweet with 4 images attached could have 4 ALT texts.

After reading your comment, I understand that ALT text belongs to each media item rather than to the tweet. I agree with your proposal to place "ext_alt_text" inside the media object, as in your example.
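
The behavior agreed on here, with ALT text exported whether or not "Include all metadata" is checked, could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: `RawTweet`, `RawMedia`, `ExportOptions`, `includeMetadata`, and `exportTweet` are all hypothetical names.

```typescript
// Hypothetical input shapes for illustration only.
interface RawMedia {
  type: string;
  media_url_https: string;
  ext_alt_text?: string;
}

interface RawTweet {
  id_str: string;
  full_text: string;
  media: RawMedia[];
}

interface ExportOptions {
  // Stands in for the "Include all metadata" checkbox (assumed name).
  includeMetadata: boolean;
}

function exportTweet(raw: RawTweet, options: ExportOptions): Record<string, unknown> {
  const row: Record<string, unknown> = {
    id: raw.id_str,
    full_text: raw.full_text,
    // ALT text is copied onto every media item unconditionally;
    // null marks media without a description.
    media: raw.media.map((m) => ({
      type: m.type,
      original: m.media_url_https,
      ext_alt_text: m.ext_alt_text ?? null,
    })),
  };
  // Only the bulky raw payload depends on the checkbox.
  if (options.includeMetadata) {
    row.metadata = raw;
  }
  return row;
}
```

With this split, turning the checkbox off shrinks the export without dropping the ALT text.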

prinsss commented 3 months ago

The latest nightly build has implemented this feature. The ALT texts are now exported along with image URLs.

Please give it a try and let me know if it works for you.

https://github.com/prinsss/twitter-web-exporter/releases/download/nightly/twitter-web-exporter.user.js