Open billbeans opened 2 months ago
@billbeans user_media is an API call to Twitter that returns a list of media: links to photos and videos. That is why you see so much log output in the terminal.
There is no actual media download in twscrape right now, because nobody has requested it before.
For now, you can download the media with this simple script:
import asyncio
import os

import httpx
from twscrape import API


async def download_file(client: httpx.AsyncClient, url: str, outdir: str):
    # Use the last path segment of the URL, without the query string, as the file name.
    filename = url.split("/")[-1].split("?")[0]
    outpath = os.path.join(outdir, filename)

    async with client.stream("GET", url) as resp:
        resp.raise_for_status()  # do not write an error page to disk as if it were media
        with open(outpath, "wb") as f:
            async for chunk in resp.aiter_bytes():
                f.write(chunk)


async def load_user_media(api: API, user_id: int, outdir: str):
    os.makedirs(outdir, exist_ok=True)
    all_photos = []
    all_videos = []

    # Collect the media links from every tweet returned by user_media.
    async for doc in api.user_media(user_id):
        all_photos.extend([x.url for x in doc.media.photos])
        for video in doc.media.videos:
            # Keep only the variant with the highest bitrate.
            variant = max(video.variants, key=lambda x: x.bitrate)
            all_videos.append(variant.url)

    # Download all files concurrently over one shared HTTP client.
    async with httpx.AsyncClient() as client:
        await asyncio.gather(
            *[download_file(client, url, outdir) for url in all_photos],
            *[download_file(client, url, outdir) for url in all_videos],
        )


async def main():
    api = API()  # assumes accounts have already been added to the account pool
    await load_user_media(api, 2244994945, "output")


if __name__ == "__main__":
    asyncio.run(main())
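The script above starts every download at once, which may be too much for a profile with a lot of media. If that turns out to be a problem, one option is to cap the number of downloads in flight with a semaphore. This is only a sketch and not part of twscrape: gather_limited and its limit parameter are hypothetical names made up for this example, and the default of 8 is an arbitrary guess.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")


async def gather_limited(
    jobs: Iterable[Callable[[], Awaitable[T]]], limit: int = 8
) -> List[T]:
    """Run zero-argument coroutine factories with at most `limit` running at once.

    Hypothetical helper for illustration; the default limit is an assumption.
    """
    sem = asyncio.Semaphore(limit)

    async def run(job: Callable[[], Awaitable[T]]) -> T:
        async with sem:  # wait for a free slot before starting the job
            return await job()

    # asyncio.gather returns results in the same order as the input jobs.
    return await asyncio.gather(*(run(job) for job in jobs))


# Possible use inside load_user_media, replacing the plain asyncio.gather call:
#
#     urls = all_photos + all_videos
#     await gather_limited(
#         [lambda u=u: download_file(client, u, outdir) for u in urls],
#         limit=8,
#     )
```

The jobs are passed as factories (lambdas) rather than coroutine objects so that a download is only created once a slot is free.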
Maybe I'm a bit confused about what this software does, but can it actually grab a user's uploaded media (jpg, mp4) from their tweets and download them? I ran user_media on a profile, and I just got a bunch of stdout in my terminal. I saved that output to a text file and had a hell of a time grepping the links out of it to make wget work, and even then, it didn't grab all of the media from the profile I wanted scraped.