Twi-Hard opened this issue 4 years ago.
Not possible at the moment. Posts/Tweets/etc without media, regardless of the site, are ignored.
Could I be so rude as to ask for this to be looked into?
I'm trying to go through the code to see what changes need to be made, but it's taking a lot of effort.
Edit: I managed to get this barely working by adding the following code to the start of DownloadJob.handle_directory in gallery_dl/job.py:
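import copy, json
with open("./" + str(kwdict["tweet_id"]) + ".json", "w", encoding="utf-8") as fp:
    # Work on a copy so the kwdict used later for paths/filenames is not modified
    kwdictWriteable = copy.deepcopy(kwdict)
    kwdictWriteable["date"] = "//TODO: THIS"
    kwdictWriteable["author"]["date"] = "//TODO: THIS"
    # default=str stringifies any remaining non-JSON values (e.g. other datetimes)
    fp.write(json.dumps(kwdictWriteable, default=str))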
By "barely working" I mean it writes the contents of tweets without media to a file. This could be completely the wrong way to go about it, but it's progress.
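One pitfall with this approach: in Python, a plain assignment like kwdictWriteable=kwdict does not copy the dict, it only binds a second name to the same object, so overwriting "date" also changes the kwdict that gallery-dl keeps using afterwards. A minimal, self-contained illustration; the dict below is a made-up stand-in whose keys only mimic the tweet metadata:

```python
import copy
import datetime

# Hypothetical stand-in for gallery-dl's kwdict; only its shape matters here
kwdict = {
    "tweet_id": 1,
    "date": datetime.datetime(2021, 5, 1),
    "author": {"date": datetime.datetime(2020, 1, 1)},
}

alias = kwdict                    # same object, not a copy
alias["date"] = "//TODO: THIS"
print(kwdict["date"])             # prints: //TODO: THIS (the original changed too)

kwdict["date"] = datetime.datetime(2021, 5, 1)
safe = copy.deepcopy(kwdict)      # independent copy, including the nested "author" dict
safe["date"] = "//TODO: THIS"
safe["author"]["date"] = "//TODO: THIS"
print(type(kwdict["date"]).__name__, type(kwdict["author"]["date"]).__name__)  # prints: datetime datetime
```

A shallow copy (dict(kwdict) or kwdict.copy()) would not be enough here, because the nested "author" dict would still be shared with the original.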
Made a slightly less janky proof of concept, to be added after tdata.update(metadata):
It's not clean in the slightest and it doesn't even pretty-print the JSON, but it works as a temporary machine-parseable solution. I just hope @mikf can do the proper implementation, because I can't make heads or tails of this codebase.
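# TEMPORARY AND JANK MEDIALESS TWEET SOLUTION
import copy, datetime, json, os

# One directory per user; makedirs creates parent dirs and ignores existing ones
directory = os.path.join("gallery-dl", "twitter", tdata["user"]["name"])
os.makedirs(directory, exist_ok=True)

def deepClean(obj):
    # Recursively replace datetime values with UNIX timestamps so json.dumps() accepts them
    for key, value in obj.items():
        if isinstance(value, datetime.datetime):
            if value.tzinfo is None:  # assume naive datetimes hold UTC
                value = value.replace(tzinfo=datetime.timezone.utc)
            obj[key] = value.timestamp()
        elif isinstance(value, dict):
            obj[key] = deepClean(value)
    return obj

# Clean a deep copy so tdata itself keeps its datetime objects for the rest of the extractor
tdataWriteable = deepClean(copy.deepcopy(tdata))
with open(os.path.join(directory, str(tdata["tweet_id"]) + ".json"), "w", encoding="utf-8") as fp:
    fp.write(json.dumps(tdataWriteable))
# TEMPORARY AND JANK MEDIALESS TWEET SOLUTION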
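As an alternative to walking the dict by hand, the standard library's json.dumps accepts a default callable that is invoked only for objects it cannot serialize natively. This also covers datetimes nested inside lists, which deepClean does not descend into, and it leaves the input untouched. A sketch, assuming every non-JSON value in the metadata is a datetime and that naive datetimes hold UTC; the helper name to_json and the sample dict (including its "extra" key) are hypothetical:

```python
import datetime
import json


def _encode(obj):
    # Called by json.dumps only for objects it cannot serialize itself
    if isinstance(obj, datetime.datetime):
        if obj.tzinfo is None:  # assumption: naive datetimes hold UTC
            obj = obj.replace(tzinfo=datetime.timezone.utc)
        return obj.timestamp()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def to_json(metadata):
    """Serialize a metadata dict to JSON without modifying it (hypothetical helper)."""
    return json.dumps(metadata, default=_encode)


# Made-up sample data; "extra" shows a datetime nested inside a list
tdata = {
    "tweet_id": 123,
    "date": datetime.datetime(2021, 1, 1),
    "user": {"name": "example", "date": datetime.datetime(2020, 1, 1)},
    "extra": [{"date": datetime.datetime(2021, 1, 1)}],
}
print(to_json(tdata))
```

Because no copy or in-place rewrite is needed, the original tdata keeps its datetime objects.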
Should now be possible by enabling the text-tweets
option (https://github.com/mikf/gallery-dl/commit/724ca61f3600037cc57033891354cded10633079, https://github.com/mikf/gallery-dl/commit/b5affc62aa84847f3ac0c39eda675f3ced761a9f) together with the right postprocessors settings:
{
    "extractor": {
        "twitter": {
            "text-tweets": true,
            "postprocessors": [
                {
                    "name": "metadata",
                    "event": "post",
                    "filename": "{tweet_id}.json"
                }
            ]
        }
    }
}
(see also https://github.com/mikf/gallery-dl/issues/1569#issuecomment-846428927)
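gallery-dl reads these settings from its JSON config file (commonly ~/.config/gallery-dl/config.json on Linux), where per-site options live under the top-level "extractor" object. A small sketch that merges the Twitter settings above into an existing config file without clobbering other options; the helper names enable_text_tweets and update_config_file are hypothetical, and the demo writes to a throwaway temp file rather than a real config:

```python
import json
import os
import tempfile


def enable_text_tweets(config):
    """Add the Twitter settings from this thread to a gallery-dl config dict (hypothetical helper)."""
    twitter = config.setdefault("extractor", {}).setdefault("twitter", {})
    twitter["text-tweets"] = True
    # Metadata postprocessor on the "post" event, one JSON file named after each tweet ID
    pp = {"name": "metadata", "event": "post", "filename": "{tweet_id}.json"}
    pps = twitter.setdefault("postprocessors", [])
    if pp not in pps:  # keep the function idempotent
        pps.append(pp)
    return config


def update_config_file(path):
    # Load the existing config (if any), merge in the settings, and write it back
    config = {}
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fp:
            config = json.load(fp)
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(enable_text_tweets(config), fp, indent=4)


# Demo on a throwaway file rather than a real config
demo_path = os.path.join(tempfile.mkdtemp(), "config.json")
update_config_file(demo_path)
```

Running it twice on the same file leaves a single postprocessor entry, since the merge checks for an identical entry before appending.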
Why does this need an extra option?
Because it would cause a lot of needless processing and path generation for data that gets discarded most of the time.
"Text-only" seems like a bit of a misnomer; it implies that media from tweets that do have media wouldn't be downloaded. Perhaps "non-media" would be a better term? Or maybe "text-tweets"?
I don't mean to be that guy, but if/when this gets implemented for other websites, it would make more sense to name it include-medialess.
I would like this implemented for 4chan and similar sites too. I believe several other people have the same need on other sites, so would it make sense to add an option that saves the metadata of posts without media, regardless of the site?
How can I download the metadata of tweets that contain no media? This is the kind of data I'm talking about: