Open Dante-tLDS opened 1 year ago
Would it be possible for gallery-dl to detect what media is intended to be a gif file and convert it into such instead of an mp4?
Not really, no. Any gif uploaded to Twitter gets automatically converted to mp4 on their servers and is only downloadable as such. (#2691)
Alongside this, would it also be possible to download the text associated with a tweet as a .txt file, and within said file include the tweet metadata? (Example of .txt file contents in attached file)
You need a config file that uses a metadata post processor. Possibly a template file as well.
And finally, for file organizing, how possible is it to set tweet names to something like Author_TweetID?
Use the filename option with the available metadata fields, which you can find with gallery-dl -K <twitter url>
Things like MediaImage could be done with conditional filename format strings.
Additionally, my reason for starting the Filename with the Author would be for ease of organizing as well,
Wouldn't it be better to put all media of one user in a separate directory then?
Additionally, is it at all possible to apply tags to a tweets filename?
I don't think Twitter provides metadata for that.
{
    "extractor": {
        "twitter": {
            "filename": "{author['name']}-{tweet_id}-{num}.{extension}",
            "directory": ["{user['name']}"],
            "postprocessors": [
                {
                    "name": "metadata",
                    "format": "\fTF path/to/template",
                    "event": "post",
                    "filename": "{author['name']}-{tweet_id}-0Text.txt"
                }
            ]
        }
    }
}
Seems like this probably belongs into Discussions and not into Issues, but okay..
Would it be possible for gallery-dl to detect what media is intended to be a gif file and convert it into such instead of an mp4?
Well, guessing user intentions is generally not the strong suit of any computer program..
That said, not sure if there are actually any real GIF files on Twitter? Aren't they all MP4 now anyways? Whatever it is, this can be done by setting up an exec post-processor..
Alongside this, would it also be possible to download the text associated with a tweet as a .txt file, and within said file include the tweet metadata? (Example of .txt file contents in attached file) Example.txt
Sure thing, simply set up a metadata post-processor in your config for Twitter, which would look a little something like this:
{
    "name": "metadata",
    "event": "post",
    "mode": "custom",
    "content-format": "{retweet_id|tweet_id}:{content}",
    "filename": "Twitter__{author[name]}__{date:%Y.%m.%d}__{retweet_id|tweet_id}.txt",
    "directory": "Tweetcontent",
}
The important part here is "mode": "custom", which allows you to set up "content-format": with whatever the hell you want.
At least use {content} for the actual tweet content.
And maybe also use the archive options for this post-processor as well..
(Documentation of options for the metadata post-processor begins here)
And finally, for file organizing, how possible is it to set tweet names to something like Author_TweetID? Here's an example name I came up with awhile back: (File Names): Doe-1234567890-0Text.txt Doe-1234567890-1MediaImage.png/jpeg Doe-1234567890-2MediaMov.mp4/webm/etc Doe-1234567890-3MediaGif.gif
Very easy, but you should probably familiarize yourself with the "filename" setting.
What you describe is probably just a simple setting for "filename" of {author[name]}_{tweet_id}{count:?_//>04}.{extension}
But if I were you, I would at least add something like {date:%Y-%m-%d} as well.
Additionally, is it at all possible to apply tags to a tweets filename? IE whether it's NSFW, Safe, or inbetween (I would call this SNFW for (Safe, but Not For Work, open for ideas). These names being exactly as such so as not to get (N)SFW results for typing SFW instead of Safe. I doubt tagging media details would be possible since even on sites that support this, it is applied manually by the user/admins, but if it were possible it would be neat
Can't tell you off the top of my head if Twitter provides such metadata, but it probably provides more than enough..
Anyway, here is your most important gallery-dl advice:
Simply run gallery-dl <Your Twitter* URL here> -K to see the available metadata you can straightforwardly use.
*: Not just for Twitter, but for all sites supported by gallery-dl
Damn, beat me by a minute
{
    "postprocessors": [
        {
            "name": "metadata",
            "format": "\fTF path/to/template",
            "event": "post",
            "filename": "{author['name']}-{tweet_id}-0Text.txt"
        }
    ]
}
metadata.format as an alternative to "mode": "custom" + "content-format": .....
Is this new? I don't see it in configuration.rst?
metadata.format as an alternative to "mode": "custom" + "content-format": .....
format was the content-format before 26d23345 and it never got removed, and I added an implicit "mode": "custom" in b57015cf because I got tired of having to write it over and over.
Is this new? I don't see it in configuration.rst?
Writing documentation and keeping it up to date is not something I'm particularly good at ...
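To make the equivalence described above concrete, here is a minimal Python sketch. It is only an illustration of the idea that "format" stands in for "mode": "custom" plus "content-format"; `normalize_metadata_options` is a hypothetical helper invented for this example and is not taken from gallery-dl's source.

```python
def normalize_metadata_options(options: dict) -> dict:
    """Hypothetical helper: expand the short "format" spelling into the long one."""
    normalized = dict(options)  # work on a copy, leave the caller's dict alone
    if "format" in normalized:
        fmt = normalized.pop("format")
        # "format" implies custom mode and is the older name for "content-format"
        normalized.setdefault("mode", "custom")
        normalized.setdefault("content-format", fmt)
    return normalized


short_form = {"name": "metadata", "event": "post", "format": "{tweet_id}:{content}"}
long_form = {
    "name": "metadata",
    "event": "post",
    "mode": "custom",
    "content-format": "{tweet_id}:{content}",
}

print(normalize_metadata_options(short_form) == long_form)
```

Either spelling can go into the config; the sketch only shows that they describe the same post-processor setup.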
metadata.format as an alternative to "mode": "custom" + "content-format": .....
format was the content-format before 26d2334 and it never got removed and I added an implicit "mode": "custom" in b57015c because I got tired of having to write it over and over.
Ah, so I was actually using the newer/suggested/canonical variant, that's good to know.
Is this new? I don't see it in configuration.rst?
Writing documentation and keeping it up to date is not something I'm particularly good at ...
Don't worry about that in my opinion, the documentation is actually really good! It's a documentation wasteland out there for so many projects, and gallery-dl is holding up very very well here.
Thank you all for your feedback. I appreciate the time. Forgive my confusion, I am soon to learn Python, but I pasted the code you two made (editing the PostProcessor part with what Hrxn mentioned since if I read right it was the more optimal method of doing this) into my /git/gallery-dl/gallery-dl/extractor/twitter.py file, but after saving it I didn't notice a change in how files were being named or .txt files being saved. Here is what I added to my file:
{
    "extractor": {
        "twitter": {
            "filename": "{author['name']}-{tweet_id}-{num}.{extension}",
            "directory": ["{user['name']}"],
            "postprocessors": [
                {
                    "name": "metadata",
                    "event": "post",
                    "mode": "custom",
                    "content-format": "{retweet_id|tweet_id}:{content}",
                    "filename": "Twitter__{author[name]}__{date:%Y.%m.%d}__{retweet_id|tweet_id}.txt",
                    "directory": "Tweetcontent",
                }
            ]
        }
    }
}
I am completely expecting it to be me simply going about this incorrectly, and I apologize for taking more time in asking how to do this correctly, but I'll attach the modified twitter.py file I have just to help see where I messed up. (Uploading as .txt) twitter.txt The change is only at the end of the file
(Edit: I noticed a config.py under /git/gallery-dl/gallery-dl/ that showed what seemed to be a place to paste code for extractors and postprocessors, and I attempted to add the lines there as well, on the idea that perhaps I was intended to add them there. This is the file now; I added the code under Public Interface, although no luck with this either. config.txt)
Ah, I guess there has been a slight misunderstanding, my bad.
You certainly do not need to edit any python files, at all.
Simply make a config file (either by hand or by running gallery-dl --config-create), and put the stuff into this one.
Take a look at the Configuration section of the main readme, it's basically just two short paragraphs.
There are also two example .conf config files in the /docs part of the repo, if you need further examples.
I also just remembered the wiki page, giving a quick rundown and explaining the necessary things even for newcomers: https://github.com/mikf/gallery-dl/wiki/config-file-outline
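If writing JSON by hand feels error-prone, one option is to let a script produce the text. The following is a rough sketch, not an official workflow: it builds the settings from the snippets earlier in this thread as a Python dict and serializes them with the standard `json` module, so the result is syntactically valid JSON. Saving the printed text as your config file (for example the `config.json` that gallery-dl reads) is left to you.

```python
import json

# Settings copied from the snippets earlier in this thread.
config = {
    "extractor": {
        "twitter": {
            "filename": "{author['name']}-{tweet_id}-{num}.{extension}",
            "directory": ["{user['name']}"],
            "postprocessors": [
                {
                    "name": "metadata",
                    "event": "post",
                    "mode": "custom",
                    "content-format": "{retweet_id|tweet_id}:{content}",
                    "filename": "Twitter__{author[name]}__{date:%Y.%m.%d}__{retweet_id|tweet_id}.txt",
                    "directory": "Tweetcontent",
                }
            ],
        }
    }
}

# json.dumps never emits trailing commas, even though the Python literal above has them.
config_text = json.dumps(config, indent=4)
print(config_text)
```

Note that the post-processor sits inside the "twitter" block here, which is what makes it run for Twitter URLs.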
[..] but I pasted the code you two made (editing the PostProcessor part with what Hrxn mentioned since if I read right it was the more optimal method of doing this) [..]
Well, not really more "optimal" or better. They both work the same way; they are functionally the same.
Ah, that helps. Thank you for clarifying. This is what my config.json looks like:
{
    "extractor": {
        "twitter": {
            "filename": "{author['name']}-{tweet_id}-{num}.{extension}",
            "directory": ["{user['name']}"],
        }
    },
    "downloader": {
    },
    "output": {
    },
    "postprocessor": {
        "name": "metadata",
        "event": "post",
        "mode": "custom",
        "content-format": "{retweet_id|tweet_id}:{content}",
        "filename": "Twitter__{author[name]}__{date:%Y.%m.%d}__{retweet_id|tweet_id}.txt",
        "directory": "Tweetcontent",
    }
}
I got some of it to work, but am having a bit of difficulty. Pasting what I had as is gives me this error:
[config][error] JSONDecodeError when loading '/home/user/.config/gallery-dl/config.json': Expecting property name enclosed in double quotes: line 6 column 5 (char 176)
, and running it through a JSON Validator gives me this error:
...{user['name']}"], } }, "downloader":
----------------------^
Expecting 'STRING', got '}'
Removing the commas at the end of "directory": ["{user['name']}"], and "directory": "Tweetcontent", renders it valid JSON, but in use I notice that it isn't quite right: the names are changed as intended, but it's not saving any .txt files associated with the tweets.
Okay, so pretty good news on this, I got it to work pretty much as desired (although not using the format I intended that shows up in an empty config file), but yeah I got names mostly working and text files downloading. Here is my config:
{
    "extractor": {
        "twitter": {
            "filename": "{author['name']}-{tweet_id}-{num}Media{extension}.{extension}",
            "directory": ["{user['name']}"],
            "username": "Twitter032",
            "password": "Twixer(These are not real)",
            "postprocessors": [{
                "name": "metadata",
                "event": "post",
                "mode": "custom",
                "content-format": ["https://twitter.com/{author[name]}/status/{retweet_id|tweet_id}",
                                   "Author: @{author[name]}",
                                   "Posted: {date:%Y/%m/%d_%I:%M%p}",
                                   "",
                                   "{content}"],
                "filename": "{author[name]}-{retweet_id|tweet_id}-0Text.txt",
                "directory": null
            }]
        }
    }
}
I did intentionally set Directory to null to get txt files in the same dir as the media, which I'm happy about
My question now is, is there a way to make it so that, if an extension ends in jpg, png etc, the name of the file is changed to a general name? I.e., if Extension = jpg/png, name = Photo.{extension}, or if Extension = mp4/webm, name = Mov, etc.?
And is there a way for me to include the URL that was downloaded in the txt file? Having the tweet ID is good, though I would like to set it to the full link to the tweet if possible.
I may have a way: "content-format": "https://twitter.com/{author[name]}/status/{retweet_id|tweet_id}"
Although I do not know if this is reliable. It seems to work, but will it always be accurate this way?
Similar to that, what can be done in a text file by configs alone? Would it be tricky to get it to do something that looks like this:
# Example twitter txt file
[Tweet Link]
[Author Screenname@Username]
[Date]
[Likes]-[Retweets]-[Replies]-[Views]-[Device Uploaded From]
[Tweet Content]
[Separately displayed Tweet Hashtags]
I know that the Date can be given, but I notice that with the content as it is in my configs, it groups the hashtags along with the tweet message. Is this possible to have on a separate line?
I apologize if I am asking a lot of questions, I am really just happy that a tool that can do this exists at all and want to use it to its best abilities. Thank you both for your continued time in helping me understand how this works
{
    "extractor": {
        "twitter": {
            "filename": "{author['name']}-{tweet_id}-{num}.{extension}",
            "directory": ["{user['name']}"],
        }
    },
    "downloader": {
    },
    "output": {
    },
    "postprocessor": {
        "name": "metadata",
        "event": "post",
        "mode": "custom",
        "content-format": "{retweet_id|tweet_id}:{content}",
        "filename": "Twitter__{author[name]}__{date:%Y.%m.%d}__{retweet_id|tweet_id}.txt",
        "directory": "Tweetcontent",
    }
}
I got some of it to work, but am having a bit of difficulty. Pasting what I had as is gives me this error:
[config][error] JSONDecodeError when loading '/home/user/.config/gallery-dl/config.json': Expecting property name enclosed in double quotes: line 6 column 5 (char 176)
, and running it through a JSON Validator gives me this error:
...{user['name']}"], } }, "downloader":
----------------------^
Expecting 'STRING', got '}'
Removing the commas at the end of "directory": ["{user['name']}"], and "directory": "Tweetcontent", renders it a valid JSON format, but when in use I notice that it isn't quite right, the names are changed as intended, but it's not saving any .txt files associated with the tweets.
Yup, exactly. You have trailing commas in two places in the snippet above.
A trailing comma means that JSON expects a new "value": .. or a new {<object>}, but never a closing } or ]
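The trailing-comma rule can be checked quickly with Python's standard `json` module before handing a file to gallery-dl. This is a small sketch; `check_json` is a helper name made up for this example, and the two strings are cut-down versions of the "directory" line from the config above.

```python
import json


def check_json(text):
    """Return None if text is valid JSON, else a short description of the first error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # The exact message wording depends on the Python version.
        return f"line {exc.lineno} column {exc.colno}: {exc.msg}"
    return None


with_trailing_comma = """{"directory": ["{user['name']}"], }"""
without_trailing_comma = """{"directory": ["{user['name']}"]}"""

print(check_json(with_trailing_comma))     # an error description
print(check_json(without_trailing_comma))  # None
```

The same check works on a whole config file by reading its contents into a string first.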
Also, if you don't use any settings (which is fine, the defaults are designed to be reasonable) for "downloader" or "output", you can simply delete these blocks.
The problem with your config example here, while it is correct in principle (besides that you have to give a name to a post-processor set up in that place), is that you create a post-processor for metadata, but you are not using any post-processor when running gallery-dl with a Twitter URL..
Okay, so pretty good news on this, I got it to work pretty much as desired (although not using the format I intended that shows up in an empty config file), but yeah I got names mostly working and text files downloading. Here is my config:
```json
{
    "extractor": {
        "twitter": {
            "filename": "{author['name']}-{tweet_id}-{num}Media{extension}.{extension}",
            "directory": ["{user['name']}"],
            "username": "Twitter032",
            "password": "Twixer(These are not real)",
            "postprocessors": [{
                "name": "metadata",
                "event": "post",
                "mode": "custom",
                "content-format": ["https://twitter.com/{author[name]}/status/{retweet_id|tweet_id}",
                                   "Author: @{author[name]}",
                                   "Posted: {date:%Y/%m/%d_%I:%M%p}",
                                   "",
                                   "{content}"],
                "filename": "{author[name]}-{retweet_id|tweet_id}-0Text.txt",
                "directory": null
            }]
        }
    }
}
```
Okay, you already found it. You have moved the post-processor to the correct place for this..
I did intentionally set Directory to null to get txt files in the same dir as the media, which I'm happy about. My question now is, is there a way to make it so that, if an extension ends in jpg, png etc, the name of the file is changed to a general name? I.e., if Extension = jpg/png, name = Photo.{extension}, or if Extension = mp4/webm, name = Mov, etc.?
Yes! You can set "filename" ("directory" as well) to an object, which means that its values get used conditionally.
Like this, for example:
"filename": {
    "extension == 'png'": "{tweet_id}-{num}-Photo.{extension}",
    "extension == 'mp4'": "{tweet_id}-{num}-Video.{extension}",
    ""                  : "{author['name']}-{tweet_id}-{num}Media{extension}.{extension}"
}
The last line is the default condition; it's what will be used if the checks in the two lines before don't evaluate to true.
Although be careful to not use an overly generic filename, i.e. never just "Photo.{extension}". Because if the file already exists in your filesystem, gallery-dl might think it already has them and will skip lots of downloads.
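To picture how a conditional "filename" object gets resolved, here is a simplified Python model. It is an assumption-laden sketch, not gallery-dl's actual code: `pick_format` is a hypothetical function, the conditions are evaluated with plain `eval` against the metadata, and the empty-string key is treated as the fallback.

```python
def pick_format(conditions: dict, metadata: dict) -> str:
    """Hypothetical model: return the format string of the first condition that holds."""
    for condition, fmt in conditions.items():
        # Skip the "" key here; it only serves as the fallback below.
        if condition and eval(condition, {}, dict(metadata)):
            return fmt
    return conditions[""]


rules = {
    "extension == 'png'": "{tweet_id}-{num}-Photo.{extension}",
    "extension == 'mp4'": "{tweet_id}-{num}-Video.{extension}",
    "": "{author['name']}-{tweet_id}-{num}Media{extension}.{extension}",
}

print(pick_format(rules, {"extension": "png"}))
print(pick_format(rules, {"extension": "gif"}))
```

Every format string in the example still contains {tweet_id} and {num}, which is what keeps the resulting names unique per file.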
~And is there a way for me to include the URL that was downloaded in the txt file? Having the tweet ID is good, though I would like to set it to the full link to the tweet if possible.~ I may have a way, "content-format": "https://twitter.com/{author[name]}/status/{retweet_id|tweet_id}"
Although I do not know if this is reliable, it seems to work, but will it be always accurate this way?
Dunno, don't know the entirety of the Twitter metadata by heart.. {permalink}, maybe?
Similar to that, what can be done in a text file by configs alone? Would it be tricky to get it to do something that looks like this:
# Example twitter txt file
[Tweet Link]
[Author Screenname@Username]
[Date]
[Likes]-[Retweets]-[Replies]-[Views]-[Device Uploaded From]
[Tweet Content]
[Separately displayed Tweet Hashtags]
I know that the Date can be given, but I notice that with the content as it is in my configs, it groups the hashtags along with the tweet message. Is this possible to have on a separate line?
Not tricky at all, take a look at Special Type Format String on how to set up a template file.
A normal template should be enough, e.g. \fT ~/.gallery-dl/templates/twitter.txt
And then set up this template file like you want, e.g.
Tweet Link: "https://twitter.com/{author[name]}/status/{retweet_id|tweet_id}"
Author: {author[name]}
Date: {date}
Content: {content}
...
and so on..
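For a feel of what such a template turns into, the fields can be filled in with Python's own `str.format`, which understands the same {author[name]} lookup style. This is only an approximation of what gallery-dl does: the `{retweet_id|tweet_id}` fallback syntax is gallery-dl specific, so the sketch uses plain {tweet_id}, and all metadata values below are made up.

```python
# Template text as it might appear in a template file (simplified).
template = (
    "Tweet Link: https://twitter.com/{author[name]}/status/{tweet_id}\n"
    "Author: {author[name]}\n"
    "Date: {date}\n"
    "Content: {content}\n"
)

# Made-up metadata for illustration; real field names come from gallery-dl -K.
metadata = {
    "author": {"name": "Doe"},
    "tweet_id": 1234567890,
    "date": "2023-01-05 14:30:00",
    "content": "Hello world",
}

rendered = template.format(**metadata)
print(rendered)
```

The template file itself contains only the text and the {…} fields; no JSON quoting or escaping is needed there.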
Sorry for taking a bit to respond, here is my progress:
{
    "extractor": {
        "twitter": {
            "filename": {
                "extension == 'jpg','png'": "{author['name']}-{tweet_id}-{num}MediaPhoto.{extension}",
                "extension == 'mp4','webm'": "{author['name']}-{tweet_id}-{num}MediaVideo.{extension}",
                "" : "{author['name']}-{tweet_id}-{num}Media{extension}.{extension}"
            },
            "directory": ["{user['name']}"],
            "username": "",
            "password": "",
            "postprocessors": [{
                "name": "metadata",
                "event": "post",
                "mode": "custom",
                "content-format": ["https://twitter.com/{author[name]}/status/{retweet_id|tweet_id}",
                                   "Author: {author['name']}@{author[name]}",
                                   "Posted: {date:%Y/%m/%d_%I:%M%p}",
                                   "",
                                   "{content}"],
                "filename": "{author[name]}-{retweet_id|tweet_id}-0Text.txt",
                "directory": null
            }]
        }
    }
}
Regarding the Filename, I am delighted to say that, as of so far, it works precisely as desired. Thank you for this. I think the only thing stopping me from getting any farther with naming is the lack of available metadata on twitters side for stuff like tagging things as sfw/nsfw, and applying tags about the image itself automatically. I am really glad I am able to get this far here. (I did have to add a comma at the end of the } in your example though to get it to work, but that was thankfully a super easy fix. Maybe python isn't as hard to learn after all)
I did attempt to include "{permalink}", under content-format, but sadly it just outputted none.
Do you happen to have the code that counts for screen names? I haven't been able to find it. Tried looking it up but I have had no luck. (Screen names being like, [Screen name]@handle_name, basically the nickname for twitter).
I tried looking at the Special Type Formatting String you mentioned, but I am not really sure I understand. What I do understand is that you recommend I use the T type Format String (\fT ~/.templates/twitter.txt as a modified version of the given example in the page), though I admit I do not follow. Is a template file a separate file that I create in ~/.gallery-dl/templates/ and then add my desired content-format strings to it, and from there instead of using a "content-format": string in my config file, I just use "format": "\fT ~/.gallery-dl/templates/twitter.txt"?
Regarding the Filename, I am delighted to say that, as of so far, it works precisely as desired. Thank you for this. I think the only thing stopping me from getting any farther with naming is the lack of available metadata on twitters side for stuff like tagging things as sfw/nsfw, and applying tags about the image itself automatically. I am really glad I am able to get this far here. (I did have to add a comma at the end of the } in your example though to get it to work, but that was thankfully a super easy fix. Maybe python isn't as hard to learn after all)
And JSON is even easier...
I did attempt to include "{permalink}", under content-format, but sadly it just outputted none.
Do you happen to have the code that counts for screen names? I haven't been able to find it. Tried looking it up but I have had no luck. (Screen names being like, [Screen name]@handle_name, basically the nickname for twitter).
Check with gallery-dl -K <twitter_url>
I tried looking at the Special Type Formatting String you mentioned, but I am not really sure I understand. What I do understand is that you recommend I use the T type Format String (\fT ~/.templates/twitter.txt as a modified version of the given example in the page), though I admit I do not follow. Is a template file a separate file that I create in ~/.gallery-dl/templates/ and then add my desired content-format strings to it, and from there instead of using a "content-format": string in my config file, I just use "format": "\fT ~/.gallery-dl/templates/twitter.txt"?
Yes, it's exactly that. "content-format" then points to the template file, and the metadata fields can be used in that template file.
Very excitingly I did find it via -K: {author[nick]} is the arg for this =D
Also I noticed something in the filename thingy, using the same format as above, for some reason or another, extensions that end in mp4 are still being given the "Photo" tag, despite the config clarifying to render mp4's as "Video". Hmmm It seems that removing the Webm part, it works as intended. That's interesting
(I haven't tested the template yet, just wanted to update on the nickname bit)
Ah, I see what you mean. Yeah, the conditional expression is Python syntax.
And "extension == <value>" works only on single values.
For multiple values, use "extension in ('mp4', 'webm')"
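Since the condition is Python syntax, the difference is easy to reproduce in a plain Python session. The snippet below assumes the condition string is evaluated as an ordinary Python expression with the metadata fields as variables; the `eval` calls merely stand in for that.

```python
metadata = {"extension": "mp4"}

# The comma turns the expression into a tuple: (extension == 'jpg', 'png').
# A non-empty tuple is always truthy, so this "condition" matches every file.
wrong = eval("extension == 'jpg','png'", {}, metadata)

# A membership test does what was intended.
right = eval("extension in ('jpg', 'png')", {}, metadata)

print(wrong, bool(wrong))  # (False, 'png') True
print(right)               # False
```

That is why mp4 files picked up the "Photo" name: the first condition in the object was always true, so the second one was never reached.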
Ah, that worked. I never realized this chat could update live =0
Guess I should try the Special Type String now. Though, there is no existing templates directory anywhere presently. Would I put this in the git/gallery-dl folder, or in config/gallery-dl? - or.. wait, if I'm pointing to it via \fT, does it actually matter where I put it?
I'm being silly aren't I. It's the template folder in /home, isn't it. Though the example page showcases a hidden directory (~/.templates, not ~/templates). Ah, I'm a bit confused lol. I am taking that example way too literally, aren't I?
Yep got it working lol. Put it in /home/twitter.txt
https://twitter.com/{author[name]}/status/{retweet_id|tweet_id}
Author: {author[nick]} @{author[name]}
Posted: {date:%Y/%m/%d %I:%M%p}
{content}
and in config.json:
"format": "\fT ~/twitter.txt",
This is, genuinely, much easier to understand than it first looked for me
Here's my latest update to my txt contents:
https://twitter.com/{author[name]}/status/{retweet_id|tweet_id}
Author: {author[nick]} @{author[name]}
Posted: {date}
{reply_count} replies, {retweet_count} retweets, {quote_count} quotes, {favorite_count} favorites
{content}
I added some additional metadata and edited the date to just {date}, which shows y/m/d and then 24h time down to seconds. Very neat
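The two date spellings used in this thread behave like Python's own datetime formatting, which can be tried directly. This sketch assumes the `date` field is a `datetime` object and that the part after the colon is handed to it as a strftime pattern; the timestamp is made up.

```python
from datetime import datetime

posted = datetime(2023, 1, 5, 14, 30, 0)  # made-up timestamp for illustration

# Without a format spec, the default string form is used.
plain = "{date}".format(date=posted)

# With a spec after the colon, strftime directives control the output.
custom = "{date:%Y/%m/%d %I:%M%p}".format(date=posted)

print(plain)   # 2023-01-05 14:30:00
print(custom)
```

%I is the 12-hour clock and %p the AM/PM marker, so dropping the spec is what switches the output to 24-hour time with seconds.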
The only things I didn't see during my -K were the views in a tweet and the device used to send the tweet, which was a tad disappointing but it's fine. I wonder also what I can use for getting timezones in the mix, just for date clarity. If I have any more inquiries, is here fine to post in?
Hey, how possible is it to have multiple files created? Like say ontop of the content file, how about a User file as well? One that goes like this:
"postprocessors": [{
    "name": "metadata",
    "event": "post",
    "mode": "custom",
    "content-format": [
        "https://twitter.com/{author[name]} (ID: {author[id]})",
        "Profile Banner: {author[profile_banner]}",
        "Profile Picture: {author[profile_image]}",
        "User: {author[nick]} @{author[name]}",
        "Joined: {author[date]}",
        "Located: {author[location]}",
        "Description:",
        "{author[description]}",
        "",
        "URL: {author[url]}",
        "Verified: {author[verified]}",
        "{author[followers_count]} followers, {author[friends_count]} friends",
        "{author[statuses_count]} Statuses, {author[media_count]} media, {author[listed_count]} lists, {author[favorites_count]} favorites"
    ],
    "filename": {
        "If [Last-Modified] is Same as before": "null",
        "If [Last-Modified] is different": "{author[name]}-{author[id]}-[Date Downloaded]-[Last-Modified]-Info.txt",
        "" : "{author[name]}-{author[id]}-[Date Downloaded]-Info.txt"
    },
    "directory": "{author[name]}-Profile"
}]
The idea I had for the FileNames was to determine the last time the data has been modified, and if unchanged during a gallery-dl command, just to ignore making the file (I assume null would work), and if different to then create the file. Idk what it'd look like to actually check for this though. Am I using [Last-Modified] right? Rather, would that be what I use to check the latest modification of metadata and conditionally determine if the file gets saved on a new gallery-dl command or not? (Also I tried looking, what is the code for scanning the current date?)
All this while also downloading the Profile Banner/Image in the mentioned {author[name]}-Profile dir in the same fashion as the text file?
Would it be possible for gallery-dl to detect what media is intended to be a gif file and convert it into such instead of an mp4?
Alongside this, would it also be possible to download the text associated with a tweet as a .txt file, and within said file include the tweet metadata? (Example of .txt file contents in attached file) Example.txt
And finally, for file organizing, how possible is it to set tweet names to something like Author_TweetID? Here's an example name I came up with awhile back: (File Names): Doe-1234567890-0Text.txt Doe-1234567890-1MediaImage.png/jpeg Doe-1234567890-2MediaMov.mp4/webm/etc Doe-1234567890-3MediaGif.gif
My reason for adding Media before the type is for ease of searching, if you wanted to search for any kind of media vs just a specific type, you could just search for "Media", and if you wanted only a specific type, can instead search for that. Additionally, my reason for starting the Filename with the Author would be for ease of organizing as well, since if you sort it alphanumerically, you will be able to group the tweets by author. And since it's followed by the tweet ID, each tweet will be chronologically organized per author.
Additionally, is it at all possible to apply tags to a tweets filename? IE whether it's NSFW, Safe, or inbetween (I would call this SNFW for (Safe, but Not For Work, open for ideas). These names being exactly as such so as not to get (N)SFW results for typing SFW instead of Safe. I doubt tagging media details would be possible since even on sites that support this, it is applied manually by the user/admins, but if it were possible it would be neat