Closed AlttiRi closed 1 year ago
Wait, it's not retweets.
It's media (and descriptions) from other profiles that the target profile replies to.
I want to download only content of the passed profile, without any content of other profiles.
This is why I said to set the default value for the replies option to self. Why were my words ignored...
To be precise, it must be set to timeline/replies extractor only I guess
Anyway, replies: "self" in your config
I just passed the https://twitter.com/profile link.
https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractortwitterreplies Seems it's just a bug.
I want to download in this case all profile's media posts.
And with -o text-tweets=true all profile's posts (media and non-media).
Without any other profile's content.
Something like "related": ["retweets", "replies"] ([] / false for none): one key that defines which related profiles' content will be downloaded would be more convenient for configuration, I think.
It's not a bug. This is how gallery-dl operates when you pass normal profile links. (https://github.com/mikf/gallery-dl/commit/915dba8345d3d457a80f08fb34d0409b00829444, https://github.com/mikf/gallery-dl/commit/0add1fc0908ae460da173e305bd8659632f6807b)
In your case, when you set text-tweets=true, gallery-dl uses the replies timeline.
Actually, I just tested and it definitely doesn't work right. For example: https://twitter.com/amanatsu_mikan7/with_replies (NSFW)
[gallery-dl][warning] logfile: missing or invalid path (expected str, bytes or os.PathLike object, not NoneType)
[gallery-dl][debug] Version 1.22.2 - Executable
[gallery-dl][debug] Python 3.7.9 - Windows-8.1-6.3.9600
[gallery-dl][debug] requests 2.28.0 - urllib3 1.26.9
[gallery-dl][warning] unsupportedfile: [Errno 2] No such file or directory: 'C:\\Users\\Madobe\\Desktop\\f\\logs\\unsupported.txt'
[gallery-dl][debug] Starting DownloadJob for 'https://twitter.com/amanatsu_mikan7/with_replies'
[twitter][debug] Using TwitterRepliesExtractor for 'https://twitter.com/amanatsu_mikan7/with_replies'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): twitter.com:443
[urllib3.connectionpool][debug] https://twitter.com:443 "GET /i/api/graphql/7mjxD3-C6BxitPMVQ6w0-Q/UserByScreenName?variables=%7B%22screen_name%22%3A%22amanatsu_mikan7%22%2C%22withSafetyModeUserFields%22%3Atrue%2C%22withSup
[urllib3.connectionpool][debug] https://twitter.com:443 "GET /i/api/graphql/t4wEKVulW4Mbv1P0kgxTEw/UserTweetsAndReplies?variables=%7B%22userId%22%3A%222402630918%22%2C%22count%22%3A100%2C%22withCommunity%22%3Atrue%2C%22incl
[twitter][debug] Using download archive './archive/twitter_db.sqlite3'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): pbs.twimg.com:443
[urllib3.connectionpool][debug] https://pbs.twimg.com:443 "GET /media/FWV5Ct8VsAAsXCo?format=jpg&name=orig HTTP/1.1" 200 79345
* .\galleries\twitter\tantou_KAI (2839069854)\[22-06-28] 1541771537174646784_p1.jpg
[twitter][debug] Skipping 1541773999071318016 (reply)
[urllib3.connectionpool][debug] https://pbs.twimg.com:443 "GET /media/FWU_0g7akAAJ5vr?format=jpg&name=orig HTTP/1.1" 200 113915
* .\galleries\twitter\amanatsu_mikan7 (2402630918)\[22-06-28] 1541708638997598208_p1.jpg
[twitter][debug] Skipping 1541709905710632960 (reply)
# .\galleries\twitter\amanatsu_mikan7 (2402630918)\[22-06-28] 1541708638997598208_p1.jpg
[twitter][debug] Skipping 1541715055061807104 (reply)
# .\galleries\twitter\amanatsu_mikan7 (2402630918)\[22-06-28] 1541708638997598208_p1.jpg
[twitter][debug] Skipping 1541714920911147009 (reply)
[urllib3.connectionpool][debug] https://pbs.twimg.com:443 "GET /media/FWVGQUuaMAAvq-m?format=jpg&name=orig HTTP/1.1" 200 260149
* .\galleries\twitter\amanatsu_mikan7 (2402630918)\[22-06-28] 1541715698661355520_p1.jpg
# .\galleries\twitter\amanatsu_mikan7 (2402630918)\[22-06-28] 1541708638997598208_p1.jpg
[twitter][debug] Skipping 1541714418962399234 (reply)
# .\galleries\twitter\amanatsu_mikan7 (2402630918)\[22-06-28] 1541708638997598208_p1.jpg
[twitter][debug] Skipping 1541708825022976000 (reply)
[urllib3.connectionpool][debug] https://pbs.twimg.com:443 "GET /media/FWRN1t9aQAAhX6-?format=jpg&name=orig HTTP/1.1" 200 177351
* .\galleries\twitter\amanatsu_mikan7 (2402630918)\[22-06-27] 1541442617300635648_p1.jpg
[twitter][debug] Skipping 1541705469923717128 (reply)
# .\galleries\twitter\amanatsu_mikan7 (2402630918)\[22-06-27] 1541442617300635648_p1.jpg
[twitter][debug] Skipping 1541531280139231232 (reply)
# .\galleries\twitter\amanatsu_mikan7 (2402630918)\[22-06-27] 1541442617300635648_p1.jpg
[twitter][debug] Skipping 1541520782773211137 (reply)
# .\galleries\twitter\amanatsu_mikan7 (2402630918)\[22-06-27] 1541442617300635648_p1.jpg
[twitter][debug] Skipping 1541485179201417216 (reply)
# .\galleries\twitter\amanatsu_mikan7 (2402630918)\[22-06-27] 1541442617300635648_p1.jpg
[twitter][debug] Skipping 1541463159822700544 (reply)
# .\galleries\twitter\amanatsu_mikan7 (2402630918)\[22-06-27] 1541442617300635648_p1.jpg
[twitter][debug] Skipping 1541458604124798977 (reply)
[urllib3.connectionpool][debug] https://pbs.twimg.com:443 "GET /media/FWTRtO1UUAEDD_j?format=jpg&name=orig HTTP/1.1" 200 196468
* .\galleries\twitter\Agovitch1 (1353631856877477889)\[22-06-28] 1541591372226248704_p1.jpg
# .\galleries\twitter\amanatsu_mikan7 (2402630918)\[22-06-27] 1541442617300635648_p1.jpg
[twitter][debug] Skipping 1541455347461681153 (reply)
# .\galleries\twitter\amanatsu_mikan7 (2402630918)\[22-06-27] 1541442617300635648_p1.jpg
[twitter][debug] Skipping 1541449072736841734 (reply)
# .\galleries\twitter\amanatsu_mikan7 (2402630918)\[22-06-27] 1541442617300635648_p1.jpg
[twitter][debug] Skipping 1541449028252372993 (reply)
# .\galleries\twitter\amanatsu_mikan7 (2402630918)\[22-06-27] 1541442617300635648_p1.jpg
[twitter][debug] Skipping 1541449017174794240 (reply)
# .\galleries\twitter\amanatsu_mikan7 (2402630918)\[22-06-27] 1541442617300635648_p1.jpg
[twitter][debug] Skipping 1541448759649120256 (reply)
# .\galleries\twitter\amanatsu_mikan7 (2402630918)\[22-06-27] 1541442617300635648_p1.jpg
[twitter][debug] Skipping 1541448545836101632 (reply)
# .\galleries\twitter\amanatsu_mikan7 (2402630918)\[22-06-27] 1541442617300635648_p1.jpg
[twitter][debug] Skipping 1541446637016723456 (reply)
[urllib3.connectionpool][debug] https://pbs.twimg.com:443 "GET /media/FWRS51CaMAE3JZk?format=jpg&name=orig HTTP/1.1" 200 237222
* .\galleries\twitter\amanatsu_mikan7 (2402630918)\[22-06-27] 1541448131329822720_p1.jpg
[urllib3.connectionpool][debug] https://pbs.twimg.com:443 "GET /media/FWRS51JakAEM3XE?format=jpg&name=orig HTTP/1.1" 200 369312
.\galleries\twitter\amanatsu_mikan7 (2402630918)\[22-06-27] 1541448131329822720_p2.jpg
KeyboardInterrupt
replies: "self" in the config. Probably because it has a quoted tweet.
I've been having similar issues since the update to 1.22.2 and I've found another weird thing:
https://twitter.com/Anon2000000/status/1538118265335062528
this tweet is skipped if i login (eg. with -u and -p) but its handled fine if im not logged in
maybe try running gallery-dl with "replies":"self" and without cookies to see how it behaves
for example running this: gallery-dl https://twitter.com/Anon2000000 --write-metadata
ends with only one folder
but running this: gallery-dl https://twitter.com/Anon2000000 -u xxxxxx -p xxxxxxxx --write-metadata
ends with 213 folders, all of which are images that the user replied to
both used the same config:
"twitter":
{
    "pinned": true,
    "quoted": true,
    "videos": true,
    "cards": false,
    "conversations": false,
    "replies": "self",
    "retweets": true,
    "twitpic": false,
    "syndication": true,
    "archive-format": "{tweet_id}_{num}",
    "image": {
        "archive-format": "image{filename}"
    }
}
it also seems similar to the problem in #2713, since all of these problems trace back to getting data from users adjacent to the one specified with the link
https://twitter.com/Anon2000000/status/1538118265335062528 this tweet is skipped if i login (eg. with -u and -p) but its handled fine if im not logged in
This tweet would also be skipped on previous versions (although you should confirm it yourself, with enabled retweets in config) Sorry, it won't since the tweet displays in the normal timeline, but that does not negate my point. I explained why it might happen here. While you're logged off twitter does not combine tweets into threads, so the tweet isn't skipped.
for example running this: gallery-dl https://twitter.com/Anon2000000 --write-metadata ends with only one folder
Same thing. Logged off = non-target user tweets are not included in the replies. Details.
None of these issues are related to the current version, they've been here for a long time.
good to know, i still don't know why it gets post from other users while logged in though, even if it's treating the replies as retweets or quote tweets it's still handling like individual post, so you get all the folders with them, that is the main problem at hand
None of these issues are related to the current version, they've been here for a long time.
Are you sure about that?
I mean, what's the implication here, that "replies": "self" had this issue even before the changes from #2665 ?
Strange, because I've never experienced it here. "replies": "self" wouldn't result in "unexpected" replies (i.e. replies actually made by a different user); the only old issue was long threads/conversations, because they would be truncated by Twitter and thus could miss some replies here, as I understood it.
I mean, what's the implication here, that "replies": "self" had this issue even before the changes from https://github.com/mikf/gallery-dl/issues/2665 ?
The issue with "self" before the change was that it would not download target user's tweet if it was a reply to other user. The issue with unexpected replies was occurring when replies: true
The issue with "self" before the change was that it would not download target user's tweet if it was a reply to other user.
Yes, but only for long conversations, i.e. long enough to get "truncated" by Twitter, as you've reported in another issue, if I'm not mistaken, right?
The issue with "self" before the change was that it would not download target user's tweet if it was a reply to other user. The issue with unexpected replies was occurring when replies: true
actually "replies": "self" is leading to unexpected replies now in 1.22.2; replies: true worked fine before but it missed stuff instead, it didn't add unrelated tweets which the user replied to
and specifically, it leads to unexpected tweets only when logged in, it doesn't if there isn't a login, which is also odd
actually "replies": "self" it's leading to unexpected replies now in 1.22.2
Yeah, it seems you're right. So it wasn't fixed then.
Yes, but only for long conversations, i.e. long enough
No, any conversations. Also it affects search, for example, and probably even the /media timeline. Basically:
tweet from user1 = {replied to: user1} - downloaded
tweet from user1 = {replied to: user2} - skipped
tweet from user1 = {replied to: user1, user2} - downloaded
Something like that.
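The three cases above can be written as a small predicate. This is only a sketch of the behavior as described in this comment, not gallery-dl's actual code: the function name and the way a tweet is represented (an author name plus a set of replied-to names) are hypothetical, and treating a non-reply as "downloaded" is an assumption.

```python
def kept_under_old_self_rule(author, replied_to):
    """Return True if a tweet would be downloaded under the rule above.

    Hypothetical model: `replied_to` is the set of screen names the
    tweet replies to (empty if the tweet is not a reply).
    """
    if not replied_to:
        # Assumption: a tweet that is not a reply is always kept.
        return True
    # A reply is kept only when the author is among the replied-to users.
    return author in replied_to


print(kept_under_old_self_rule("user1", {"user1"}))           # downloaded
print(kept_under_old_self_rule("user1", {"user2"}))           # skipped
print(kept_under_old_self_rule("user1", {"user1", "user2"}))  # downloaded
```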
it fixed something, because even while logged out I'm getting more images than before (like the ones in threads it was missing before), but for some reason it goes way off the handle if you're logged in and gets some tweets it shouldn't
The "unexpected tweets only when logged in" are unrelated to "replies": "self" and happen because Twitter expands conversations when logged in, as nisehime explained further up. Those tweets are not a reply, so replies doesn't even trigger for them.
I'm very much considering reverting 0add1fc0 and not using user_tweets_and_replies for user urls, since that seems to be the root cause of many problems.
Also, what about an option that basically does the same as --filter "author['id'] == <user id>", so it filters out any tweets not from the actual user?
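As a rough illustration of what such a filter would do, here is a minimal sketch that keeps only tweets whose author id matches the target user. The function name and the dict layout are assumptions made for this example; the two user ids and tweet ids are taken from the debug log earlier in this thread.

```python
def only_from_user(tweets, user_id):
    """Keep only tweets whose author id equals user_id (hypothetical helper)."""
    return [tweet for tweet in tweets if tweet["author"]["id"] == user_id]


# Assumed metadata layout: each tweet carries an "author" object with an "id".
tweets = [
    {"tweet_id": 1541771537174646784, "author": {"id": 2839069854}},  # tantou_KAI
    {"tweet_id": 1541708638997598208, "author": {"id": 2402630918}},  # amanatsu_mikan7
]

kept = only_from_user(tweets, 2402630918)
print([tweet["tweet_id"] for tweet in kept])
```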
i forgot about filters lol but yeah it seems that an option would be optimal since it can be headache having 300 unrelated files all of a sudden
The problem is that the tweet Anon2000000 is replying to, https://twitter.com/Anon2000000/status/1541508842269425667, is not a reply itself, so gallery-dl's replies behavior is not applied here.
Also, what about an option that basically does the same as --filter "author['id'] == <user id>", so it filters out any tweets not from the actual user?
This is what I asked for a long time ago. But not as an option; rather, the target user's metadata should always be accessible in the keyword dictionary, so that people can write these filters by themselves. That would also fix issues like these: https://github.com/mikf/gallery-dl/issues/2713
I'm very much considering reverting https://github.com/mikf/gallery-dl/commit/0add1fc0908ae460da173e305bd8659632f6807b and not using user_tweets_and_replies for user urls, since that seems to be the root cause of many problems.
I think that's not a good idea. I suppose people expect gallery-dl to gather all the tweets from the timeline, and not using replies timeline could miss quite a lot, considering that an average user is probably not even aware that using twitter.com/user and twitter.com/user/with_replies would make a difference
using twitter.com/user and twitter.com/user/with_replies would make a difference
Since https://github.com/mikf/gallery-dl/commit/0add1fc0908ae460da173e305bd8659632f6807b, gallery-dl uses twitter.com/user/with_replies (+ search) for twitter.com/user as input URL when (retweets or text-tweets) and replies are enabled, and that seems to cause a lot of problems.
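The condition described here can be sketched as a tiny selection function. This is a hypothetical model pieced together from comments in this thread, not the extractor's real code: the function name and return values are made up, and the two fallback branches (normal timeline when retweets or text-tweets are set, /media otherwise) are an assumption based on how other comments describe the behavior.

```python
def pick_timeline(retweets, text_tweets, replies):
    """Pick which timeline a plain twitter.com/user URL would map to.

    Hypothetical model of the option logic discussed in this thread.
    `replies` may be True, False, or "self"; any truthy value counts
    as "enabled" here.
    """
    if retweets or text_tweets:
        # With replies enabled, the /with_replies timeline is used.
        return "with_replies" if replies else "tweets"
    # Assumption: media-only downloads use the /media timeline.
    return "media"


print(pick_timeline(retweets=True, text_tweets=False, replies="self"))
print(pick_timeline(retweets=False, text_tweets=False, replies=True))
```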
not using replies timeline could miss quite a lot
You sure about that? I always thought that at least /media had everything up to a certain point in time, including media posted as reply.
/media does seem to do a good job getting tweets like this even when logged in, so I guess that could be a workaround: https://twitter.com/Anon2000000/status/1538118265335062528
You sure about that? I always thought that at least /media had everything up to a certain point in time, including media posted as reply.
Yes, as long as it's only /media it should be fine, but when people set retweets: true or text-tweets: true, so that gallery-dl is induced to use the normal timeline, it would miss tweets. You can of course leave it to users to deal with it, since I'm probably the only person here who noticed all those problems with missing tweets in normal timelines, but some day someone else might notice that too, and the circle will close.
I also think solutions like using /media+/with_replies+/tweets timelines are ugly and not really practical. Especially when you have a quite large (500+) list of links to download. It creates unnecessary traffic and makes the whole process longer, even with all the skips of already downloaded content. Also, it can trigger twitter's time-out thing, though I'm not sure about that.
Gallery-dl's keyword dictionary for twitter always contains author and user objects for tweet metadata; however, the only case where those objects differ is retweets. The rest of the time they're identical, correct me if I'm wrong. So that's an area for improvement. Say, the user object could always contain the target user's metadata. Not sure how it should behave with subcategories which don't have a target user, like search, though.
Also, it won't make much sense without the proposed thread expanding. I've seen you did an expand option and it seems to be working, but something needs to be done about repeated file download attempts. The only idea I have here is to temporarily keep their IDs in RAM.
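The "keep their IDs in RAM" idea could look roughly like the sketch below: an in-memory set that remembers which tweet IDs were already handled during the current run. The class and method names are hypothetical and this is not how gallery-dl implements anything; the sample IDs are taken from the debug log earlier in this thread, where the same file was attempted repeatedly.

```python
class SeenTweets:
    """In-memory record of tweet IDs handled during one run (hypothetical)."""

    def __init__(self):
        self._ids = set()

    def first_time(self, tweet_id):
        """Return True only the first time a given tweet ID is offered."""
        if tweet_id in self._ids:
            return False
        self._ids.add(tweet_id)
        return True


seen = SeenTweets()
incoming = [1541708638997598208, 1541708638997598208, 1541715698661355520]
to_download = [tweet_id for tweet_id in incoming if seen.first_time(tweet_id)]
print(to_download)
```

Because the set lives only for the duration of the process, it avoids repeated attempts within one run without touching the on-disk download archive.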
Regarding this issue's original topic: I've put out another release that no longer uses the /with_replies endpoint and reverts to the same behavior as v1.22.1.
The changes from https://github.com/mikf/gallery-dl/commit/0add1fc0908ae460da173e305bd8659632f6807b should be re-applied at some point, but it was premature to do so in v1.22.2.
So should I do a pass with retweets/text tweets disabled and then a pass with them enabled?
If you only need user's media content then leave both disabled or use a twitter.com/user/media link (but gallery-dl won't automatically perform a search for older tweets with that link).
If you want retweets too and don't want to miss media content, then do it with two passes.
If you want all text-tweets too... Well, that's complicated, the changes are reverted now, so you should use a twitter.com/user/with_replies link and be prepared to face the problems caused by it, and keep in mind that some text-tweets are still going to be missed. Just like with /media, auto search for older tweets won't be performed. Alternatively or additionally, you can also manually apply a search query for all of the user's tweets.
Is there any way to disable the new search functionality? I already had that handled as part of my script and now it's probably going to lead to a lot of API requests I don't want it to do.
Use a /tweets link, like twitter.com/user/tweets. /media and /with_replies don't do a search, as I said.
I'm lazy to read the messages above, that I would like:
"cards": false
)-o text-tweets=true
)of the passed profile only, by just passing a direct profile link.
Downloading of third-party retweets and commented third-party tweets is enabled with an extra config option.
If it worked that way out of the box, I think it would be convenient and intuitive.
I just decided to downgrade to gallery-dl 1.21.2-1 for the time being until this can be solved or how to make the folders better structured.
@mikf All the strategy options include search, but I specifically want to avoid having it search under any circumstance. It's a lot of API calls and I handle searching elsewhere already.
nisehime already explained that in https://github.com/mikf/gallery-dl/issues/2712#issuecomment-1169374407
If you don't want a search, use twitter.com/USER/tweets, twitter.com/USER/media, and twitter.com/USER/with_replies.
The search only happens for direct user URLs (twitter.com/USER) since version 1.22.0 (915dba83) in the hopes of improving the default behavior for users that don't have any extra scripts.
strategy only applies for those direct URLs to give a bit more control and avoid issues like this one here.
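For anyone scripting this, a minimal sketch of turning a screen name into the three explicit timeline URLs named above, which per this comment do not trigger the search. The helper name is hypothetical; only the URL suffixes come from the thread.

```python
def explicit_timeline_urls(user):
    """Build the explicit timeline URLs for a screen name (hypothetical helper)."""
    suffixes = ("tweets", "media", "with_replies")
    return [f"https://twitter.com/{user}/{suffix}" for suffix in suffixes]


for url in explicit_timeline_urls("USER"):
    print(url)
```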
I see, I misunderstood what the new documentation was saying.
Does that mean the workaround for getting all the tweets I described here https://github.com/mikf/gallery-dl/issues/2712#issuecomment-1169247338 is still necessary?
If so, would it be possible for me to somehow retrieve the json of tweets for both with_replies and media, and then feed the list of urls back into gallery-dl? The new 'unique' setting seems like it would make it so it didn't have to check the same tweet twice.
@mikf Until such an option to disable search is implemented, I figure I should modify the code for a personal copy, but looking at the code, no easy way to do it really sticks out to me. What should I do?
The bug appeared after the update.
I optionally use -o text-tweets=true only when I need to save the text of non-media tweets (in effect, to download everything from the passed profile), since usually it is required only for some profiles.
The conf:
Now when I use ggat it downloads the ~retweets~ reply target posts, which is undesirable.
gga works as expected (as before).