using "self" for extractor.twitter.replies doesn't work.

mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites

GNU General Public License v2.0

using "self" for extractor.twitter.replies doesn't work. #3894

Open Andrew5459 opened 1 year ago

Andrew5459 commented 1 year ago

Using "replies": "self" no longer filters out replies to other users.

Is anyone else having this issue?

mikf commented 1 year ago

This was probably caused by the same bug as in #3922. If that's the case, it was fixed by commit https://github.com/mikf/gallery-dl/commit/480bc34e54edee4f7627d98237254d7f83fcca1b.

Andrew5459 commented 1 year ago

The issue doesn't seem to be fixed.

It still downloads images from tweets replying to other users.

UPDATE:

I've tried to get the same result with "image-filter", but even that won't work.

Using [ "image-filter": "user.get('reply_to') == author['name'] or reply_id == 0" ] should work, but doesn't.

Using [ "image-filter": "reply_to == author['name'] or reply_id == 0" ] works, but throws an exception when trying to download tweets that aren't replies.

flaccidbagel commented 1 year ago

Somewhat late reply, but wanted to chime in that I found this to also be occurring. Trying to figure out a good method of taking care of it but so far not having the most luck.

Andrew5459 commented 1 year ago

I managed to find a workaround a while ago but forgot to post it here.

"image-filter": "reply_id > 0 and reply_to == author['name'] or reply_id == 0"

Basically, for non-reply tweets the "reply_to" field doesn't exist, so the program throws an exception when the filter references it. Since non-reply tweets always have a "reply_id" of 0, "reply_id > 0" evaluates to false for them, the "and" operator short-circuits, and "reply_to" is never referenced.
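The short-circuit behaviour described above can be reproduced outside gallery-dl. The following is only a sketch: it assumes a filter is a Python expression evaluated with the tweet's metadata fields as variable names, so a missing field shows up as a NameError. The helper `evaluate_filter` and the sample metadata are hypothetical and not part of gallery-dl.

```python
# Sketch only: this imitates, but does not reproduce, gallery-dl's filter handling.
# Assumption: metadata fields are visible to the expression as plain variable names.

def evaluate_filter(expression, metadata):
    """Evaluate a filter expression against one tweet's metadata (hypothetical helper)."""
    return bool(eval(expression, {"__builtins__": {}}, dict(metadata)))

# Hypothetical metadata: a self-reply, a reply to someone else, and a non-reply.
self_reply = {"reply_id": 111, "reply_to": "alice", "author": {"name": "alice"}}
other_reply = {"reply_id": 222, "reply_to": "bob", "author": {"name": "alice"}}
non_reply = {"reply_id": 0, "author": {"name": "alice"}}  # no "reply_to" field

unguarded = "reply_to == author['name'] or reply_id == 0"
guarded = "reply_id > 0 and reply_to == author['name'] or reply_id == 0"

def unguarded_fails_on_non_reply():
    """Return True if the unguarded filter raises NameError for a non-reply tweet."""
    try:
        evaluate_filter(unguarded, non_reply)
    except NameError:
        return True
    return False

for label, tweet in (("self reply", self_reply),
                     ("reply to other", other_reply),
                     ("non-reply", non_reply)):
    print(label, evaluate_filter(guarded, tweet))
```

In the guarded form, "reply_id > 0" is false for the non-reply tweet, so Python never looks up "reply_to" and falls through to "reply_id == 0".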

I'm leaving the issue open since this is only a workaround.

mikf commented 1 year ago

"image-filter": "reply_id > 0 and reply_to == author['name'] or reply_id == 0"

This can be written more compactly as: not reply_id or reply_to == author['name']
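As a rough check that the two expressions above behave the same, both can be evaluated against a few sample tweets. This is a sketch under the assumption that filters are plain Python expressions over the tweet metadata. `evaluate_filter` and the sample data are hypothetical.

```python
# Sketch only: compares the two filter expressions on hypothetical metadata.

def evaluate_filter(expression, metadata):
    """Evaluate a filter expression against one tweet's metadata (hypothetical helper)."""
    return bool(eval(expression, {"__builtins__": {}}, dict(metadata)))

workaround = "reply_id > 0 and reply_to == author['name'] or reply_id == 0"
shorter = "not reply_id or reply_to == author['name']"

def sample_tweets():
    """Yield a non-reply, a self-reply, and a reply to another user."""
    yield {"reply_id": 0, "author": {"name": "alice"}}  # no "reply_to" field
    for reply_to in ("alice", "bob"):
        yield {"reply_id": 12345, "reply_to": reply_to, "author": {"name": "alice"}}

# One (workaround, shorter) pair of results per sample tweet.
results = [(evaluate_filter(workaround, tweet), evaluate_filter(shorter, tweet))
           for tweet in sample_tweets()]
print(results)
```

Both forms skip the "reply_to" lookup when "reply_id" is 0, so neither raises on non-reply tweets in this sketch.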

Could someone provide an example for where this does not work? All the tests for "replies": "self" still pass ...

This might also be due to a slight change in behavior in 749802c7. The internal code now tests for reply_to == user['name'] rather than reply_to == author['name']. (user is the user referenced in the input URL, author is the user who created the Tweet)
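To illustrate how the two comparisons can disagree, here is a small sketch. It does not reproduce gallery-dl's internal code. The function `reply_checks` and the sample tweets are hypothetical, and only the field names reply_to, user and author are taken from the discussion above.

```python
# Sketch only: contrasts the author-based and user-based comparisons.

def reply_checks(tweet):
    """Return (author-based result, user-based result) for a reply tweet."""
    by_author = tweet["reply_to"] == tweet["author"]["name"]
    by_user = tweet["reply_to"] == tweet["user"]["name"]
    return by_author, by_user

timeline_owner = {"name": "alice"}  # hypothetical user taken from the input URL

# alice replying to herself on her own timeline
own_thread = {"reply_to": "alice", "author": {"name": "alice"}, "user": timeline_owner}
# bob replying to alice
reply_from_other = {"reply_to": "alice", "author": {"name": "bob"}, "user": timeline_owner}
# bob replying to himself inside a conversation reached from alice's timeline
other_self_reply = {"reply_to": "bob", "author": {"name": "bob"}, "user": timeline_owner}

for tweet in (own_thread, reply_from_other, other_self_reply):
    print(tweet["author"]["name"], "->", tweet["reply_to"], reply_checks(tweet))
```

The two checks agree only when the tweet's author is also the user from the input URL. For tweets by other authors they give opposite answers in this sketch.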

flaccidbagel commented 1 year ago

From a basic test, this looks to work without issue. The "replies": "self" option looks like it's currently non-functional, but as you've said, that could be partially due to username/URL mismatches.

musjj commented 11 months ago

@mikf This is the tweet I'm testing on: https://twitter.com/DetFantasia/status/1721575724627218468

It's downloading tweets that weren't posted by DetFantasia.

This is my configuration:

"twitter": {
  "cards": true,
  "conversations": true,
  "expand": true,
  "quoted": false,
  "replies": "self",
  "retweets": false,
  "strategy": "media",
  "syndication": true,
  "text-tweets": true,
  "twitpic": true,
  "videos": true
}

musjj commented 11 months ago

I understand where these tweets are coming from now: they're advertisements. They can be filtered out by using source != "advertiser-interface" with image-filter.

A built-in option (set to true by default) for this would be convenient, since I doubt there are many users out there who want to download advertisements.

I also looked through the repo and I'm surprised that no one has faced this issue yet, so I'm guessing that there are other factors affecting the visibility of ads.

EDIT: Found another one: Twitter for Advertisers.
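A filter covering both values could look like the sketch below. It assumes that "Twitter for Advertisers" appears verbatim in the same source field as "advertiser-interface", and that a filter is a Python expression over the tweet metadata. `evaluate_filter` and the sample tweets are hypothetical.

```python
# Sketch only: filters out tweets whose "source" matches a known ad source.
# Assumption: both strings appear verbatim in the "source" metadata field.

def evaluate_filter(expression, metadata):
    """Evaluate a filter expression against one tweet's metadata (hypothetical helper)."""
    return bool(eval(expression, {"__builtins__": {}}, dict(metadata)))

ad_filter = "source not in ('advertiser-interface', 'Twitter for Advertisers')"

# Hypothetical sample metadata; only "source" matters for this filter.
tweets = [
    {"tweet_id": 1, "source": "Twitter Web App"},
    {"tweet_id": 2, "source": "advertiser-interface"},
    {"tweet_id": 3, "source": "Twitter for Advertisers"},
]

kept = [tweet["tweet_id"] for tweet in tweets if evaluate_filter(ad_filter, tweet)]
print(kept)
```

The same expression string could be used as the value of "image-filter" in the configuration, provided the source values match what is actually returned for these tweets.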