ForxBase opened this issue 3 months ago (status: Open)
Been running into the same thing myself: lost three accounts over the last week and change.
I'm planning to let it rest a week or so before trying to make any more accounts, on the suspicion that my IP has been flagged for enhanced scrutiny. I figure it may be a temporary thing, and if I back off for a little I might be okay to try again with a more cautious set of values for `sleep` and `sleep-request`.
I also suspect that time of use may be a factor, so I was planning to schedule my script to only run the extractor during local daytime hours, in case the scraper running overnight was tripping some kind of suspicious-activity alert.
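If the daytime-hours idea is worth testing, a cron entry can enforce it without manual babysitting. A minimal sketch; the schedule, lock path, and account URL are all placeholders:

```crontab
# Run once per day at 09:00 local time, so all activity falls in daytime hours.
# flock -n skips this run if a previous one is still holding the lock.
0 9 * * * flock -n /tmp/gallery-dl.lock gallery-dl -o skip=abort:3 https://x.com/<account>
```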
Post your configuration. Switch your IP address. Don't use `timeline` (it's broken anyway) if you're using it.
Maybe you can elaborate on what you mean here about not using `timeline`, and your rationale? I see you have your own open issue regarding infinite `sleep-request` behavior that mentions it, but that doesn't really seem relevant to the problem of accounts being banned by Twitter.
And maybe you could also provide some more helpful advice on how to actually get an ISP to issue a new IP? As far as I'm aware, most of them use DHCP, so a release+renew at the gateway will just pull the same address, and I have no interest in leaving my modem disconnected long enough for the lease to expire.
At any rate, at the command line I'm just using `gallery-dl https://x.com/<account>` on the first pass, and `gallery-dl -o skip=abort:3 https://x.com/<account>` on subsequent runs. Config, in relevant part (other extractor details removed):
```json
{
    "extractor": {
        "archive": "<database>",
        "base-directory": "<directory>",
        "path-extended": true,
        "user-agent": "browser",
        "retries": -1,
        "twitter": {
            "archive": "<twitter_db>",
            "cookies": "<twitter_cookie_file>",
            "filename": "{date:%Y%m%d_%H%M%S}-{tweet_id}-img{num:>02}.{extension}",
            "sleep": [5, 7.5],
            "sleep-request": [30, 35]
        }
    },
    "downloader": {
        "mtime": false
    },
    "output": {
        "shorten": "eaw"
    }
}
```
`timeline` is retrieving tweets from random users. `abort` doesn't work because it ignores those tweets. The process never ends; I waited days for the run on the input URL to finish. When `timeline` did work, it was what always caused Twitter to impose rate limiting, regardless of what my `sleep` and `sleep-request` timings were. Twitter bans you if your IP address is on their list while getting rate limited.
I just made a new account over another IP and used that IP to download from one single user. My account got suspended almost immediately after the download finished! I don't know what to do.
Same here, my account got suspended. Twitter must have been noticing this.
Twitter is unusable now, and I can't make a new account for each user download...
I've had an account suspended as well. Not sure what my sleep values were, unfortunately, as I've increased them since then.
A related issue: #5775
> I've had an account suspended as well. Not sure what my sleep values were, unfortunately, as I increased them since then.
> A related issue: #5775
Anyone who doesn't get suspended? Is there nothing I can do?
> I've had an account suspended as well. Not sure what my sleep values were, unfortunately, as I increased them since then. A related issue: #5775

> Anyone who doesn't get suspended? Is there nothing I can do?
I just kept appealing, but I think they're ignoring me now, so cheers, Musk.
> I've had an account suspended as well. Not sure what my sleep values were, unfortunately, as I increased them since then. A related issue: #5775

> Anyone who doesn't get suspended? Is there nothing I can do?

> I just kept appealing, but I think they're ignoring me now, so cheers, Musk.
does it work for you now?
Hoping that a fix for this comes out soon. I haven't been able to back up in a while, and artists from Brazil have started to delete their accounts, or already have. Any help would be appreciated. My account got locked but not banned, with `sleep` at 10-38 and `sleep-request` at 15-55.
That can't be it, or at least not the whole picture, because the first account I used with GDL, and consequently the first one I lost, was my daily-driver personal account.
I haven't used my account in weeks now and it's still been downloading 24/7 successfully. Even when I "used" my account I was just browsing art or looking for accounts to download without tweeting or retweeting.
What region/country are you located in? Do you have 2fa activated, with phone or something? Trying to figure out the link between all of these inconsistent recommendations.
I'm in the west coast USA. I don't have 2FA enabled. I have a phone number on the account, but it's a Google Voice number, which means it's a VOIP number, which I assume is the type of number the bot people use. I have a dynamic IP and have never used a VPN or proxy with this specific account. Before Elon Musk's takeover I filled a 14TB drive with the drive speed being the bottleneck, which means I was going extremely fast. I say that because that surely should have caused some red flags on their end. Unfortunately almost none of those accounts were relevant, so I switched to whitelisting which accounts I download. I've never used Twitter to tweet, retweet or like, ever. I've used it to DM people and browse art. The account is 5 years old. It's also a developer account, but I doubt that matters.
I said the following in a related issue (https://github.com/mikf/gallery-dl/issues/5775):

> Anyone who doesn't get suspended?

I download Twitter 24/7 with a low sleep setting. Before I made the time between each request random (using a range), I had to have much higher sleep between requests. This is using my home IP with the actual Twitter account I use. After I lowered the sleep and made it a range, I've received no rate limiting at all. Before the change, I'd get told to wait until a certain time before continuing. This is a little confusing to me because I'm probably making a lot more requests per day than Elon allocates to free accounts. I'm not home, so I can't tell you the settings yet. I actually copied them from somewhere else in gallery-dl's issues.

`-o "sleep=[1.5,5]" -o "sleep-request=[6.0,12.0]"`
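For what it's worth, the same values can live in the config file instead of being passed with `-o` on every run; a sketch of the equivalent `twitter` section:

```json
"twitter": {
    "sleep": [1.5, 5],
    "sleep-request": [6.0, 12.0]
}
```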
> I'm in the west coast USA. I don't have 2FA enabled. I have a phone number on the account, but it's a Google Voice number, which means it's a VOIP number, which I assume is the type of number the bot people use. I have a dynamic IP and have never used a VPN or proxy with this specific account. Before Elon Musk's takeover I filled a 14TB drive with the drive speed being the bottleneck, which means I was going extremely fast. I say that because that surely should have caused some red flags on their end. Unfortunately almost none of those accounts were relevant, so I switched to whitelisting which accounts I download. I've never used Twitter to tweet, retweet or like, ever. I've used it to DM people and browse art. The account is 5 years old. It's also a developer account, but I doubt that matters.
>
> I said the following in a related issue (#5775):
>
> > Anyone who doesn't get suspended?
>
> I download Twitter 24/7 with a low sleep setting. Before I made the time between each request random (using a range), I had to have much higher sleep between requests. This is using my home IP with the actual Twitter account I use. After I lowered the sleep and made it a range, I've received no rate limiting at all. Before the change, I'd get told to wait until a certain time before continuing. This is a little confusing to me because I'm probably making a lot more requests per day than Elon allocates to free accounts. I'm not home, so I can't tell you the settings yet. I actually copied them from somewhere else in gallery-dl's issues.
>
> `-o "sleep=[1.5,5]" -o "sleep-request=[6.0,12.0]"`
I'm guessing that's a result of having a legacy Twitter developer account... ugh. Might have to give applying for one a shot, but they watch what you do, and I don't even know what I would say in the 250-character application.
I applied for the API thing and immediately got access to the developer section, so apparently it isn't a "wait for approval" type application. Going to give it a try sometime later, but I still have hard API limits.
I suspect that Twitter does not delete accounts if it seems like a human user uses them regularly. What I suspect is going on is that `gallery-dl` activity is getting flagged as possible bot activity that needs further investigation. Then, if there is no evidence of typical user activity, such as tweeting, retweeting, etc., it deletes the account. However, if there is evidence of typical user activity, it makes the user do the arkose challenge. At least, that is what I noticed with my `gallery-dl` use on my accounts. I might be wrong, and I do not know for sure. Personally, I have not lost an account yet, but the accounts that I use `gallery-dl` on are also accounts that I use weekly. Though these days, I always get an arkose challenge after a `gallery-dl` run. I also do not know if there are any actions or activity that would get an account deleted that a `gallery-dl` user might also be doing. So what I am suggesting could backfire; try it on your account at your own risk.
I think your account will get banned soon as well. After all, we all do the arkose challenge before getting banned. Now my new account can't even log in using the cookies option.
Wow, they really really want you to use a browser to scrape with? Seems kind of backwards.
So if you use Selenium, what happens? I'm wondering if they're using a JS version of a warrant canary. There's code you can run on the server that expects certain responses (i.e. anti-adblocker methods), for example. Not sure the "order of headers" is too useful now, since browsers all seem to be moving to randomizing the order. Do people get these same bans if they use a userscript method?
I'm actually curious what they're doing that triggers the bans. Of course, they don't want you to know. ;) Watch it be something stupidly simple because they only have to get it right, once.
I have been scraping from Twitter for nearly a year, and only lost two accounts, for reasons unrelated to g-dl. Edit: also, both cookies come from accounts which I actively use; the cookie I typically download with comes from my more heavily used account.
My config:
```json
{
    "extractor": {
        "base-directory": "X:/My Drive/!pr0n/",
        "archive": "%appdata%/gallery-dl/archive.sqlite3",
        "path-restrict": "^A-Za-z0-9_.~!-",
        "skip": "abort:3",
        "keywords-default": "",
        "twitter": {
            "archive": "X:/My Drive/zzTwitter/archive.twitter.sqlite3",
            "parent-directory": "true",
            "skip": "abort:3",
            "#cookies": "X:/My Drive/zzTwitter/cookies.twitter.1.txt",
            "cookies": "X:/My Drive/zzTwitter/cookies.twitter.2.txt",
            "sleep": [24.9, 45.2],
            "sleep-request": [23.8, 52.6],
            "image-filter": "author is user",
            "logout": true,
            "syndication": true,
            "text-tweets": true,
            "include": ["avatar", "background", "media", "timeline"],
            "directory": {
                "count ==0": ["zzTwitter", "downloads", "{author[id]}.{author[name]}", "text_tweets"],
                "": ["zzTwitter", "downloads", "{author[id]}.{author[name]}", "media"]
            },
            "filename": "{date:%Y-%m-%d_%H-%M-%S}~_~{tweet_id}-{num}.{author[name]}_~{content[0:69]}~_~(unknown).{extension}",
            "avatar": {
                "directory": ["zzTwitter", "downloads", "{author[id]}.{author[name]}", "media", "avatar"],
                "archive": "",
                "filename": "{date:%Y-%m-%d_%H-%M-%S}_avatar_{author[id]}.{author[name]}~_~(unknown).{extension}"
            },
            "background": {
                "directory": ["zzTwitter", "downloads", "{author[id]}.{author[name]}", "media", "background"],
                "archive": "",
                "filename": "background_{date:%Y-%m-%d_%H-%M-%S}~_~(unknown).{extension}"
            },
            "metadata": true,
            "postprocessors": [{
                "name": "metadata",
                "event": "post",
                "directory": "metadata",
                "filename": "{date:%Y-%m-%d_%H-%M-%S}~_~{tweet_id}.{author[name]}~_~{content[0:69]}.json"
            }]
        }
    }
}
```
I tried this from your config, but it didn't work; it doesn't log in ("authentication required"):

```json
{
    "extractor": {
        "base-directory": "D:\gallery-dl",
        "path-restrict": "^A-Za-z0-9_.~!-",
        "skip": "abort:3",
        "keywords-default": "",
        "twitter": {
            "parent-directory": "true",
            "skip": "abort:3",
            "cookies-from-browser": "firefox",
            "sleep": [24.9, 45.2],
            "sleep-request": [23.8, 52.6],
            "image-filter": "author is user",
            "logout": true,
            "syndication": true,
            "text-tweets": true,
            "include": ["avatar", "background", "media", "timeline"],
            "directory": {
                "count ==0": ["Twitter", "downloads", "{author[id]}.{author[name]}", "text_tweets"],
                "": ["Twitter", "downloads", "{author[id]}.{author[name]}", "media"]
            },
            "filename": "{date:%Y-%m-%d_%H-%M-%S}~_~{tweet_id}-{num}.{author[name]}_~{content[0:69]}~_~(unknown).{extension}",
            "avatar": {
                "directory": ["Twitter", "downloads", "{author[id]}.{author[name]}", "media", "avatar"],
                "filename": "{date:%Y-%m-%d_%H-%M-%S}_avatar_{author[id]}.{author[name]}~_~(unknown).{extension}"
            },
            "background": {
                "directory": ["Twitter", "downloads", "{author[id]}.{author[name]}", "media", "background"],
                "filename": "background_{date:%Y-%m-%d_%H-%M-%S}~_~(unknown).{extension}"
            },
            "metadata": true,
            "postprocessors": [{
                "name": "metadata",
                "event": "post",
                "directory": "metadata",
                "filename": "{date:%Y-%m-%d_%H-%M-%S}~_~{tweet_id}.{author[name]}~_~{content[0:69]}.json"
            }]
        }
    }
}
```
But with the values you use, `"sleep": [24.9, 45.2], "sleep-request": [23.8, 52.6]`, it takes forever to download one entire user profile!
Are you using a cookies file? Twitter requires logging in to view most profiles nowadays; create a cookie file and point to that.
I use a Chrome extension, Open Cookies.txt. Install that, then log into Twitter in your desktop browser. Click the extension and, if it requests permission to read your data on Twitter, say Grant Access, then choose the Raw Cookies.txt option, highlight everything in the resulting text block, and copy-paste it into a file.
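For anyone unsure what the resulting file should look like: gallery-dl's `cookies` option expects the Netscape cookies.txt format, one tab-separated line per cookie (domain, include-subdomains flag, path, secure flag, Unix-timestamp expiry, name, value). A sketch with placeholder values; the real `auth_token` and `ct0` values come from your logged-in session:

```
# Netscape HTTP Cookie File
.x.com	TRUE	/	TRUE	1767225600	auth_token	<value from your session>
.x.com	TRUE	/	TRUE	1767225600	ct0	<value from your session>
```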
> But with the values you use, `"sleep": [24.9, 45.2], "sleep-request": [23.8, 52.6]`, it takes forever to download one entire user profile!
Yes, it can take some time; I haven't played much with the sleep times, but they can probably go lower without risking the account being banned.
It takes me about 3 days to re-download almost 500 profiles with the `"skip": "abort:3"` option set (about 10 min per profile), after not running it for a while, so there was more media than normal for me to grab. I'll run my same inputs again with lower `sleep`/`sleep-request` values.
Also, my config for Twitter will download text tweets too, and makes a JSON file for all tweets. If you don't want that, you can easily edit those options out of the config.
I haven't really used cookies before; am I doing this right? My browser is Opera GX; I can switch if needed, I have other browsers installed.
Code pasted into the .conf file, with twittercookie.txt being a copy-paste of what I got from Raw Cookies.txt:
```json
},
"twitter": {
    "parent-directory": "true",
    "skip": "abort:3",
    "cookies": "C:\Users\UserProfile\AppData\Roaming\gallery-dl\twittercookie.txt",
    "sleep": [24.9, 45.2],
    "sleep-request": [23.8, 52.6],
    "image-filter": "author is user",
    "logout": true,
    "syndication": true,
    "text-tweets": false,
    "include": ["avatar", "background", "media", "timeline"],
    "directory": {
        "count ==0": ["Twitter", "downloads", "{author[id]}.{author[name]}", "text_tweets"],
        "": ["Twitter", "downloads", "{author[id]}.{author[name]}", "media"]
    },
    "filename": "{date:%Y-%m-%d_%H-%M-%S}~_~{tweet_id}-{num}.{author[name]}_~{content[0:69]}~_~(unknown).{extension}",
    "avatar": {
        "directory": ["Twitter", "downloads", "{author[id]}.{author[name]}", "media", "avatar"],
        "filename": "{date:%Y-%m-%d_%H-%M-%S}_avatar_{author[id]}.{author[name]}~_~(unknown).{extension}"
    },
    "background": {
        "directory": ["Twitter", "downloads", "{author[id]}.{author[name]}", "media", "background"],
        "filename": "background_{date:%Y-%m-%d_%H-%M-%S}~_~(unknown).{extension}"
    },
    "metadata": true,
    "postprocessors": [{
        "name": "metadata",
        "event": "post",
        "directory": "metadata",
        "filename": "{date:%Y-%m-%d_%H-%M-%S}~_~{tweet_id}.{author[name]}~_~{content[0:69]}.json"
    }]
}
}
```
I then do a basic download such as `py -3 -m gallery_dl -D C:\Users\downloadtempname\ https://x.com/tempname` and I get a response of:

```
[twitter][info] Requesting guest token
[twitter][error] AuthorizationError: Login required
```
Can you run it again but add `-v` at the end of the command, then paste the verbose output?
> Can you run it again but add `-v` at the end of the command, then paste the verbose output?
```
C:\Users\>py -3 -m gallery_dl -D C:\Users\downloadtempname\ https://x.com/tempname -v
[gallery-dl][debug] Version 1.27.1
[gallery-dl][debug] Python 3.12.2 - Windows-11-10.0.22631-SP0
[gallery-dl][debug] requests 2.32.3 - urllib3 2.2.0
[gallery-dl][debug] Configuration Files []
[gallery-dl][debug] Starting DownloadJob for 'https://x.com/tempname'
[twitter][debug] Using TwitterUserExtractor for 'https://x.com/tempname'
[twitter][debug] Using TwitterTimelineExtractor for 'https://x.com/tempname/timeline'
[twitter][info] Requesting guest token
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): api.x.com:443
[urllib3.connectionpool][debug] https://api.x.com:443 "POST /1.1/guest/activate.json HTTP/1.1" 200 63
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): x.com:443
[urllib3.connectionpool][debug] https://x.com:443 "GET /i/api/graphql/k5XapwcSikNsEsILW5FvgA/UserByScreenName?variables=**Listed information like what is set to true or false in the conf** HTTP/1.1" 200 1040
[urllib3.connectionpool][debug] https://x.com:443 "GET /i/api/graphql/tO4LMUYAZbR4T0SqQ85aAw/UserMedia?variables=%7 **Listed information like what is set to true or false in the conf** HTTP/1.1" 404 0
[twitter][debug] API error: 'Unspecified'
[twitter][error] AuthorizationError: Login required
```
> [gallery-dl][debug] Configuration Files []

Your config file is not getting loaded. Make sure it is at one of the locations listed here or shown by `gallery-dl --config-status`.
> Your config file is not getting loaded. Make sure it is at one of the locations listed here or shown by `gallery-dl --config-status`.
```
C:\Users\Userprofile>gallery-dl --config-status
[config][error] JSONDecodeError when loading 'C:\Users\Userprofile\gallery-dl\config.json': Invalid \escape: line 70 column 23 (char 2234)
C:\Users\Userprofile\AppData\Roaming\gallery-dl\config.json : Not Present
C:\Users\Userprofile\gallery-dl\config.json : Invalid JSON
C:\Users\Userprofile\gallery-dl.conf : Not Present

C:\Users\Userprofile>py -3 gallery-dl --config-status
C:\Users\Userprofile\AppData\Local\Programs\Python\Python312\python.exe: can't find '__main__' module in 'C:\\Users\\Userprofile\\gallery-dl'
[config][error] JSONDecodeError when loading 'C:\Users\Userprofile\gallery-dl\config.json': Invalid \escape: line 70 column 23 (char 2234)
```
> `"cookies": "C:\Users\JamesD\AppData\Roaming\gallery-dl\twittercookie.txt",`

You can't use single backslashes for filesystem paths in a JSON file. You need to either double them (`\\`) or replace them with forward slashes (`/`):

```json
"cookies": "C:\\Users\\JamesD\\AppData\\Roaming\\gallery-dl\\twittercookie.txt",
```

```json
"cookies": "C:/Users/JamesD/AppData/Roaming/gallery-dl/twittercookie.txt",
```
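The failure is easy to reproduce outside gallery-dl with Python's standard `json` module; a quick sketch (the path is just an example):

```python
import json

# Single backslashes: "\U" and "\J" are not valid JSON escapes, so parsing fails.
bad = r'{"cookies": "C:\Users\JamesD\twittercookie.txt"}'
try:
    json.loads(bad)
except json.JSONDecodeError as e:
    print("rejected:", e.msg)

# Doubled backslashes parse, and decode back to single backslashes.
good = r'{"cookies": "C:\\Users\\JamesD\\twittercookie.txt"}'
print(json.loads(good)["cookies"])  # C:\Users\JamesD\twittercookie.txt

# Forward slashes need no escaping, and Windows APIs accept them too.
slash = '{"cookies": "C:/Users/JamesD/twittercookie.txt"}'
print(json.loads(slash)["cookies"])  # C:/Users/JamesD/twittercookie.txt
```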
```
C:\Users\UserProfile>gallery-dl --config-status
[config][error] JSONDecodeError when loading 'C:\Users\UserProfile\gallery-dl\config.json': Expecting ',' delimiter: line 103 column 2 (char 3653)
C:\Users\UserProfile\AppData\Roaming\gallery-dl\config.json : Not Present
C:\Users\UserProfile\gallery-dl\config.json : Invalid JSON
C:\Users\UserProfile\gallery-dl.conf : Not Present
```
Your config file is still not valid JSON, therefore it is not loaded/used at all. Use a site like https://www.jslint.com/ for example to fix your JSON if your editor can't do that.
> Your config file is still not valid JSON, therefore it is not loaded/used at all. Use a site like https://www.jslint.com/ for example to fix your JSON if your editor can't do that.
Thanks for the site. I had the JSON ending with

```
}]
}
```

which was fixed when I changed it to

```
}]
}
}
}
```
Thanks. With these values, `"sleep": [24.9, 45.2], "sleep-request": [23.8, 52.6]`, it takes you three days to download 500 profiles? How? I lowered them and didn't get banned yet.
> Thanks. With these values, `"sleep": [24.9, 45.2], "sleep-request": [23.8, 52.6]`, it takes you three days to download 500 profiles? How? I lowered them and didn't get banned yet.
I updated to these values:

```json
"sleep": [12.9, 31.2],
"sleep-request": [11.8, 35.6],
```

and got through 400 profiles in about 16 hours. I'm simply being overly cautious of the Twitter timeouts; when I first started, with sleep times of something like 5s, I would frequently be forced to prove I'm human. I'm actually surprised it never resulted in a ban, considering how often it happened.
Same, I used to get "prove you're human" a lot; now never. It's weird. The account I use to download is even suspended, and it just doesn't care.
I stole a great deal of this and it works pretty damn well. One question though: what is `syndication`? I can't find it in the configuration docs here: https://gdl-org.github.io/docs/configuration.html
> what is `syndication`?
It was a workaround to download age-restricted content without login, back when you could still use Twitter as a guest user. See 1171911dc3c8c739f8eac1e16a42bfd53cce6ac7 and 92ff99c8e55910ecb0c91d7cac67c76a336324dd.
I'm using my main Twitter account; it hasn't gotten suspended in many years, but the one thing is the rate limiting, lol. If you're using a clone account, etc., it's highly likely to get suspended.
Two accounts got suspended in a matter of days, after downloading just a few user profiles' media. Is this problem solvable? Anyone else with this problem?