mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

Deviantart Journals keep repeating infinitely #6079

Closed: FISSI0N closed 2 months ago

FISSI0N commented 2 months ago

I've recently noticed that when downloading Journals/Status Messages/Posts by a DeviantArt user, gallery-dl requests and downloads the same posts over and over in an endless loop. Regular posts (including text posts) still work fine.

Here is an example. Initially everything works fine:

C:\Users\*redacted*>gallery-dl --verbose https://www.deviantart.com/t-s-k-tg/
[gallery-dl][debug] Version 1.27.3
[gallery-dl][debug] Python 3.12.5 - Windows-10-10.0.19045-SP0
[gallery-dl][debug] requests 2.32.3 - urllib3 2.1.0
[gallery-dl][debug] Configuration Files ['%USERPROFILE%\\gallery-dl.conf']
[gallery-dl][debug] Starting DownloadJob for 'https://www.deviantart.com/t-s-k-tg/'
[deviantart][debug] Using DeviantartUserExtractor for 'https://www.deviantart.com/t-s-k-tg/'
[deviantart][debug] Sleeping 1.00 seconds (extractor)
[deviantart][debug] Using DeviantartGalleryExtractor for 'https://www.deviantart.com/t-s-k-tg/gallery'
[deviantart][debug] Using custom API credentials (client-id *redacted*)
[deviantart][debug] Sleeping 1.00 seconds (extractor)
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): www.deviantart.com:443
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/user/profile/t-s-k-tg HTTP/1.1" 200 348
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/folders?username=T-S-K-TG&offset=0&limit=50&mature_content=true HTTP/1.1" 200 899
[deviantart][debug] Using DeviantartFolderExtractor for 'https://www.deviantart.com/T-S-K-TG/gallery/237B1135-C641-54C9-8354-C3E6137073B9/Featured'
[deviantart][debug] Using custom API credentials (client-id *redacted*)
[deviantart][debug] Sleeping 1.00 seconds (extractor)
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/237B1135-C641-54C9-8354-C3E6137073B9?username=T-S-K-TG&offset=0&limit=24&mature_content=true&mode=newest HTTP/1.1" 200 3807
[deviantart][debug] Switching to private access token
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/237B1135-C641-54C9-8354-C3E6137073B9?username=T-S-K-TG&offset=0&limit=24&mature_content=true&mode=newest HTTP/1.1" 200 3824
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/deviation/metadata?deviationids%5B0%5D=0A81C2CC-B854-EE91-B09C-7FB9F0538E2B&deviationids%5B1%5D=DE4EDAF9-C100-C265-E11A-DCB0F83204E2&deviationids%5B2%5D=C6F5D965-3E51-91DA-C4A5-91C83AD10C79&deviationids%5B3%5D=EC2AC926-9887-8B30-F5DE-A76680A74F44&deviationids%5B4%5D=3FD3B8AA-8270-11D5-5CDF-2D56601784A1&deviationids%5B5%5D=5A629DD2-01AA-E60C-F399-8A91057D23BE&deviationids%5B6%5D=C7A0BD68-0D51-652F-ADC2-E9C58ED57441&deviationids%5B7%5D=1FA92D6F-F617-0B69-A561-1AB628521B5F&mature_content=true HTTP/1.1" 200 3067
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/comments/deviation/0A81C2CC-B854-EE91-B09C-7FB9F0538E2B?maxdepth=5&offset=0&limit=50&mature_content=true HTTP/1.1" 200 334
[deviantart][debug] Using download archive 'C:\Users\*redacted*/gallery-dl/archive_deviantart.sqlite3'
[deviantart][debug] Active postprocessor modules: [MetadataPP]
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/deviation/content?deviationid=0A81C2CC-B854-EE91-B09C-7FB9F0538E2B HTTP/1.1" 200 4270
# G:\*redacted*\deviantart\T-S-K-TG\T-S-K-TG_The Contest_196095365.htm
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/deviation/content?deviationid=DE4EDAF9-C100-C265-E11A-DCB0F83204E2 HTTP/1.1" 200 None
# G:\*redacted*\deviantart\T-S-K-TG\T-S-K-TG_Thirteen Hours_196093690.htm
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/comments/deviation/C6F5D965-3E51-91DA-C4A5-91C83AD10C79?maxdepth=5&offset=0&limit=50&mature_content=true HTTP/1.1" 200 469
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/deviation/content?deviationid=C6F5D965-3E51-91DA-C4A5-91C83AD10C79 HTTP/1.1" 200 6693
# G:\*redacted*\deviantart\T-S-K-TG\T-S-K-TG_Sniper Town_196092078.htm
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/comments/deviation/EC2AC926-9887-8B30-F5DE-A76680A74F44?maxdepth=5&offset=0&limit=50&mature_content=true HTTP/1.1" 200 473
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/deviation/content?deviationid=EC2AC926-9887-8B30-F5DE-A76680A74F44 HTTP/1.1" 200 3698
# G:\*redacted*\deviantart\T-S-K-TG\T-S-K-TG_STALKER_ Reasons to Survive_196090622.htm
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/deviation/content?deviationid=3FD3B8AA-8270-11D5-5CDF-2D56601784A1 HTTP/1.1" 200 None
# G:\*redacted*\deviantart\T-S-K-TG\T-S-K-TG_Rust Chapter One_ Memories_183388221.htm
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/deviation/content?deviationid=5A629DD2-01AA-E60C-F399-8A91057D23BE HTTP/1.1" 200 4532
# G:\*redacted*\deviantart\T-S-K-TG\T-S-K-TG_No Man Left Behind_183374628.htm
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/comments/deviation/C7A0BD68-0D51-652F-ADC2-E9C58ED57441?maxdepth=5&offset=0&limit=50&mature_content=true HTTP/1.1" 200 661
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/deviation/content?deviationid=C7A0BD68-0D51-652F-ADC2-E9C58ED57441 HTTP/1.1" 200 5106
# G:\*redacted*\deviantart\T-S-K-TG\T-S-K-TG_Little Angel_183373630.htm
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/comments/deviation/1FA92D6F-F617-0B69-A561-1AB628521B5F?maxdepth=5&offset=0&limit=50&mature_content=true HTTP/1.1" 200 414
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/deviation/content?deviationid=1FA92D6F-F617-0B69-A561-1AB628521B5F HTTP/1.1" 200 None
# G:\*redacted*\deviantart\T-S-K-TG\T-S-K-TG_Helljumper_183372150.htm
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/237B1135-C641-54C9-8354-C3E6137073B9?username=T-S-K-TG&offset=8&limit=24&mature_content=true&mode=newest HTTP/1.1" 200 70
[deviantart][debug] Using DeviantartScrapsExtractor for 'https://www.deviantart.com/t-s-k-tg/gallery/scraps'
[deviantart][debug] Using custom API credentials (client-id *redacted*)
[deviantart][debug] Sleeping 1.00 seconds (extractor)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET / HTTP/1.1" 200 None
[deviantart][debug] Sleeping 2.00 seconds (request)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /_puppy/dashared/gallection/contents?username=T-S-K-TG&type=gallery&offset=0&limit=24&scraps_folder=true&csrf_token=D2MR6Iy66omHhf7G.sinytj.znAPKxK8STP1ep9xDCzUypwUXnz987DyCS5UX1-X0o4 HTTP/1.1" 200 68
[deviantart][info] No results for https://www.deviantart.com/t-s-k-tg/gallery/scraps

But then, when journal posts are extracted, they keep repeating (the last line of the previous code block is intentionally repeated as the first line of this one):

[deviantart][info] No results for https://www.deviantart.com/t-s-k-tg/gallery/scraps
[deviantart][debug] Using DeviantartJournalExtractor for 'https://www.deviantart.com/t-s-k-tg/posts'
[deviantart][debug] Using custom API credentials (client-id *redacted*)
[deviantart][debug] Sleeping 1.00 seconds (extractor)
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/user/profile/posts?username=T-S-K-TG&limit=50&mature_content=true HTTP/1.1" 200 2002
[deviantart][debug] Switching to private access token
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/user/profile/posts?username=T-S-K-TG&limit=50&mature_content=true HTTP/1.1" 200 2014
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/deviation/metadata?deviationids%5B0%5D=E9BC01F9-0700-DB8B-8C4F-920B38F4F3B0&deviationids%5B1%5D=E8C3D76A-6191-DAD4-F9CF-734A7E1543BC&deviationids%5B2%5D=BDC2666E-2743-C41D-FD89-E22756D8A67F&deviationids%5B3%5D=700AA71A-7445-01BA-1F59-63EBB42B880C&deviationids%5B4%5D=59CC8A54-E41F-67F6-8DFF-A6DE1B46CD2E&mature_content=true HTTP/1.1" 200 516
[deviantart][debug] Using download archive 'C:\Users\*redacted*/gallery-dl/archive_deviantart.sqlite3'
[deviantart][debug] Active postprocessor modules: [MetadataPP]
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/deviation/content?deviationid=E9BC01F9-0700-DB8B-8C4F-920B38F4F3B0 HTTP/1.1" 200 645
# G:\*redacted*\deviantart\T-S-K-TG\T-S-K-TG_So, University._219626601.htm
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/comments/deviation/E8C3D76A-6191-DAD4-F9CF-734A7E1543BC?maxdepth=5&offset=0&limit=50&mature_content=true HTTP/1.1" 200 378
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/deviation/content?deviationid=E8C3D76A-6191-DAD4-F9CF-734A7E1543BC HTTP/1.1" 200 362
# G:\*redacted*\deviantart\T-S-K-TG\T-S-K-TG_Gonna be a wait, Warforged guy_220880133.htm
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/comments/deviation/BDC2666E-2743-C41D-FD89-E22756D8A67F?maxdepth=5&offset=0&limit=50&mature_content=true HTTP/1.1" 200 493
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/deviation/content?deviationid=BDC2666E-2743-C41D-FD89-E22756D8A67F HTTP/1.1" 200 323
# G:\*redacted*\deviantart\T-S-K-TG\T-S-K-TG_New Stuff Soon_221241144.htm
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/deviation/content?deviationid=700AA71A-7445-01BA-1F59-63EBB42B880C HTTP/1.1" 200 452
# G:\*redacted*\deviantart\T-S-K-TG\T-S-K-TG_Injury and Detox_223085267.htm
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/deviation/content?deviationid=59CC8A54-E41F-67F6-8DFF-A6DE1B46CD2E HTTP/1.1" 200 1112
# G:\*redacted*\deviantart\T-S-K-TG\T-S-K-TG_First Uploads_223488762.htm
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/user/profile/posts?username=T-S-K-TG&limit=50&mature_content=true HTTP/1.1" 200 2014
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/deviation/metadata?deviationids%5B0%5D=E9BC01F9-0700-DB8B-8C4F-920B38F4F3B0&deviationids%5B1%5D=E8C3D76A-6191-DAD4-F9CF-734A7E1543BC&deviationids%5B2%5D=BDC2666E-2743-C41D-FD89-E22756D8A67F&deviationids%5B3%5D=700AA71A-7445-01BA-1F59-63EBB42B880C&deviationids%5B4%5D=59CC8A54-E41F-67F6-8DFF-A6DE1B46CD2E&mature_content=true HTTP/1.1" 200 516
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/deviation/content?deviationid=E9BC01F9-0700-DB8B-8C4F-920B38F4F3B0 HTTP/1.1" 200 645
# G:\*redacted*\deviantart\T-S-K-TG\T-S-K-TG_So, University._219626601.htm
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/comments/deviation/E8C3D76A-6191-DAD4-F9CF-734A7E1543BC?maxdepth=5&offset=0&limit=50&mature_content=true HTTP/1.1" 200 378
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/deviation/content?deviationid=E8C3D76A-6191-DAD4-F9CF-734A7E1543BC HTTP/1.1" 200 362
# G:\*redacted*\deviantart\T-S-K-TG\T-S-K-TG_Gonna be a wait, Warforged guy_220880133.htm
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/comments/deviation/BDC2666E-2743-C41D-FD89-E22756D8A67F?maxdepth=5&offset=0&limit=50&mature_content=true HTTP/1.1" 200 493
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/deviation/content?deviationid=BDC2666E-2743-C41D-FD89-E22756D8A67F HTTP/1.1" 200 323
# G:\*redacted*\deviantart\T-S-K-TG\T-S-K-TG_New Stuff Soon_221241144.htm
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/deviation/content?deviationid=700AA71A-7445-01BA-1F59-63EBB42B880C HTTP/1.1" 200 452
# G:\*redacted*\deviantart\T-S-K-TG\T-S-K-TG_Injury and Detox_223085267.htm
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/deviation/content?deviationid=59CC8A54-E41F-67F6-8DFF-A6DE1B46CD2E HTTP/1.1" 200 1112
# G:\*redacted*\deviantart\T-S-K-TG\T-S-K-TG_First Uploads_223488762.htm
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/user/profile/posts?username=T-S-K-TG&limit=50&mature_content=true HTTP/1.1" 200 2014
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/deviation/metadata?deviationids%5B0%5D=E9BC01F9-0700-DB8B-8C4F-920B38F4F3B0&deviationids%5B1%5D=E8C3D76A-6191-DAD4-F9CF-734A7E1543BC&deviationids%5B2%5D=BDC2666E-2743-C41D-FD89-E22756D8A67F&deviationids%5B3%5D=700AA71A-7445-01BA-1F59-63EBB42B880C&deviationids%5B4%5D=59CC8A54-E41F-67F6-8DFF-A6DE1B46CD2E&mature_content=true HTTP/1.1" 200 516
[deviantart][debug] Sleeping 1.00 seconds (api)
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/deviation/content?deviationid=E9BC01F9-0700-DB8B-8C4F-920B38F4F3B0 HTTP/1.1" 200 645
# G:\*redacted*\deviantart\T-S-K-TG\T-S-K-TG_So, University._219626601.htm

My configuration is unchanged and previously caused no issues:

    "extractor":
    {
        "base-directory": "G:/Sicherung 2013_09_02/",
        "parent-directory": false,
        "postprocessors": null,
        "archive": "%USERPROFILE%/gallery-dl/archive_{category}.sqlite3",
        "archive-mode": "file",
        "cookies": null,
        "cookies-update": true,
        "proxy": null,
        "skip": true,

        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:108.0) Gecko/20100101 Firefox/108.0",
        "retries": 10,
        "timeout": 30.0,
        "verify": true,
        "fallback": true,

        "sleep": 1,
        "sleep-request": 1,
        "sleep-extractor": 1,
        "sleep-429": 30,

        "path-restrict": "auto",
        "path-replace": "_",
        "path-remove": "\\u0000-\\u001f\\u007f",
        "path-strip": "auto",
        "path-extended": true,

        "extension-map": {
            "jpeg": "jpg",
            "jpe" : "jpg",
            "jfif": "jpg",
            "jif" : "jpg",
            "jfi" : "jpg"
        },
*unnecessary extractors omitted*
        "deviantart":
        {
            "directory": ["{category}", "{author[username]}"],
            "client-id": "*redacted*",
            "client-secret": "*redacted*",
            "auto-watch": true,
            "auto-unwatch": true,
            "comments": true,
            "extra": true,
            "flat": false,
            "folders": false,
            "group": true,
            "include": "gallery, scraps, journal",
            "intermediary": true,
            "journals": "html",
            "mature": true,
            "metadata": true,
            "original": true,
            "pagination": "manual",
            "quality": 100,
            "refresh-token": "cache",
            "wait-min": 1,
            "filename": "{author[username]}_{title}_{index}.{extension}",
            "postprocessors": [
                {
                "name": "metadata",
                "mode": "custom",
                "filename": "{author[username]}_{title}_{index}.html",
                "extension": "html",
                "format": "<h1 style='display: inline'><a href='{url}'>{title}</a></h1> by <a href='https://www.deviantart.com/{username}'>{author[username]}</a><div><br></div><div class='content'>{description}</div><br><div><hr><div class='tags'>[\"{tags:J\", \"}\"]</div><hr></div><div>{date:%Y.%m.%d} {extension}</div><br>\n\n"
                }
            ]
        },

PS: This might be related to the API changes mentioned here: https://github.com/mikf/gallery-dl/issues/5916

mikf commented 2 months ago

This bug is caused by "pagination": "manual" when used with a cursor-based API endpoint, which the new journal/status one is. Fixed in https://github.com/mikf/gallery-dl/commit/3bffe7a8bd8cf4434cd34153c781341dbfcbc177.
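
To illustrate why this combination can loop, here is a minimal Python sketch. It is not gallery-dl's actual code: `PAGES`, `fetch`, `paginate_manual`, and `paginate_cursor` are hypothetical names, and the in-memory pages only stand in for a cursor-based endpoint. The assumption is that "manual" pagination ignores `has_more` and stops only on an empty batch, so once the last page returns no next cursor, the following request starts again from the first page.

```python
from typing import Iterator, Optional

# Hypothetical stand-in for a cursor-based endpoint (not the real DeviantArt API).
PAGES = {
    None: {"results": ["post-1", "post-2"], "has_more": True, "next_cursor": "c1"},
    "c1": {"results": ["post-3"], "has_more": False, "next_cursor": None},
}


def fetch(cursor: Optional[str]) -> dict:
    """Return one page; a missing cursor means 'start from the first page'."""
    return PAGES[cursor]


def paginate_manual(max_requests: int = 5) -> Iterator[str]:
    """Ignore has_more and stop only on an empty batch (offset-style logic)."""
    cursor = None
    for _ in range(max_requests):  # guard so this demo terminates
        page = fetch(cursor)
        if not page["results"]:
            return
        yield from page["results"]
        # On the last page next_cursor is None, so the next request
        # silently restarts from the first page.
        cursor = page["next_cursor"]


def paginate_cursor() -> Iterator[str]:
    """Stop as soon as the endpoint reports that no further page exists."""
    cursor = None
    while True:
        page = fetch(cursor)
        yield from page["results"]
        cursor = page["next_cursor"]
        if not page["has_more"] or cursor is None:
            return


if __name__ == "__main__":
    print(list(paginate_manual()))
    print(list(paginate_cursor()))
```

With the guard set to five requests, `paginate_manual` yields the same three posts repeatedly, while `paginate_cursor` stops after the third post. For setups that cannot update yet, switching the `pagination` option from `"manual"` to its other documented value `"api"` should presumably avoid the loop, though that workaround is an assumption and was not confirmed in this thread.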