mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

Weibo download incomplete #4168

Open kylieeeeeee opened 1 year ago

kylieeeeeee commented 1 year ago

No error occurred; the download simply stopped at some point. For example, the first post from this account is from 2016/10/21, but downloading stopped at 2017/05/17 without any notice.

...
# .\gallery-dl\weibo\吴磊工作室\2017-06-14 10_36_38_1.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-06-14 10_36_38_2.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-06-14 10_36_38_3.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-06-14 10_36_38_4.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-06-14 10_36_38_5.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-06-14 10_36_38_6.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-06-14 10_36_38_7.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-06-14 10_36_38_8.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-06-14 10_36_38_9.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-06-13 13_33_14_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-13 13_33_14_2.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-13 13_33_14_3.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-13 13_33_14_4.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-13 13_33_14_5.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-13 13_33_14_6.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-12 08_33_02_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-09 11_33_07_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-09 11_33_07_2.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-09 11_33_07_3.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-09 11_33_07_4.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-09 11_33_07_5.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-09 11_33_07_6.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-09 11_33_07_7.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-09 11_33_07_8.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-09 10_33_13_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-09 10_33_13_2.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-09 10_33_13_3.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-09 10_33_13_4.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-06 09_33_15_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-05 05_36_44_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-03 15_47_00_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-03 12_33_14_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-03 12_33_14_2.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-02 15_58_21_1.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-06-02 12_17_24_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-02 12_17_24_2.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-02 12_17_24_3.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-02 12_17_24_4.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-02 12_17_24_5.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-02 12_17_24_6.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-02 12_17_24_7.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-02 12_17_24_8.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-02 12_17_24_9.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-02 11_33_07_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-01 12_33_07_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-01 12_33_07_2.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-01 12_33_07_3.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-06-01 06_33_08_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-30 10_34_03_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-30 10_34_03_2.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-30 10_34_03_3.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-30 10_34_03_4.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-26 11_33_17_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-26 11_33_17_2.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-26 11_33_17_3.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-26 11_33_17_4.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-26 09_33_21_1.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-05-26 09_33_21_2.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-05-26 09_33_21_3.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-05-26 09_33_21_4.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-05-26 09_33_21_5.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-05-26 09_33_21_6.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-05-26 09_33_21_7.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-05-26 09_33_21_8.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-05-26 09_33_21_9.mp4
# .\gallery-dl\weibo\吴磊工作室\2017-05-23 14_33_17_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-23 14_33_17_2.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-21 11_33_13_1.png
# .\gallery-dl\weibo\吴磊工作室\2017-05-21 09_33_25_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-21 09_33_25_2.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-21 09_33_25_3.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-21 09_33_25_4.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-21 09_33_25_5.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-21 09_33_25_6.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-21 09_33_25_7.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-21 09_33_25_8.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-21 09_33_25_9.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-20 11_33_05_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-19 11_33_05_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-19 11_33_05_2.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-19 11_33_05_3.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-19 11_33_05_4.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-19 10_33_06_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-17 07_33_28_1.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-17 07_33_28_2.jpg
# .\gallery-dl\weibo\吴磊工作室\2017-05-17 07_33_28_3.jpg
PS C:\Users\User>

mikf commented 1 year ago

With this account, I only got as far as 2022-07-20 07:54:16 before it stopped.

It appears that Weibo sometimes sends an empty response even though it shouldn't and gallery-dl currently interprets that as the end of the timeline. Retrying the same request a couple of times should help.
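The retry idea could be sketched roughly as follows. This is only an illustration, not gallery-dl's actual code: `fetch_with_retry` and the `fetch` callable are hypothetical names, and the `data` / `list` layout is assumed from the responses quoted later in this thread.

```python
import time
from typing import Callable


def fetch_with_retry(fetch: Callable[[], dict],
                     retries: int = 3,
                     delay: float = 1.0) -> dict:
    """Repeat a request while its 'list' is empty, up to `retries` extra tries.

    `fetch` is a hypothetical zero-argument callable that sends the same
    request again and returns the parsed JSON response.
    """
    response = fetch()
    for _ in range(retries):
        # A non-empty list means the response is usable; stop retrying.
        if response.get("data", {}).get("list"):
            break
        time.sleep(delay)   # brief pause before asking again
        response = fetch()
    # May still be empty if every attempt returned an empty list.
    return response
```

If every attempt comes back empty, the last response is returned unchanged, so the caller still has to decide whether that means the end of the timeline.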

kylieeeeeee commented 1 year ago

I've tried about ten times, and it didn't work at all.

mikf commented 1 year ago

I wasn't talking about you retrying, but about the gallery-dl code doing so ...

Anyway, it seems that there is a hard limit on how far back Weibo allows one to go, at least on the timeline for tabtype=feed.

Post 4108417012636584 from 2017/05/17 appears to be the last accessible one using this API endpoint:
https://weibo.com/ajax/statuses/mymblog?uid=6019229199&feature=0&since_id=4108417012636584

You could try all the other tabtype=… timelines (home, video, album). Maybe some of them go further back than feed.
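As a small illustration, the timeline URLs for the tabtypes named above could be generated like this. The URL shape `https://weibo.com/u/<uid>?tabtype=<name>` is an assumption and is not stated in this thread; the uid is taken from the API URL above, and `timeline_urls` is a hypothetical helper.

```python
from typing import Iterable, List

# tabtype values mentioned in this thread
TABTYPES = ("feed", "home", "video", "album")


def timeline_urls(uid: str, tabtypes: Iterable[str] = TABTYPES) -> List[str]:
    """Build one profile URL per tabtype (URL shape is an assumption)."""
    return ["https://weibo.com/u/{}?tabtype={}".format(uid, name)
            for name in tabtypes]


for url in timeline_urls("6019229199"):
    print(url)
```

Each printed URL could then be passed to gallery-dl in turn to see which timeline reaches furthest back.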


How did you get a link to this user's first post?

kylieeeeeee commented 1 year ago

Trying different tabtypes did work, but I'm still not able to get back to their first post.

You can search this account's posts with a time filter on their home page, and there's a limit to how far you can scroll down, which is the time when this account was created.

Screenshot_20230620_114948_Weibo.png

YuanGYao commented 8 months ago

Hello, I also ran into this problem when using gallery-dl to download images from Weibo. I checked your code, and it seems that the loop stops when the list returned by Weibo is empty. Is my understanding correct?

Update: I tested on the Weibo web page and found that Weibo sometimes returns an empty list while there are still images that have not been loaded, yet since_id still has a normal value. Something like this:

{
  "data": {
    "since_id": "4839329857536226_4839738068701694|1034:4839735135502387_20221130_-1",
    "list": []
  },
  "bottom_tips_visible": false,
  "bottom_tips_text": "",
  "ok": 1
}

If you send the request again with the since_id, Weibo will still return the remaining data.

I tested other accounts, and when all images have actually been loaded, the returned list is not necessarily empty, but its since_id is 0. Response when everything has been loaded:

{
  "data": {
    "since_id": 0,
    "list": [
      {
        "pid": "008rbDxXgy1h4nz35ottoj31900u0wq4",
        "mid": "4796635823476885",
        "is_paid": false,
        "timeline_month": "",
        "timeline_year": "",
        "object_id": "1042018:34ae58ca49fcd09a44397045013a39d9",
        "type": "pic"
      },
      {
        "pid": "008rbDxXgy1h4o043fk2pj30ku13in3h",
        "mid": "4796635823476885",
        "is_paid": false,
        "timeline_month": "",
        "timeline_year": "",
        "object_id": "1042018:2b8a854c07aae880e1dbca34d3928468",
        "type": "pic"
      },
      {
        "pid": "008rbDxXgy1h3ydky1a9kj30wn0dwjwd",
        "mid": "4788655920776472",
        "is_paid": false,
        "timeline_month": "",
        "timeline_year": "",
        "object_id": "1042018:97587a8e4b8490382bcb1d0a9b53c6a1",
        "type": "pic"
      },
      {
        "pid": "008rbDxXgy1h3nr6ejgzej31hc0u0tln",
        "mid": "4785303568780117",
        "is_paid": false,
        "timeline_month": "06",
        "timeline_year": "2022",
        "object_id": "1042018:e5baff22a86e619c35bb138e7f548fce",
        "type": "pic"
      },
      {
        "pid": "008rbDxXgy1h3i6yvb03cj30wn0dwdoq",
        "mid": "4784730057213232",
        "is_paid": false,
        "timeline_month": "06",
        "timeline_year": "2022",
        "object_id": "1042018:9940fc72b75958fa243be500d480ab46",
        "type": "pic"
      },
      {
        "pid": "008rbDxXgy1h3hyyjl2rcj30wn0dwdoq",
        "mid": "4783491886089040",
        "is_paid": false,
        "timeline_month": "06",
        "timeline_year": "2022",
        "object_id": "1042018:196d01f2a5af1c30a730dce1806c0b0a",
        "type": "pic"
      },
      {
        "pid": "008rbDxXgy1h3a4h0312wj30wn0dwdoq",
        "mid": "4781009785325408",
        "is_paid": false,
        "timeline_month": "",
        "timeline_year": "",
        "object_id": "1042018:a978607db0912e53155a9515a8e3637d",
        "type": "pic"
      },
      {
        "pid": "008rbDxXgy1h2turdmfznj31hc0u0kj1",
        "mid": "4775883259250379",
        "is_paid": false,
        "timeline_month": "",
        "timeline_year": "",
        "object_id": "1042018:363552624947aa7a83d8f787464cfdf2",
        "type": "pic"
      },
      {
        "pid": "008rbDxXgy1h2turg76ctj31hc0u0np2",
        "mid": "4775883259250379",
        "is_paid": false,
        "timeline_month": "",
        "timeline_year": "",
        "object_id": "1042018:a839fd918faa4c3564c3651b3b6355f2",
        "type": "pic"
      },
      {
        "pid": "008rbDxXgy1h2ktizbfmnj31hc0u0kjl",
        "mid": "4773078053162341",
        "is_paid": false,
        "timeline_month": "",
        "timeline_year": "",
        "object_id": "1042018:0c2aee6ac109547117ecd7a73f481d22",
        "type": "pic"
      }
    ]
  },
  "bottom_tips_visible": false,
  "bottom_tips_text": "",
  "ok": 1
}

The above situation is what I tested on the album page of Weibo.

Therefore, I think we cannot decide to stop sending requests based solely on whether the list is empty; the value of since_id should also be considered.
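A minimal sketch of such a loop, assuming the response layout shown in the two JSON samples above. It is not the actual gallery-dl implementation: `iter_items`, `fetch_page`, and the `max_empty` guard are hypothetical names introduced only for illustration.

```python
from typing import Callable, Iterator, Optional, Union

SinceId = Optional[Union[str, int]]


def iter_items(fetch_page: Callable[[SinceId], dict],
               max_empty: int = 5) -> Iterator[dict]:
    """Yield items from a since_id-paginated endpoint.

    `fetch_page(since_id)` is a hypothetical callable returning the parsed
    JSON for one page; None requests the first page.
    """
    since_id: SinceId = None
    empty_streak = 0
    while True:
        data = fetch_page(since_id)["data"]
        items = data.get("list") or []
        yield from items

        next_id = data.get("since_id")
        if not next_id:
            # since_id of 0 (or missing): everything has been loaded.
            return
        if items:
            empty_streak = 0
        else:
            # Empty list but a valid since_id: keep going, within a limit
            # so that a misbehaving server cannot cause an endless loop.
            empty_streak += 1
            if empty_streak >= max_empty:
                return
        since_id = next_id
```

The loop ends on a falsy since_id rather than on an empty list, so a final page that still contains items is yielded before stopping.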

mikf commented 8 months ago

@YuanGYao should be fixed in https://github.com/mikf/gallery-dl/commit/5158cbb4c11ec360c803ef04472ba1993640155b, at least for album pages.

YuanGYao commented 8 months ago

@mikf I currently install gallery-dl through scoop. If I clone this repository to get the latest code, how do I run it?

Hrxn commented 8 months ago

@YuanGYao python.exe C:\Path\to\gallery-dl-master\gallery_dl