libwaifu / BilibiliLink

Mozilla Public License 2.0

Archaeology::Retrieving the header banners of Bilibili's main site #1

Open oovm opened 6 years ago

oovm commented 6 years ago

By "bilibili banner" I mean this part of the page:

[image: screenshots of the banner area]

oovm commented 6 years ago

Inspecting the page source turns up the following element:

<div 
    id="banner_link" 
    class="head-banner 
    report-wrap-module 
    report-scroll-module" 
    style="background-image: url(&quot;//i0.hdslb.com/bfs/archive/0ac04c23af3b3297bf02dca163474326898d211d.png&quot;);" 
    scrollshow="true">
</div>

The `background-image` property holds the link to the banner image (a protocol-relative URL wrapped in `&quot;` entities).
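To pull that link out of a saved copy of the page, one can HTML-unescape the `style` attribute and read the argument of `url(...)`. A minimal Python sketch, assuming the attribute looks like the one above; the function name `banner_url_from_style` and the choice of `https:` as the scheme for the protocol-relative `//i0.hdslb.com/...` link are my own assumptions.

```python
import html
import re
from typing import Optional

# Matches url(...) in a CSS declaration, with optional single or double quotes.
_URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""")


def banner_url_from_style(style: str, scheme: str = "https") -> Optional[str]:
    """Return the image URL from an inline style attribute, or None if absent."""
    # The attribute value is HTML-escaped (&quot; instead of "), so decode it first.
    css = html.unescape(style)
    match = _URL_RE.search(css)
    if match is None:
        return None
    url = match.group(2)
    # The page uses a protocol-relative URL (//host/path); prepend an assumed scheme.
    if url.startswith("//"):
        url = f"{scheme}:{url}"
    return url
```

For the element above this yields `https://i0.hdslb.com/bfs/archive/0ac04c23af3b3297bf02dca163474326898d211d.png`.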


oovm commented 6 years ago

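Some banners turn out to be named with the pattern http://static.hdslb.com/images/header/`date`_banner.jpg, where `date` is an eight-digit YYYYMMDD string.

So we can give the Bilibili server a kabedon (壁咚), i.e. brute-force every date: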

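(* Request one candidate URL per day, from 2012-01-01 through 2017-01-01 *)
getbanner=Table[
   date->URLExecute@StringTemplate["http://static.hdslb.com/images/header/`date`_banner.jpg"][<|"date"->date|>],
   {date,DateString[#,{"Year","Month","Day"}]&/@DayRange[DateObject[{2012,1,1}],DateObject[{2017,1,1}]]}
];
(* Keep only the dates whose result is not a string; this assumes a missing banner comes back as a string (an error page) *)
Select[getbanner,!StringQ[Last@#]&]//TableForm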
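For readers without Mathematica, roughly the same scan can be sketched in Python. This is a sketch under assumptions, not the method used above: it sends a `HEAD` request per candidate and treats a 200 response with an `image/*` Content-Type as a hit, instead of checking whether the result is a string. The names `candidate_urls` and `looks_like_image` are hypothetical.

```python
import urllib.request
from datetime import date, timedelta

BANNER_TEMPLATE = "http://static.hdslb.com/images/header/{day}_banner.jpg"


def candidate_urls(start: date, end: date) -> list:
    """One URL per day, start and end inclusive, using the YYYYMMDD naming pattern."""
    days = (end - start).days
    return [
        BANNER_TEMPLATE.format(day=(start + timedelta(days=n)).strftime("%Y%m%d"))
        for n in range(days + 1)
    ]


def looks_like_image(url: str, timeout: float = 10.0) -> bool:
    """HEAD the URL; assume a 200 with an image/* Content-Type means the banner exists."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200 and resp.headers.get_content_type().startswith("image/")
    except OSError:
        # URLError, HTTPError (e.g. 404) and socket timeouts are all OSError subclasses.
        return False
```

Usage: `hits = [u for u in candidate_urls(date(2012, 1, 1), date(2017, 1, 1)) if looks_like_image(u)]`. Covering the same range as the Mathematica code, this makes 1828 sequential requests.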

Unfortunately, this only turned up three images...


oovm commented 6 years ago

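Capturing the network traffic revealed a jQuery request. After cleaning it up, it boils down to this API endpoint: https://api.bilibili.com//x/web-show/res/loc?pf=0&id=142, which returns: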

<|
    id->1,
    contract_id->,
    pos_num->0,
    name->,
    pic->http://i0.hdslb.com/bfs/archive/0ac04c23af3b3297bf02dca163474326898d211d.png,
    litpic->http://i0.hdslb.com/bfs/archive/bdb288021ff854d3ac618ac8c1eafd300ec9ed9b.png,
    url->,
    style->0,
    agency->,
    label->,
    intro->,
    area->0,
    is_ad_loc->False,
    ad_cb->,
    title->,
    server_type->0,
    cm_mark->0
|>
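A minimal Python sketch of querying this endpoint for a given `pf`/`id` pair. Only the URL and the field names `pic`/`litpic` come from the capture above; the surrounding JSON envelope is not shown here, so instead of assuming one the sketch walks the decoded JSON and collects every non-empty `pic`/`litpic` it finds. The function names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Endpoint kept verbatim from the captured request, including its double slash.
LOC_ENDPOINT = "https://api.bilibili.com//x/web-show/res/loc"


def loc_url(pf: int, loc_id: int) -> str:
    """Build the request URL for one (pf, id) pair."""
    return f"{LOC_ENDPOINT}?{urlencode({'pf': pf, 'id': loc_id})}"


def fetch_loc(pf: int, loc_id: int, timeout: float = 10.0):
    """Request one (pf, id) pair and return the decoded JSON body."""
    with urllib.request.urlopen(loc_url(pf, loc_id), timeout=timeout) as resp:
        return json.load(resp)


def find_banner_pics(payload):
    """Walk decoded JSON and yield (key, url) for every non-empty pic/litpic field."""
    if isinstance(payload, dict):
        for key in ("pic", "litpic"):
            value = payload.get(key)
            if isinstance(value, str) and value:
                yield key, value
        for value in payload.values():
            yield from find_banner_pics(value)
    elif isinstance(payload, list):
        for item in payload:
            yield from find_banner_pics(item)
```

Usage: `list(find_banner_pics(fetch_loc(0, 142)))`. Searching recursively keeps the sketch independent of whatever wrapper the API puts around the record.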

However, the meaning of `pf` and `id` is unclear, and changing them did not return any valid results.

oovm commented 6 years ago

I spent a whole lunch break kabedon-ing this endpoint with every `id` from 1 to 10000. The ids that return valid results are:

{
21, 23, 29, 31, 34, 40, 42, 44, 52, 58, 64, 70, 76, 82, 88, 94, 100, \
106, 112, 118, 124, 126, 128, 130, 132, 134, 136, 138, 142, 148, 151, \
152, 153, 160, 162, 243, 245, 247, 249, 251, 253, 255, 257, 259, 261, \
263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, \
291, 293, 295, 395, 403, 405, 406, 412, 413, 414, 415, 417, 418, 419, \
1466, 1550, 1554, 1556, 1558, 1560, 1562, 1564, 1566, 1568, 1570, \
1572, 1574, 1576, 1578, 1580, 1582, 1584, 1586, 1588, 1590, 1592, \
1594, 1596, 1598, 1600, 1602, 1604, 1606, 1608, 1610, 1612, 1614, \
1616, 1618, 1620, 1622, 1624, 1626, 1628, 1630, 1632, 1634, 1636, \
1660, 1666, 1670, 1674, 1680, 1682, 1919, 1920, 1921, 1922, 1923, \
1966, 2034, 2047, 2048, 2057, 2058, 2061, 2062, 2065, 2066, 2067, \
2078, 2079, 2207, 2210, 2211, 2212, 2213, 2214, 2257, 2260, 2261, \
2262, 2263, 2264, 2307, 2308, 2309, 2319, 2341, 2343, 2345, 2403, \
2452, 2453, 2462, 2463, 2472, 2473, 2482, 2483, 2492, 2493, 2503
}
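Probing 10000 ids one at a time is what took the lunch break. A sketch of the same scan with a small thread pool, assuming a hypothetical `probe(id)` callable that requests the endpoint with that `id` and returns `True` for a valid response (for example, one carrying a non-empty `pic` field; that validity criterion is itself an assumption). The probe is passed in so the scan logic stays testable offline.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def scan_ids(probe: Callable[[int], bool], ids: Iterable[int], workers: int = 8) -> List[int]:
    """Call probe(id) for every id, a few at a time; return the ids it accepted, in input order."""
    id_list = list(ids)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as id_list.
        # probe should catch its own network errors; an exception here aborts the scan.
        results = pool.map(probe, id_list)
        return [i for i, ok in zip(id_list, results) if ok]
```

Usage: `valid = scan_ids(probe, range(1, 10001))`. Keeping `workers` small limits how hard the server is hit.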

My guess is that `pf` stands for platform, i.e. Android, iOS, or PC.

oovm commented 6 years ago

On closer analysis, id = 142, 1580, 1586, 1592, 1600, 1608, 1620 and 1622 all return the header banner... were these ids just made up on the spot?

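`pf` takes values from 0 to 9. With 0 the results are the most complete and the other values return somewhat fewer, for reasons I can't tell. Moreover, a different `pf` always comes with different valid ids, and with limited energy I can't kabedon every combination.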
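If valid ids were collected separately for each `pf`, the observation that different `pf` values never share an id could be checked mechanically. A minimal sketch, assuming the per-`pf` results are stored as a hypothetical mapping from `pf` to a set of valid ids (the thread did not complete such a scan). An empty result is consistent with the observation.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def overlapping_ids(valid_by_pf: Dict[int, Set[int]]) -> List[Tuple[int, int, Set[int]]]:
    """Return (pf_a, pf_b, shared_ids) for every pair of pf values whose valid-id sets intersect."""
    overlaps = []
    # Sorting by pf (keys are unique ints) gives a stable, readable pair order.
    for (pf_a, ids_a), (pf_b, ids_b) in combinations(sorted(valid_by_pf.items()), 2):
        shared = ids_a & ids_b
        if shared:
            overlaps.append((pf_a, pf_b, shared))
    return overlaps
```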

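These endpoints do not appear to return historical banners.

So let's try the web "time machine": https://archive.li/*.hdslb.com

I went through all of it once and, using the aspect ratio as a filter, got the following results: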

{
    "https://static.hdslb.com/images/header/20140904_banner.jpg",
    "https://static.hdslb.com/images/header/20141001_banner.jpg",
    "https://static.hdslb.com/images/header/20150501_banner.jpg",
    "https://i0.hdslb.com/bfs/archive/04c6445aabd7fd0a414718b971d5f4b49e5ea153.png",
    "https://i0.hdslb.com/bfs/archive/0ac04c23af3b3297bf02dca163474326898d211d.png",
    "https://i0.hdslb.com/headers/9223b82c56217ec45d6c74c102b96ff2.jpg",
    "https://i0.hdslb.com/headers/ac23bfc2b0c586777e74812d91e6a30b.png",
    "https://i0.hdslb.com/headers/f119744427b69c79260eddb42068e751.jpg",
    "https://i0.hdslb.com/headers/acbb92337a30cc748a1cd416aa19da5f.png",
    "https://i0.hdslb.com/headers/a282e7981909676b19044e467b1c807a.jpg",
    "https://i0.hdslb.com/headers/2198b4b6852bd5351a0eeba376fd4d82.jpg",
    "https://i0.hdslb.com/headers/428141f340414d3308856e003a341e67.png",
    "https://i0.hdslb.com/headers/f330bac7fb551ac44cc34ac5df9d1667.jpg",
    "https://i0.hdslb.com/headers/16e90ed7e4d5357027104ccbe63e77e5.jpg"
}
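The aspect-ratio filter needs each image's width and height. A stdlib-only Python sketch that reads them from the PNG IHDR chunk or the JPEG start-of-frame segment, plus a width/height check. The threshold `min_ratio=4.0` is a hypothetical value chosen for illustration, not the one used for the list above, and the JPEG walker assumes well-formed files without fill bytes between segments.

```python
import struct
from typing import Optional, Tuple


def image_size(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (width, height) for PNG or JPEG bytes, or None if the format is not recognized."""
    # PNG: 8-byte signature, then IHDR (length, type), then big-endian uint32 width and height.
    if data[:8] == b"\x89PNG\r\n\x1a\n" and len(data) >= 24:
        width, height = struct.unpack(">II", data[16:24])
        return width, height
    # JPEG: walk marker segments from SOI until a start-of-frame (SOFn) marker.
    if data[:2] == b"\xff\xd8":
        i = 2
        while i + 9 <= len(data):
            if data[i] != 0xFF:
                return None
            marker = data[i + 1]
            # SOF0..SOF15, excluding DHT (C4), JPG (C8) and DAC (CC).
            if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
                # Layout: FF Cn, length (2), precision (1), height (2), width (2).
                height, width = struct.unpack(">HH", data[i + 5:i + 9])
                return width, height
            seg_len = struct.unpack(">H", data[i + 2:i + 4])[0]
            i += 2 + seg_len  # seg_len counts its own two length bytes
    return None


def is_banner_shaped(size: Tuple[int, int], min_ratio: float = 4.0) -> bool:
    """Treat images much wider than they are tall as banner candidates."""
    width, height = size
    return height > 0 and width / height >= min_ratio
```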
oovm commented 6 years ago

Wow, it turns out someone found this before me and has been recording the banners since 2016:

https://www.biliplus.com/task/banner_fetch/

Earlier banners will still need some other approach, though...

oovm commented 6 years ago

Hats off to this master archaeologist:

https://tieba.baidu.com/p/3888916441

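Still, I want to find the original links. The archaeologist gives me hope: at least the links are not all gone for good, so a complete set may still be within reach...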
