wongchenv closed this issue 3 years ago
The mp.weixin.qq.com URLs load the title onto the page using JavaScript. If we used a headless browser or something similar, I could let the JavaScript execute and then grab the title after the page has loaded, but that wouldn't work on mobile.
Hypothetically, if we had funding for this project, I could build a new CORS proxy around a headless browser and use that for URL fetching. That would fix both the JavaScript-loaded pages and the encoding issues, but since I'd need to pay to host it publicly, it's not something I'm considering at the moment.
Alternatively, that browser/API could be run locally by a user and turned on in settings. I'll think on this as well!
I've got a local scraping solution working on desktop. It still doesn't get a title out of weixin.qq.com, but it succeeds at the other two. I still need to test on mobile.
[Title Unknown](https://mp.weixin.qq.com/mp/appmsgalbum?__biz=Mzg4MjAwNTUwNw==&action=getalbum&album_id=1448541657456295937&scene=173&from_msgid=2247484083&from_itemidx=1&count=10#wechat_redirect&scene=0&subscene=90&sessionid=1606652573&enterid=1606653138)
[弱点 (豆瓣)](https://movie.douban.com/subject/3552028/)
[手绘100张,耗时1个月,我终于破解了【达芬奇密码书】的全部秘密!_哔哩哔哩 (゜-゜)つロ 干杯~-bilibili](https://www.bilibili.com/video/BV1qy4y1t7fn?spm_id_from=333.851.b_7265636f6d6d656e64.3)
Fixed for desktop in 1.2.0. Mobile still relies on the CORS proxy, which doesn't support these characters.
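For anyone curious about the character problem, here is a minimal sketch of charset-aware title extraction, assuming the raw response bytes and the `Content-Type` header are available. The helper names (`detectCharset`, `decodeWith`, `extractTitle`) are hypothetical and are not the plugin's actual code; only the standard `TextDecoder` API is used.

```typescript
// Hypothetical helpers for illustration; not the plugin's actual implementation.

/** Pick a charset from the Content-Type header or a <meta> tag, defaulting to utf-8. */
function detectCharset(contentType: string | null, head: string): string {
  const fromHeader = contentType ? contentType.match(/charset=["']?([\w-]+)/i) : null;
  if (fromHeader) return fromHeader[1].toLowerCase();
  const fromMeta = head.match(/<meta[^>]+charset=["']?([\w-]+)/i);
  if (fromMeta) return fromMeta[1].toLowerCase();
  return "utf-8";
}

/** Decode bytes with the given label, falling back to UTF-8 if the label is unknown. */
function decodeWith(label: string, bytes: Uint8Array): string {
  try {
    return new TextDecoder(label).decode(bytes);
  } catch (_err) {
    return new TextDecoder("utf-8").decode(bytes);
  }
}

/** Decode raw page bytes and return the <title> text, or null if there is none. */
function extractTitle(bytes: Uint8Array, contentType: string | null): string | null {
  // A single-byte decoding keeps the ASCII markup searchable regardless of the real charset.
  const head = new TextDecoder("latin1").decode(bytes.slice(0, 2048));
  const html = decodeWith(detectCharset(contentType, head), bytes);
  const match = html.match(/<title[^>]*>([\s\S]*?)<\/title>/i);
  return match ? match[1].trim() : null;
}
```

This only helps when the title is present in the served HTML; pages that set the title with JavaScript after load would still come back empty.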
WeChat links failed again. The link is given here: https://mp.weixin.qq.com/s/nVilywouNxnZlb-l3Buj3w
Is it possible to set a rule so that, for websites matching it, we fetch the whole page and get the title from it?
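One way such a rule could look, as a rough sketch: a user-configured list of hostnames, optionally with a leading `*.` wildcard, checked before deciding how to fetch. The rule format and the names `FetchRule`, `matchesRule`, and `needsFullPageFetch` are assumptions made for illustration; nothing like this exists in the plugin as far as this thread shows.

```typescript
// Hypothetical rule format: a hostname, optionally prefixed with "*." to include subdomains.
type FetchRule = string;

/** Return true if the hostname matches a single rule. */
function matchesRule(hostname: string, rule: FetchRule): boolean {
  const host = hostname.toLowerCase();
  const r = rule.toLowerCase();
  if (r.startsWith("*.")) {
    const base = r.slice(2);
    // "*.qq.com" matches "qq.com" itself and any subdomain of it.
    return host === base || host.endsWith("." + base);
  }
  return host === r;
}

/** Return true if any rule says this URL should be fetched as a whole page. */
function needsFullPageFetch(url: string, rules: FetchRule[]): boolean {
  let hostname: string;
  try {
    hostname = new URL(url).hostname;
  } catch (_err) {
    return false; // not a valid absolute URL
  }
  return rules.some((rule) => matchesRule(hostname, rule));
}

// Example: only WeChat article links take the slower full-page path.
const rules: FetchRule[] = ["*.weixin.qq.com"];
const useFullPage = needsFullPageFetch("https://mp.weixin.qq.com/s/nVilywouNxnZlb-l3Buj3w", rules);
```

Matching on the hostname rather than the full URL keeps the rules short, at the cost of not being able to target individual paths.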
There are also some links whose titles cannot be fetched at all, and some where the fetched title comes back abnormal: