zolrath / obsidian-auto-link-title

Automatically fetch the titles of pasted links
MIT License
463 stars, 62 forks

Titles cannot be fetched for some links, and others are fetched incorrectly #3

Closed: wongchenv closed this issue 3 years ago

wongchenv commented 3 years ago

Titles still can't be fetched for some links, and for others the fetched title is wrong:

zolrath commented 3 years ago

The mp.weixin.qq.com URLs load the title onto the page using JavaScript. If we used a headless browser or something similar, I could let the JavaScript execute and then grab the title after the page has loaded, but that wouldn't work on mobile.
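For illustration, here is a minimal TypeScript sketch of why a plain HTTP fetch comes up empty on such pages. It assumes, as described above, that the title is only filled in by JavaScript, so the static HTML contains a missing or empty `<title>`. The function `extractTitle` is hypothetical and is not the plugin's actual code; its regex-based parsing is a simplification.

```typescript
// Hypothetical helper: pull the <title> out of raw, un-executed HTML.
// Returns null when the tag is missing or empty, which is what a plain
// fetch sees for a page that only sets its title from JavaScript.
function extractTitle(html: string): string | null {
  // [\s\S]*? matches lazily across newlines, up to the first </title>
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  if (match === null) {
    return null;
  }
  // Collapse whitespace runs (titles often span several lines) and trim.
  const title = decodeBasicEntities(match[1]).replace(/\s+/g, " ").trim();
  return title.length > 0 ? title : null;
}

// Decodes only a handful of common entities; a real implementation
// would use a proper HTML parser (e.g. DOMParser in the browser).
function decodeBasicEntities(text: string): string {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&"); // last, so "&amp;lt;" is not double-decoded
}
```

A headless browser sidesteps this by running the page's scripts first and reading `document.title` afterwards, which is why it would help with the WeChat links but not run on mobile.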

Hypothetically, if this project had some funding, I could build a new CORS proxy backed by a headless browser and use it for URL fetching. That would fix both JavaScript-loaded pages and the encoding issues, but since I'd need to pay to host it publicly, it's not something I'm considering at the moment.
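On the encoding side, one plausible cause (an assumption, not confirmed in this thread) is a fetcher decoding a non-UTF-8 page, such as one served as GBK, as if it were UTF-8. A fetcher that has the raw response bytes could instead honor the declared charset. The sketch below is hypothetical: `detectCharset` and `decodeBody` are illustrative names, and whether `TextDecoder` accepts labels like `gbk` depends on the runtime (browsers do; Node needs full ICU data).

```typescript
// Hypothetical helper: pick a charset from the HTTP Content-Type header,
// falling back to a <meta charset> declaration near the top of the page.
function detectCharset(contentType: string | null, headSnippet: string): string {
  const fromHeader = contentType?.match(/charset=["']?([\w-]+)/i);
  if (fromHeader) {
    return fromHeader[1].toLowerCase();
  }
  const fromMeta = headSnippet.match(/<meta[^>]+charset=["']?([\w-]+)/i);
  if (fromMeta) {
    return fromMeta[1].toLowerCase();
  }
  return "utf-8"; // assumption: default to UTF-8 when nothing is declared
}

// Decode raw bytes with the detected charset instead of assuming UTF-8.
function decodeBody(bytes: Uint8Array, contentType: string | null): string {
  // ASCII meta tags survive a UTF-8 decode even if other bytes don't,
  // so a UTF-8 pass over the first 1 KB is enough for sniffing.
  const head = new TextDecoder("utf-8").decode(bytes.subarray(0, 1024));
  const charset = detectCharset(contentType, head);
  try {
    return new TextDecoder(charset).decode(bytes);
  } catch {
    // TextDecoder throws a RangeError for labels it doesn't support
    return new TextDecoder("utf-8").decode(bytes);
  }
}
```

This only helps if the proxy passes bytes through untouched; a proxy that has already decoded the body as UTF-8 has lost the information needed to recover the title.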

Alternatively, that browser/API could be run locally by a user and enabled in settings. I'll think on this as well!
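A rough sketch of how such an opt-in setting might route requests, assuming a locally running scraper service that takes the target link as a `url` query parameter. The `FetchSettings` fields, the `buildFetchUrl` function, and the endpoint URLs are all hypothetical and are not part of the plugin.

```typescript
// Hypothetical settings; none of these names come from the plugin itself.
interface FetchSettings {
  useLocalScraper: boolean; // user opt-in, off by default
  localScraperUrl: string;  // e.g. a headless-browser service on localhost
  corsProxyUrl: string;     // public proxy used otherwise
}

// Build the URL that would actually be requested for a pasted link.
function buildFetchUrl(target: string, settings: FetchSettings): string {
  const base = settings.useLocalScraper
    ? settings.localScraperUrl
    : settings.corsProxyUrl;
  // Pass the pasted link as a query parameter; URLSearchParams escapes it.
  const url = new URL(base);
  url.searchParams.set("url", target);
  return url.toString();
}
```

For example, with `useLocalScraper: true` and a hypothetical `localScraperUrl` of `http://localhost:3000/title`, a pasted Douban link would be requested as `http://localhost:3000/title?url=https%3A%2F%2Fmovie.douban.com%2F...`. Keeping the switch in one function means the rest of the title-fetching code does not need to know which backend is active.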

zolrath commented 3 years ago

I've got a local scraping solution working on desktop. It still doesn't get a title out of weixin.qq.com, but it succeeds on the other two. I still need to test on mobile.

[Title Unknown](https://mp.weixin.qq.com/mp/appmsgalbum?__biz=Mzg4MjAwNTUwNw==&action=getalbum&album_id=1448541657456295937&scene=173&from_msgid=2247484083&from_itemidx=1&count=10#wechat_redirect&scene=0&subscene=90&sessionid=1606652573&enterid=1606653138)

[弱点 (豆瓣)](https://movie.douban.com/subject/3552028/)

[手绘100张,耗时1个月,我终于破解了【达芬奇密码书】的全部秘密!_哔哩哔哩 (゜-゜)つロ 干杯~-bilibili](https://www.bilibili.com/video/BV1qy4y1t7fn?spm_id_from=333.851.b_7265636f6d6d656e64.3)

zolrath commented 3 years ago

Fixed for desktop in 1.2.0. Mobile still relies on the CORS proxy, which doesn't support these characters.

DDDOH commented 11 months ago

WeChat links are failing again. Here is an example: https://mp.weixin.qq.com/s/nVilywouNxnZlb-l3Buj3w


Would it be possible to configure rules so that, for websites matching a rule, the plugin fetches the whole page and extracts the title from it?
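One way such a rule could be expressed is as a list of hostnames, where a link matches if its hostname equals a rule or is a subdomain of it. This is a hypothetical sketch of the request, not something the plugin implements; `fullPageRules`, `hostOf`, and `needsFullPageFetch` are illustrative names.

```typescript
// Hypothetical rule list: hostnames whose titles are set by JavaScript,
// so the whole page should be rendered before reading the title.
const fullPageRules: string[] = ["mp.weixin.qq.com"];

// Lowercased hostname of an absolute URL, or null if it doesn't parse.
function hostOf(link: string): string | null {
  try {
    return new URL(link).hostname.toLowerCase();
  } catch {
    return null;
  }
}

// True if the link's hostname equals a rule or is a subdomain of it.
function needsFullPageFetch(link: string, rules: string[]): boolean {
  const host = hostOf(link);
  if (host === null) {
    return false;
  }
  return rules.some((rule) => {
    const r = rule.toLowerCase();
    // The leading "." prevents "notexample.com" from matching "example.com".
    return host === r || host.endsWith("." + r);
  });
}
```

Links that match would be sent to a full-page (headless browser) fetcher such as the local one discussed earlier in the thread, while everything else keeps using the normal fetch, so the slower path only runs where it is needed.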