Closed vivekatro closed 6 years ago
@vivekatro Good find! I had noticed this before on a similar domain. Would you like to make a pull request :) ?
Pretty simple way to fix just use encodeURIComponent() when collecting strings and decodeURIComponent() when displaying them back this should preserve japanese characters.
@vivekatro I (think) the issue is over at micro-unfurl see my PR https://github.com/beeman/micro-unfurl/pull/3
Issue fixed by @jacktuck and deployed :)
<3 thanks @beeman and @vivekatro
Thanks @jacktuck for fixing it quickly!
@vivekatro Turns out this was also a bug for unfurl.js - we didn't handle servers responding with multibyte encodings. I noticed this when http://qq.com
returned charset as GB2312
.
Other encodings are:
Japanese: Shift_JIS, Windows-31j, Windows932, EUC-JP
Chinese: GB2312, GBK, GB18030, Windows936, EUC-CN
Korean: KS_C_5601, Windows949, EUC-KR
Taiwan/Hong Kong: Big5, Big5-HKSCS, Windows950
This PR has been merged and should now be fixed in unfurl.js https://github.com/jacktuck/unfurl/pull/31
And there's an open PR https://github.com/beeman/micro-unfurl/pull/4 which bumps micro-unfurl to use this release.
There is also a tagged release on npm 1.1.7
which you can install with npm install unfurl.js@1.1.7
or npm install unfurl.js@beta
Leaving this open for now until I do a release under latest tag.
I need to fix the benchmarks at some point too and see how this fix impacts performance.
Should be fixed thanks to @jacktuck !
It seems like on the new version, the structure of the response has changed,
with 1.1.6
for https://www.yahoo.co.jp
I get following response,
{
"other" : {
"description" : "日本最大級のポータルサイト。検索、オークション、ニュース、天気、スポーツ、メール、ショッピングなど多数のサービスを展開。あなたの生活をより豊かにする「課題解決エンジン」を目指していきます。",
"robots" : "noodp",
"googleSiteVerification" : "fsLMOiigp5fIpCDMEVodQnQC7jIY1K3UXW5QkQcBmVs",
"alternate" : "https://m.yahoo.co.jp/",
"canonical" : "https://www.yahoo.co.jp/",
"fbAppId" : "472870002762883",
"title" : "Yahoo! JAPAN\n",
"stylesheet" : "//s.yimg.jp/images/top/sp2/clr/180312/1.css"
},
"ogp" : {
"ogTitle" : "Yahoo! JAPAN",
"ogType" : "article",
"ogUrl" : "https://www.yahoo.co.jp/",
"ogImage" : [ {
"url" : "https://s.yimg.jp/images/top/ogp/fb_y_1500px.png"
} ],
"ogDescription" : "日本最大級のポータルサイト。検索、オークション、ニュース、天気、スポーツ、メール、ショッピングなど多数のサービスを展開。あなたの生活をより豊かにする「課題解決エンジン」を目指していきます。",
"ogSiteName" : "Yahoo! JAPAN"
},
"twitter" : {
"twitterCard" : "summary_large_image",
"twitterSite" : "@Yahoo_JAPAN_PR",
"twitterTitle" : "Yahoo! JAPAN",
"twitterDescription" : "日本最大級のポータルサイト。検索、オークション、ニュース、天気、スポーツ、メール、ショッピングなど多数のサービスを展開。あなたの生活をより豊かにする「課題解決エンジン」を目指していきます。",
"twitterImage" : [ {
"url" : "https://s.yimg.jp/images/top/ogp/tw_y_1400px.png"
} ]
}
}
But with the latest version, the twitter
and ogp
sections are not coming.
@vivekatro Thanks for this i'll take a look this evening
https://unfurl.now.sh/?url=https://www.rakuten.co.jp
breaks, I get a timeout error. but it used to work with 1.1.6
@vivekatro Default timeout is 2000ms and this site takes 8000ms for me, you can pass your own timeout, though. For example, you can set a timeout of 20 seconds like so: unfurl(url, { timeout: 20 * 1000 } )
.
Mm actually 1.1.6 used request rather than node-fetch so it had a higher timeout. I'll make a change to match the old timeout i think too or just increase the default to something sensible. Good find.
From request: timeout - integer containing the number of milliseconds to wait for a server to send response headers (and start the response body) before aborting the request. Note that if the underlying TCP connection cannot be established, the OS-wide TCP connection timeout will overrule the timeout option (the default in Linux can be anywhere from 20-120 seconds).
Timeout should default to OS limit again now in version 1.1.8-beta.2
Were you able to check on yahoo.co.jp unfurling missing ogp and Twitter in the new version?
@vivekatro Not yet. I need to make a test demonstrating whats missing etc to make sure it doesn't regress again.
@vivekatro Good spot on yahoo.co.jp. Just found the issue which was User-Agent was not being set properly in the new release. It just so happened yahoo will not send you meta tags unless you are a bot, which is why setting User-Agent to facebook's (the default user-agent) works.
I've fixed it on the newest prerelease of unfurl.js@2.x.x (see README first), let me know if you'd like me to patch 1.6.x too, it's trivial to do so. :)
https://unfurl.now.sh/?url=https://www.yahoo.co.jp AL the Japanese chars are messed up.