syuilo / summaly

🔍 Get a summary of any web page
MIT License

Cannot get the player for YouTube #124

Closed (closed by mei23 5 years ago)

mei23 commented 5 years ago

Does YouTube omit twitter:player depending on the User-Agent? With curl's default User-Agent the tags are returned:

curl -LSs -w '\n' -- 'https://www.youtube.com/watch?v=jNQXAC9IVRw' | grep 'twitter:player'
      <meta name="twitter:player" content="https://www.youtube.com/embed/jNQXAC9IVRw">
      <meta name="twitter:player:width" content="480">
      <meta name="twitter:player:height" content="360">

Firefox

curl -LSs -w '\n' -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0' -- 'https://www.youtube.com/watch?v=jNQXAC9IVRw' | grep 'twitter:player'
# no output

Chrome

curl -LSs -w '\n' -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36' -- 'https://www.youtube.com/watch?v=jNQXAC9IVRw' | grep 'twitter:player'
# no output

acid-chicken commented 5 years ago

A User-Agent matching /(Firefox|Chrome)\/\d+(\.\d+)*/ looks like what gets blocked: removing that token brings the tags back.

➜ curl -LSs -w '\n' -H 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0' -- 'https://www.youtube.com/watch?v=jNQXAC9IVRw' | grep 'twitter:player'

➜ curl -LSs -w '\n' -H 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101' -- 'https://www.youtube.com/watch?v=jNQXAC9IVRw' | grep 'twitter:player'
      <meta name="twitter:player" content="https://www.youtube.com/embed/jNQXAC9IVRw">
      <meta name="twitter:player:width" content="480">
      <meta name="twitter:player:height" content="360">
➜ curl -LSs -w '\n' -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36' -- 'https://www.youtube.com/watch?v=jNQXAC9IVRw' | grep 'twitter:player'

➜ curl -LSs -w '\n' -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Safari/537.36' -- 'https://www.youtube.com/watch?v=jNQXAC9IVRw' | grep 'twitter:player'
      <meta name="twitter:player" content="https://www.youtube.com/embed/jNQXAC9IVRw">
      <meta name="twitter:player:width" content="480">
      <meta name="twitter:player:height" content="360">
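The observations above can be condensed into an offline check. This is a minimal sketch, assuming the filter really behaves like the pattern guessed in this comment. `ua_looks_like_browser` and `predict_player_tags` are hypothetical helper names, and the pattern is rewritten in POSIX ERE for `grep -E` (so `\d` becomes `[0-9]`). It only predicts which User-Agent strings would lose the tags; it does not contact YouTube.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when the User-Agent contains a
# "Firefox/<version>" or "Chrome/<version>" token, i.e. the guessed pattern.
ua_looks_like_browser() {
  printf '%s\n' "$1" | grep -Eq '(Firefox|Chrome)/[0-9]+(\.[0-9]+)*'
}

# Print a prediction for each User-Agent string passed as an argument.
predict_player_tags() {
  for ua in "$@"; do
    if ua_looks_like_browser "$ua"; then
      printf 'omitted:  %s\n' "$ua"
    else
      printf 'returned: %s\n' "$ua"
    fi
  done
}

# The two Firefox variants tried above: with and without the version token.
predict_player_tags \
  'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0' \
  'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101'
```

Note that `Gecko/20100101` and `Safari/537.36` do not match, which is consistent with the stripped-down requests above still receiving the tags.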

acid-chicken commented 5 years ago

Using Summaly's own custom User-Agent looks like the way to go.

➜ curl -LSs -w '\n' -H 'User-Agent: summaly/2.1.3' -- 'https://www.youtube.com/watch?v=jNQXAC9IVRw' | grep 'twitter:player'
      <meta name="twitter:player" content="https://www.youtube.com/embed/jNQXAC9IVRw">
      <meta name="twitter:player:width" content="480">
      <meta name="twitter:player:height" content="360">
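A minimal sketch of that idea as a reusable shell helper, assuming the User-Agent takes the form `summaly/<version>` as in the command above. The function names `summaly_ua` and `fetch_player_meta` are hypothetical, and how Summaly itself would set the header is not shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper: build a product-style User-Agent such as "summaly/2.1.3".
summaly_ua() {
  printf 'summaly/%s\n' "$1"
}

# Hypothetical helper: fetch a page with that User-Agent and keep only the
# twitter:player meta tags. Needs network access and curl, so it is defined
# here but not called.
fetch_player_meta() {
  url=$1
  version=$2
  curl -LSs -H "User-Agent: $(summaly_ua "$version")" -- "$url" | grep 'twitter:player'
}

# Show the header value that would be sent for version 2.1.3.
summaly_ua 2.1.3
```

Calling `fetch_player_meta 'https://www.youtube.com/watch?v=jNQXAC9IVRw' 2.1.3` would repeat the request shown above. Whether YouTube still answers the same way today is not something this sketch can confirm.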

mei23 commented 5 years ago

Apparently the library we use defaults to a Chrome User-Agent: https://github.com/ktty1220/cheerio-httpcli#setbrowserbrowser-type