Errors when calling ChromeWebdriver on Linux server editions

Road-tech commented 1 year ago

I tested on Ubuntu. Running the script directly fails with an error like: selenium.common.exceptions.WebDriverException: Message: unknown error: cannot find Chrome binary

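This happens because ChromeWebdriver is not installed; the message itself says that no Chrome/Chromium browser binary could be found. You can fix it by hand, or take the lazy route and install everything with the following commands: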

pip install selenium
apt install chromium-chromedriver -y  # Ubuntu
apt install chromium-driver -y  # Debian
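
To confirm the install before running the script, it can help to check what is actually on PATH. This is only a minimal sketch: the executable names below (chromedriver, chromium, chromium-browser, google-chrome) are assumptions about what the packages install and may differ on your system, and check_install is a hypothetical helper, not part of the repository.

```python
import shutil

# Assumed executable names; adjust them for your distribution.
DRIVER_NAMES = ['chromedriver']
BROWSER_NAMES = ['chromium', 'chromium-browser', 'google-chrome']


def first_on_path(names, which=shutil.which):
    """Return the path of the first name found on PATH, or None."""
    for name in names:
        path = which(name)
        if path:
            return path
    return None


def check_install(which=shutil.which):
    """Return a dict mapping 'driver' and 'browser' to a path or None."""
    return {
        'driver': first_on_path(DRIVER_NAMES, which),
        'browser': first_on_path(BROWSER_NAMES, which),
    }
```

Calling print(check_install()) shows both lookups; a None under 'browser' would be consistent with the "cannot find Chrome binary" message.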

Because the server edition has no desktop environment, download.py needs a few extra options. Add from selenium.webdriver.chrome.options import Options, then change the original

dr = webdriver.Chrome()

to

  options = Options()
  options.add_argument('--no-sandbox')
  options.add_argument('--disable-dev-shm-usage')
  options.add_argument('--disable-extensions')
  options.add_argument('--headless')
  dr = webdriver.Chrome(options=options)  # options= replaces the deprecated chrome_options= keyword

Even after this change, the following error may still occur:

result: None
Traceback (most recent call last):
  File "/root/JableTVDownload/main.py", line 28, in <module>
    download(url)
  File "/root/JableTVDownload/download.py", line 43, in download
    m3u8url = result[0]
TypeError: 'NoneType' object is not subscriptable

In other words, no m3u8 link was found in the fetched page. Inspecting dr.page_source shows that the request was blocked by Cloudflare:

<body>
  <div id="cf-wrapper">
    <div class="cf-alert cf-alert-error cf-cookie-error" id="cookie-alert" data-translate="enable_cookies">Please enable cookies.</div>
    <div id="cf-error-details" class="cf-error-details-wrapper">
      <div class="cf-wrapper cf-header cf-error-overview">
        <h1 data-translate="block_headline">Sorry, you have been blocked</h1>
        <h2 class="cf-subheadline"><span data-translate="unable_to_access">You are unable to access</span> jable.tv</h2>
      </div><!-- /.header -->

      <div class="cf-section cf-highlight">
        <div class="cf-wrapper">
          <div class="cf-screenshot-container cf-screenshot-full">

              <span class="cf-no-screenshot error"></span>

          </div>
        </div>
      </div><!-- /.captcha-container -->

      <div class="cf-section cf-wrapper">
        <div class="cf-columns two">
          <div class="cf-column">
            <h2 data-translate="blocked_why_headline">Why have I been blocked?</h2>

            <p data-translate="blocked_why_detail">This website is using a security service to protect itself from online attacks. The action you just performed triggered the security solution. There are several actions that could trigger this block including submitting a certain word or phrase, a SQL command or malformed data.</p>
          </div>

          <div class="cf-column">
            <h2 data-translate="blocked_resolve_headline">What can I do to resolve this?</h2>

            <p data-translate="blocked_resolve_detail">You can email the site owner to let them know you were blocked. Please include what you were doing when this page came up and the Cloudflare Ray ID found at the bottom of this page.</p>
          </div>
        </div>
      </div><!-- /.section -->
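
The TypeError in the traceback hides the real cause. A minimal sketch of a more explicit check is shown here, under these assumptions: extract_m3u8_url is a hypothetical helper (not in the repository), the two marker strings are simply copied from the blocked page above, and the regular expression is a stricter variant of the one in download.py rather than the author's original.

```python
import re

# Marker strings copied from the Cloudflare block page shown above.
CLOUDFLARE_MARKERS = ('id="cf-wrapper"', 'Sorry, you have been blocked')


def extract_m3u8_url(page_source):
    """Return the first m3u8 URL in the page, or raise a descriptive error."""
    if any(marker in page_source for marker in CLOUDFLARE_MARKERS):
        raise RuntimeError('Blocked by Cloudflare; no m3u8 URL in the page')
    # Stop at whitespace, quotes and angle brackets so the match stays inside one URL.
    match = re.search(r'https://[^\s"\'<>]+\.m3u8', page_source)
    if match is None:
        raise RuntimeError('No m3u8 URL found in the page')
    return match[0]
```

In download.py this could be called as m3u8url = extract_m3u8_url(dr.page_source), so that a blocked request fails with a message naming Cloudflare instead of 'NoneType' object is not subscriptable.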

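A quick search on Baidu shows that the added options.add_argument('--headless') flag is what triggers the Cloudflare block. Headless Chrome identifies itself as HeadlessChrome in its default User-Agent, which is likely what the block keys on, so it can be avoided by overriding the User-Agent: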

  options.add_argument("user-agent=Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36")
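
To keep the flags in one place, the arguments used in this post can be collected in a small helper. This is a sketch only: headless_chrome_args and DEFAULT_UA are hypothetical names, and the User-Agent string is the one quoted above, not a recommended value.

```python
# User-Agent string copied from the option above.
DEFAULT_UA = ("Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36")


def headless_chrome_args(user_agent=DEFAULT_UA):
    """Return the Chrome arguments used in this post as a list of strings."""
    args = [
        '--no-sandbox',
        '--disable-dev-shm-usage',
        '--disable-extensions',
        '--headless',
    ]
    if user_agent:
        # Override the default headless User-Agent.
        args.append('user-agent=' + user_agent)
    return args


# Intended use with Selenium (requires selenium to be installed):
#   options = Options()
#   for arg in headless_chrome_args():
#       options.add_argument(arg)
```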

With that, it works. The final download.py is as follows:

import requests
import os
import re
import urllib.request
import m3u8
from Crypto.Cipher import AES
from config import headers
from crawler import prepareCrawl
from merge import mergeMp4
from delete import deleteM3u8, deleteMp4
from cover import get_cover
import time
import cloudscraper
from args import *
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def download(url):
  print('正在下載影片: ' + url)
  # Create a folder named after the video ID
  urlSplit = url.split('/')
  dirName = urlSplit[-2]
  if os.path.exists(f'{dirName}/{dirName}.mp4'):
    print('番號資料夾已存在, 跳過...')
    return
  if not os.path.exists(dirName):
      os.makedirs(dirName)
  folderPath = os.path.join(os.getcwd(), dirName)
  # Get the m3u8 URL
  # htmlfile = cloudscraper.create_scraper(browser='chrome', delay=10).get(url)
  options = Options()
  options.add_argument('--no-sandbox')
  options.add_argument('--disable-dev-shm-usage')
  options.add_argument('--disable-extensions')
  options.add_argument('--headless')
  options.add_argument("user-agent=Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36")
  dr = webdriver.Chrome(options=options)  # options= replaces the deprecated chrome_options= keyword
  #dr = webdriver.Chrome()
  dr.get(url)
  result = re.search("https://.+m3u8", dr.page_source)
  print(f'result: {result}')
  m3u8url = result[0]
  print(f'm3u8url: {m3u8url}')

  m3u8urlList = m3u8url.split('/')
  m3u8urlList.pop(-1)
  downloadurl = '/'.join(m3u8urlList)

  # Save the m3u8 file to the folder
  m3u8file = os.path.join(folderPath, dirName + '.m3u8')
  urllib.request.urlretrieve(m3u8url, m3u8file)

  # Read the key URI and IV from the m3u8 file
  m3u8obj = m3u8.load(m3u8file)
  m3u8uri = ''
  m3u8iv = ''

  for key in m3u8obj.keys:
      if key:
          m3u8uri = key.uri
          m3u8iv = key.iv

  # Collect the ts segment URLs in tsList
  tsList = []
  for seg in m3u8obj.segments:
      tsUrl = downloadurl + '/' + seg.uri
      tsList.append(tsUrl)

  # The stream is encrypted
  if m3u8uri:
      m3u8keyurl = downloadurl + '/' + m3u8uri  # URL of the key
      # Fetch the key contents
      response = requests.get(m3u8keyurl, headers=headers, timeout=10)
      contentKey = response.content

      vt = m3u8iv.replace("0x", "")[:16].encode()  # Use the first 16 characters of the IV

      ci = AES.new(contentKey, AES.MODE_CBC, vt)  # Build the decryptor
  else:
      ci = ''

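  # Delete the m3u8 file
  deleteM3u8(folderPath)

  # Crawl and download the mp4 segments into the folder
  prepareCrawl(ci, folderPath, tsList)

  # Merge the segments into one mp4
  mergeMp4(folderPath, tsList)

  # Delete the segment mp4 files
  deleteMp4(folderPath)

  # Get the cover image
  get_cover(html_file=dr.page_source, folder_path=folderPath)

  # Close the browser so no headless Chrome process is left running
  dr.quit()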
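
The base-URL handling in download.py (split on '/', drop the last part, re-join, then append each segment URI) assumes that segment URIs are relative to the playlist's directory. As a sketch, the standard library's urljoin gives the same result for that case and also copes with absolute segment URIs; segment_urls is a hypothetical helper, not part of the repository, and the example.com URLs are placeholders.

```python
from urllib.parse import urljoin


def segment_urls(m3u8_url, segment_uris):
    """Resolve each segment URI against the playlist URL."""
    return [urljoin(m3u8_url, uri) for uri in segment_uris]


# Example with placeholder URLs:
#   segment_urls('https://example.com/v/abc/index.m3u8', ['seg0.ts'])
#   resolves 'seg0.ts' to 'https://example.com/v/abc/seg0.ts'
```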

That's all.

hcjohn463 commented 1 year ago

Merged.

Road-tech commented 1 year ago

You could try Docker: https://github.com/Road-tech/Docker_JableTVDownload

On May 13, 2023, at 12:07 AM, 澳诹科技 wrote:

I switched to a different version, ran it again, and got another error:

Traceback (most recent call last):
  File "main.py", line 26, in <module>
    download(url)
  File "/volume2/shiyuan/Videos/JableTVDownload-main/download.py", line 40, in download
    dr = webdriver.Chrome(chrome_options=options)
  File "/var/packages/py3k/target/usr/local/lib/python3.8/site-packages/selenium/webdriver/chrome/webdriver.py", line 82, in __init__
    service.path = DriverFinder.get_path(service, options)
  File "/var/packages/py3k/target/usr/local/lib/python3.8/site-packages/selenium/webdriver/common/driver_finder.py", line 40, in get_path
    path = shutil.which(service.path) or SeleniumManager().driver_location(options)
  File "/var/packages/py3k/target/usr/local/lib/python3.8/site-packages/selenium/webdriver/common/selenium_manager.py", line 91, in driver_location
    result = self.run(args)
  File "/var/packages/py3k/target/usr/local/lib/python3.8/site-packages/selenium/webdriver/common/selenium_manager.py", line 109, in run
    output = json.loads(stdout)
  File "/var/packages/py3k/target/usr/local/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/var/packages/py3k/target/usr/local/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/var/packages/py3k/target/usr/local/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
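
The traceback fails inside Selenium Manager, which, according to the quoted line path = shutil.which(service.path) or SeleniumManager().driver_location(options), is only consulted when the driver path does not resolve. A possible workaround, sketched here for Selenium 4 and not tested on that system, is to pass the chromedriver location explicitly. make_driver is a hypothetical helper, and the path you pass (for example /usr/bin/chromedriver) is an assumption about where your driver lives.

```python
import os


def make_driver(driver_path, options=None):
    """Start Chrome using an explicitly given chromedriver path."""
    if not (os.path.isfile(driver_path) and os.access(driver_path, os.X_OK)):
        raise FileNotFoundError(
            f'chromedriver not found or not executable: {driver_path}')
    # Imported lazily so the path check above runs even without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    service = Service(executable_path=driver_path)
    return webdriver.Chrome(service=service, options=options)
```

In download.py this would replace the webdriver.Chrome(...) call, for example dr = make_driver('/usr/bin/chromedriver', options).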


hcjohn463 / JableTVDownload

Errors when calling ChromeWebdriver on Linux server editions #94