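Part resolved with Firefox
- `get_url_list` is currently failing with the WebDriverException shown below 🥲.
I think this is better handled as a separate change, so I'm requesting review on it first!
--> Chrome can't be used because the chromedriver and Chrome versions don't match, but switching to Firefox resolves it!!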
```
*** Found local files:
*** * /opt/airflow/logs/dag_id=job_trend_daily/run_id=scheduled__2024-03-08T00:00:00+00:00/task_id=wanted.get_url_list/attempt=3.log
[2024-03-09, 03:31:16 UTC] {taskinstance.py:1979} INFO - Dependencies all met for dep_context=non-requeueable deps ti=
[2024-03-09, 03:31:16 UTC] {taskinstance.py:1979} INFO - Dependencies all met for dep_context=requeueable deps ti=
[2024-03-09, 03:31:16 UTC] {taskinstance.py:2193} INFO - Starting attempt 3 of 3
[2024-03-09, 03:31:16 UTC] {taskinstance.py:2214} INFO - Executing on 2024-03-08 00:00:00+00:00
[2024-03-09, 03:31:16 UTC] {standard_task_runner.py:60} INFO - Started process 423 to run task
[2024-03-09, 03:31:16 UTC] {standard_task_runner.py:87} INFO - Running: ['***', 'tasks', 'run', 'job_trend_daily', 'wanted.get_url_list', 'scheduled__2024-03-08T00:00:00+00:00', '--job-id', '19', '--raw', '--subdir', 'DAGS_FOLDER/deploy_daily.py', '--cfg-path', '/tmp/tmp9p2seab7']
[2024-03-09, 03:31:16 UTC] {standard_task_runner.py:88} INFO - Job 19: Subtask wanted.get_url_list
[2024-03-09, 03:31:16 UTC] {task_command.py:423} INFO - Running on host 43e3f6697550
[2024-03-09, 03:31:17 UTC] {taskinstance.py:2510} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='admin' AIRFLOW_CTX_DAG_ID='job_trend_daily' AIRFLOW_CTX_TASK_ID='wanted.get_url_list' AIRFLOW_CTX_EXECUTION_DATE='2024-03-08T00:00:00+00:00' AIRFLOW_CTX_TRY_NUMBER='3' AIRFLOW_CTX_DAG_RUN_ID='scheduled__2024-03-08T00:00:00+00:00'
[2024-03-09, 03:31:17 UTC] {taskinstance.py:2728} ERROR - Task failed with exception
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 444, in _execute_task
result = _execute_callable(context=context, **execute_callable_kwargs)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 414, in _execute_callable
return execute_callable(context=context, **execute_callable_kwargs)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/operators/python.py", line 200, in execute
return_value = self.execute_callable()
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/operators/python.py", line 217, in execute_callable
return self.python_callable(*self.op_args, **self.op_kwargs)
File "/opt/airflow/dags/crawling.py", line 670, in get_url_list
driver = self.driver()
File "/home/airflow/.local/lib/python3.8/site-packages/selenium/webdriver/chrome/webdriver.py", line 45, in __init__
super().__init__(
File "/home/airflow/.local/lib/python3.8/site-packages/selenium/webdriver/chromium/webdriver.py", line 50, in __init__
self.service.start()
File "/home/airflow/.local/lib/python3.8/site-packages/selenium/webdriver/common/service.py", line 102, in start
self.assert_process_still_running()
File "/home/airflow/.local/lib/python3.8/site-packages/selenium/webdriver/common/service.py", line 115, in assert_process_still_running
raise WebDriverException(f"Service {self._path} unexpectedly exited. Status code was: {return_code}")
selenium.common.exceptions.WebDriverException: Message: Service /home/airflow/.cache/selenium/chromedriver/linux64/122.0.6261.111/chromedriver unexpectedly exited. Status code was: 127
[2024-03-09, 03:31:17 UTC] {taskinstance.py:1149} INFO - Marking task as FAILED. dag_id=job_trend_daily, task_id=wanted.get_url_list, execution_date=20240308T000000, start_date=20240309T033116, end_date=20240309T033117
[2024-03-09, 03:31:17 UTC] {standard_task_runner.py:107} ERROR - Failed to execute job 19 for task wanted.get_url_list (Message: Service /home/airflow/.cache/selenium/chromedriver/linux64/122.0.6261.111/chromedriver unexpectedly exited. Status code was: 127
; 423)
[2024-03-09, 03:31:17 UTC] {local_task_job_runner.py:234} INFO - Task exited with return code 1
[2024-03-09, 03:31:17 UTC] {taskinstance.py:3309} INFO - 0 downstream tasks scheduled from follow-on schedule check
```
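For reference, a minimal sketch of what the Firefox switch could look like. The names `firefox_arguments` and `create_firefox_driver` are hypothetical (they are not taken from `crawling.py`), and the sketch assumes Firefox is installed in the Airflow image and that Selenium 4 can locate or download geckodriver on its own.

```python
from typing import List


def firefox_arguments(headless: bool = True) -> List[str]:
    """Command-line flags passed to Firefox (hypothetical helper)."""
    # The Airflow worker has no display, so headless is the default here.
    return ["--headless"] if headless else []


def create_firefox_driver(headless: bool = True):
    """Build a Firefox WebDriver in place of the Chrome one (sketch only)."""
    # Imported lazily so the DAG file still parses if Selenium is missing.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    for arg in firefox_arguments(headless):
        options.add_argument(arg)
    # Assumption: Firefox is present in the image and geckodriver is resolvable.
    return webdriver.Firefox(options=options)


# Example use inside a task (the URL is a placeholder):
# driver = create_firefox_driver()
# try:
#     driver.get("https://example.com")
# finally:
#     driver.quit()
```

Calling `driver.quit()` in a `finally` block keeps a failed crawl from leaving a stray Firefox process behind on the worker.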
Resolve #13