langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

"local variable 'api_key' referenced before assignment" using Jina to crawl #8943

Closed damadorPL closed 2 weeks ago

damadorPL commented 2 weeks ago

Dify version

0.9.1

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

Follow-up to https://github.com/langgenius/dify/issues/8934.

Now, at the "Text Preprocessing and Cleaning" step, the error "local variable 'api_key' referenced before assignment" pops up when using Jina to crawl. Log output:

Running migrations
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml
INFO:matplotlib.font_manager:generated new fontManager
Preparing database migration...
Starting database migration.
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
Database migration successful!
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml
[2024-09-30 14:28:22 +0000] [1] [INFO] Starting gunicorn 22.0.0
[2024-09-30 14:28:22 +0000] [1] [INFO] Listening at: http://0.0.0.0:5001 (1)
[2024-09-30 14:28:22 +0000] [1] [INFO] Using worker: gevent
[2024-09-30 14:28:22 +0000] [51] [INFO] Booting worker with pid: 51
2024-09-30 14:31:17,588.588 ERROR [Dummy-1] [app.py:838] - Exception on /console/api/datasets/indexing-estimate [POST]
Traceback (most recent call last):
  File "/app/api/controllers/console/datasets/datasets.py", line 441, in post
    response = indexing_runner.indexing_estimate(
  File "/app/api/core/indexing_runner.py", line 261, in indexing_estimate
    text_docs = index_processor.extract(extract_setting, process_rule_mode=tmp_processing_rule["mode"])
  File "/app/api/core/rag/index_processor/processor/paragraph_index_processor.py", line 20, in extract
    text_docs = ExtractProcessor.extract(
  File "/app/api/core/rag/extractor/extract_processor.py", line 183, in extract
    return extractor.extract()
  File "/app/api/core/rag/extractor/jina_reader_extractor.py", line 23, in extract
    crawl_data = WebsiteService.get_crawl_url_data(self.job_id, "jinareader", self._url, self.tenant_id)
  File "/app/api/services/website_service.py", line 197, in get_crawl_url_data
    headers={"Accept": "application/json", "Authorization": f"Bearer {api_key}"},
UnboundLocalError: local variable 'api_key' referenced before assignment

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/api/.venv/lib/python3.10/site-packages/flask/app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
  File "/app/api/.venv/lib/python3.10/site-packages/flask/app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
  File "/app/api/.venv/lib/python3.10/site-packages/flask_restful/__init__.py", line 489, in wrapper
    resp = resource(*args, **kwargs)
  File "/app/api/.venv/lib/python3.10/site-packages/flask/views.py", line 110, in view
    return current_app.ensure_sync(self.dispatch_request)(**kwargs)  # type: ignore[no-any-return]
  File "/app/api/.venv/lib/python3.10/site-packages/flask_restful/__init__.py", line 604, in dispatch_request
    resp = meth(*args, **kwargs)
  File "/app/api/controllers/console/setup.py", line 65, in decorated
    return view(*args, **kwargs)
  File "/app/api/libs/login.py", line 93, in decorated_view
    return current_app.ensure_sync(func)(*args, **kwargs)
  File "/app/api/controllers/console/wraps.py", line 22, in decorated
    return view(*args, **kwargs)
  File "/app/api/controllers/console/datasets/datasets.py", line 457, in post
    raise IndexingEstimateError(str(e))
controllers.console.datasets.error.IndexingEstimateError: 500 Internal Server Error: local variable 'api_key' referenced before assignment

✔️ Expected Behavior

No response

❌ Actual Behavior

/console/api/datasets/indexing-estimate 500 (Internal Server Error)
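
The log shows two tracebacks joined by "During handling of the above exception, another exception occurred", and the second one ends at `raise IndexingEstimateError(str(e))`. A minimal sketch of that pattern follows. The `IndexingEstimateError` class and the `extract` and `post` functions here are simplified, hypothetical stand-ins, not Dify's actual code; they only illustrate how raising inside an `except` block produces the chained output and carries the original message to the client.

```python
class IndexingEstimateError(Exception):
    """Hypothetical stand-in for the controller-level error seen in the log."""


def extract():
    # Stand-in for the failing service call further down the stack.
    raise UnboundLocalError("local variable 'api_key' referenced before assignment")


def post():
    try:
        return extract()
    except Exception as e:
        # Raising a new exception inside an except block implicitly chains
        # the original one (stored in __context__), which is why the log
        # prints both tracebacks.
        raise IndexingEstimateError(str(e))


if __name__ == "__main__":
    try:
        post()
    except IndexingEstimateError as err:
        print(type(err.__context__).__name__, "->", err)
```

Because the wrapper only copies `str(e)`, the HTTP response carries the original message text while the root cause stays in the first traceback.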

dosubot[bot] commented 2 weeks ago

To resolve the "local variable 'api_key' referenced before assignment" error in the "Text Preprocessing and Cleaning" step when using Jina with Dify version 0.9.1 (self-hosted via Docker), ensure that the api_key variable is properly assigned before it is used. In the provided code, the api_key parameter in the translate_text function has a default value set to DIFY_API_KEY.

Here is the relevant part of the code:

import json

import requests

DIFY_API_KEY = "example-key"

def translate_text(api_key=DIFY_API_KEY, user_id="mark-doc", inputs=None):
    # Build the default payload per call instead of sharing a mutable default argument.
    if inputs is None:
        inputs = {'input_text': 'Hello, how are you?', 'target_language': 'es'}
    url = 'https://api.dify.ai/v1/workflows/run'
    headers = {
        'Authorization': f'Bearer {api_key}',
        'Content-Type': 'application/json'
    }
    data = {
        'inputs': inputs,
        'response_mode': 'blocking',
        'user': user_id
    }

    try:
        # The request is sent inside the try block so connection errors are handled too.
        response = requests.post(url, headers=headers, json=data)
        response.raise_for_status()
        json_response = response.json()
        return json_response['data']['outputs']
    except requests.exceptions.HTTPError as http_err:
        print(f'HTTP error occurred: {http_err}')
    except json.decoder.JSONDecodeError:
        # Checked before RequestException, because newer requests versions raise
        # a JSON error that also subclasses RequestException.
        print('Failed to decode JSON response')
    except requests.exceptions.RequestException as req_err:
        print(f'Request error occurred: {req_err}')

    return None

The api_key is correctly assigned a default value of DIFY_API_KEY. If you are still encountering the error, ensure that DIFY_API_KEY is defined and not overwritten or modified elsewhere in your code.

If you need to pass a different api_key dynamically, make sure to pass it explicitly when calling the translate_text function:

translated_content = translate_text(api_key="your_actual_api_key", inputs={'input_text': content, 'target_language': target_language})

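This should resolve the "local variable 'api_key' referenced before assignment" error if it originates in your own calling code [1]. Note that the traceback in the report raises it inside Dify itself, in get_crawl_url_data at /app/api/services/website_service.py, line 197.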
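
For reference, the Python mechanism behind this message can be reproduced in a few lines. The sketch below is illustrative only: `build_headers_buggy`, `build_headers_fixed`, the provider names used as branch conditions, and the credential keys are hypothetical, not Dify's actual code. It assumes the name is assigned on only one branch, which is the usual cause of `UnboundLocalError`, and shows one way to bind it on every path.

```python
def build_headers_buggy(provider: str, credentials: dict) -> dict:
    # Hypothetical: api_key is assigned on only one branch.
    if provider == "firecrawl":
        api_key = credentials["firecrawl_api_key"]
    # For any other provider the name was never bound, so reading it
    # here raises UnboundLocalError.
    return {"Accept": "application/json", "Authorization": f"Bearer {api_key}"}


def build_headers_fixed(provider: str, credentials: dict) -> dict:
    # Bind api_key on every path, then fail with a clear message if it is missing.
    api_key = credentials.get(f"{provider}_api_key")
    if api_key is None:
        raise ValueError(f"No API key configured for provider {provider!r}")
    return {"Accept": "application/json", "Authorization": f"Bearer {api_key}"}


if __name__ == "__main__":
    creds = {"jinareader_api_key": "example-key"}
    try:
        build_headers_buggy("jinareader", creds)
    except UnboundLocalError as exc:
        print("buggy version:", exc)
    print("fixed version:", build_headers_fixed("jinareader", creds))
```

A default argument, as in translate_text, is another way to guarantee the name is bound; the explicit lookup above additionally reports which provider lacks a key.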
