langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

IO Error After Upgrade #4244

Closed thiner closed 5 months ago

thiner commented 5 months ago

Dify version

0.6.7

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

I deployed Dify on K8s and mounted 'api/storage/privatekeys' through a PVC. I then upgraded the Dify Docker image to 0.6.7.

✔️ Expected Behavior

Dify should work, and all existing workflows, knowledge bases, and model providers should remain as-is.

❌ Actual Behavior

Dify couldn't load the knowledge base or model providers, and the error log shows libs.rsa.PrivkeyNotFoundError: Private key not found, tenant_id:.... After attaching to the container console, running ls /app/api/storage/privatekeys returns an I/O error, so the "privatekeys" folder seems to be corrupted. This error happens every time I upgrade the Docker image version. One observation that might be relevant to this issue is that the tenant_id was regenerated on every upgrade.

dosubot[bot] commented 5 months ago

To address the libs.rsa.PrivkeyNotFoundError and the issue with the tenant_id regenerating after upgrading Dify in your Kubernetes deployment, follow these steps:

  1. Verify PVC Configuration: Ensure the api/storage/privatekeys directory is correctly mounted to a Persistent Volume through a Persistent Volume Claim. This is crucial for data persistence across pod restarts and upgrades.

  2. Regenerate Encryption Keys: If the "privatekeys" folder appears corrupted, you might need to regenerate the encryption keys. This can be done by executing a command within your Docker container to reset the encryption key pair. However, ensure you have backups of the original keys if needed for decrypting existing data.

  3. Handle tenant_id Regeneration: The regeneration of tenant_id during upgrades is likely due to a specific Alembic migration script. Modifying or removing the part of the script that causes tenant_id to change could prevent this issue, but be cautious as it might affect other functionalities.

  4. Back Up and Upgrade: Always back up your data before performing an upgrade. After pulling the latest Dify Docker image and applying the Kubernetes deployment changes, ensure that the PVCs are correctly referenced to maintain data persistence.

  5. Post-Upgrade Checks: After upgrading, verify the application's functionality and check that the tenant_id remains unchanged. Monitor the logs for any errors related to encryption keys and ensure they are functioning as expected.

For detailed guidance, refer to the Dify documentation. This approach should help mitigate the issues you're facing with upgrades and data persistence.
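As a rough illustration of step 1, the fragment below sketches how a PVC could be mounted at the storage path of the dify-api container. It is a minimal sketch, not the project's official manifest: the Deployment name, labels, volume name, and claim name (dify-storage-pvc) are all assumptions made for this example, and only the image tag and the /app/api/storage path come from this thread.

```yaml
# Hypothetical dify-api Deployment; every name here is an assumption
# except the mount path and the 0.6.7 tag mentioned in this issue.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dify-api
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dify-api
  template:
    metadata:
      labels:
        app: dify-api
    spec:
      containers:
        - name: dify-api
          image: langgenius/dify-api:0.6.7
          volumeMounts:
            # Mount the claim where the API stores its private keys.
            - name: dify-storage
              mountPath: /app/api/storage
      volumes:
        - name: dify-storage
          persistentVolumeClaim:
            claimName: dify-storage-pvc  # assumed claim name
```

Any other Deployment that needs the same files would have to reference the same claim and mount path.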

#### Details

**Sources**

- [en/getting-started/install-self-hosted/install-faq.md](https://github.com/langgenius/dify-docs/blob/main/en/getting-started/install-self-hosted/install-faq.md)
- [api/migrations/versions/2e9819ca5b28_add_tenant_id_in_api_token.py](https://github.com/langgenius/dify/blob/main/api/migrations/versions/2e9819ca5b28_add_tenant_id_in_api_token.py)
- [en/getting-started/install-self-hosted/environments.md](https://github.com/langgenius/dify-docs/blob/main/en/getting-started/install-self-hosted/environments.md)
- [docker/docker-compose.middleware.yaml](https://github.com/langgenius/dify/blob/main/docker/docker-compose.middleware.yaml)
- [api/.env.example](https://github.com/langgenius/dify/blob/main/api/.env.example)
- [api/libs/rsa.py](https://github.com/langgenius/dify/blob/main/api/libs/rsa.py)

crazywoola commented 5 months ago

Maybe it's related to K8s itself. Since we test with the compose files, no further steps are needed besides docker compose pull.

thiner commented 5 months ago

> Maybe it's related to k8s itself. Since we testing in compose files, it doesn't need any further steps besides the docker compose pull.

Under what conditions is the tenant_id regenerated?

thiner commented 5 months ago

Problem solved. The root cause was that dify-api and dify-worker were deployed to different nodes, so one of the deployments couldn't access the path /app/api/storage. For anyone who encounters this error while deploying on K8s: make sure dify-api and dify-worker are deployed to the same node. You can achieve this by setting "nodeSelector" in the deployment YAML file.
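A minimal sketch of that nodeSelector setting, assuming a node named node-1 (a hypothetical name; substitute a node from your own cluster). The kubernetes.io/hostname label is one of the standard node labels, but any label that both Deployments select on would work the same way.

```yaml
# Fragment to merge into BOTH the dify-api and dify-worker Deployments
# so that their pods are scheduled onto the same node.
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/hostname: node-1  # hypothetical node name
```

Pinning both workloads to one node is the fix reported in this thread. It presumably matters when the underlying volume can only be attached to a single node at a time, which is an assumption about the storage setup that the thread does not confirm.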