Open BURG3R5 opened 3 weeks ago
Hey @BURG3R5! 👋 I'm here to help you with any bugs, questions, or contributions you have for Dify. I'm a bot designed to assist you while waiting for a human maintainer. Let's tackle this issue together!
I found a similar issue that was previously reported and closed: [Cloud] Missing added pages when trying to add a new page to Knowledge Base with Notion integration. This issue was resolved by a pull request that fixed the problem with adding Notion pages to the knowledge base [1].
To troubleshoot and resolve your issue with incomplete syncing of Notion pages in Dify, you can follow these steps:
Authorization Verification: Ensure that the authorization verification is completed successfully. This involves binding the Notion content to Dify either during dataset creation or through the settings under Data Sources.
Check Integration Configuration: Verify that the integration with Notion is correctly configured. Ensure that the environment variables in the .env file are set correctly for either internal or public integration.
Segmentation and Cleaning: Confirm that the segmentation settings and indexing method are correctly configured. This affects how data is processed and imported.
Synchronization Process: If the Notion content is modified, ensure that the synchronization is triggered by clicking Sync in the Dify dataset Document List Page.
Data Type Limitations: Be aware that images and files are not supported for import, and tabular data will be converted to text display, which might affect the completeness of the sync.
Automatic Sync Setup: Ensure that the automatic sync is set up correctly so that any updates in Notion are automatically reflected in Dify.
If these steps do not resolve the issue, further investigation into the specific configuration and logs may be necessary. Additionally, ensure that you are following the specific steps for internal or public integration as outlined in the Dify documentation [2][3][4].
To continue talking to Dosu, mention @dosu.
@dosu The mentioned issue is not really similar to mine. I need help from a human maintainer.
This is a tough nut to crack, could really use your help on this, @takatost.
Additionally, I have already attempted the fixes the bot suggested, and none of them helped.
Are there any updates on this issue? I faced the same problem of missing pages during sync, on both cloud and self-hosted.
@JirehC I've moved on to downloading Notion pages as Markdown and uploading them to an empty knowledge base using the API. Can't trust Dify to respond in time or even give proper error messages, so I'm implementing the upload and sync processes myself. At least this way I have more control over the processes and can actually see what's going wrong.
Not gonna close the issue because this is still an unsolved problem in the product.
@BURG3R5 I've found a potential cause for this, and it's not something we can fix; it might even be a Notion problem rather than a Dify problem. A potential problem with the syncing is a time-out. For example, exporting a Notion workspace to Anytype through its integration has the same issue as Dify, where pages go missing. Anytype explained that the file size could be too big for the export to finish before the time-out.
I've also moved to downloading Notion pages as Markdown but got fed up uploading them one by one because I couldn't find anything about uploading via the API. Could you send me the link to the documentation on how to do it?
Edit: Is this the documentation you follow? https://docs.dify.ai/guides/knowledge-base/maintain-dataset-via-api
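A minimal sketch of the upload step discussed above, for anyone landing here with the same question. It only builds the HTTP request for creating one document from a Markdown string. The endpoint path (`/datasets/{dataset_id}/document/create_by_text`) and the payload fields are assumptions based on the linked documentation and may differ between Dify versions; `base_url`, `api_key`, and `dataset_id` are placeholders.

```python
import json
import urllib.request


def build_create_by_text_request(base_url, api_key, dataset_id, name, markdown_text):
    """Build (but do not send) a request that creates one document from text.

    Assumption: the endpoint path and payload keys below follow the linked
    knowledge base API documentation; verify them against your Dify version.
    """
    url = f"{base_url.rstrip('/')}/datasets/{dataset_id}/document/create_by_text"
    payload = {
        "name": name,
        "text": markdown_text,
        "indexing_technique": "high_quality",
        "process_rule": {"mode": "automatic"},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends it. Looping over a folder of exported `.md` files and calling this once per file avoids uploading pages one by one through the UI.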
@JirehC
page_size parameter to be small enough should resolve the timeout issues.
I haven't explored the Dify repository myself to figure out exactly where the issue is, since this is a massive mono-repo containing everything from frontend to client SDKs to backend and so on. I wish a maintainer would provide insight into which code is failing.
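To illustrate the page_size idea: the library referred to here is not named in the thread, so this is only a sketch of cursor-based pagination using the field names from the public Notion API (`page_size`, `start_cursor`, `has_more`, `next_cursor`). The `fetch` callable is hypothetical and stands in for whatever performs the actual HTTP call.

```python
def iter_pages(fetch, page_size=25):
    """Yield every result from a cursor-paginated endpoint.

    `fetch` is a hypothetical callable: it takes the request body (a dict)
    and returns the decoded JSON response (a dict).
    """
    cursor = None
    while True:
        # A smaller page_size keeps each individual request short.
        body = {"page_size": page_size}
        if cursor is not None:
            body["start_cursor"] = cursor
        response = fetch(body)
        yield from response["results"]
        if not response.get("has_more"):
            break
        cursor = response["next_cursor"]
```

Each request then carries only a small batch, so a slow workspace is less likely to hit a time-out on any single call, at the cost of more round trips.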
Thanks @BURG3R5. I hadn't discovered that library; this is great. Thanks for sharing!
@crazywoola @JohnJyong
Self Checks
Dify version
0.9.1
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
Followed the typical steps to connect Dify to Notion using an internal integration.
✔️ Expected Behavior
We have 1228 pages in our Notion database. All of them should be indexed into the Knowledge Base.
❌ Actual Behavior
The selection screen shows 125 pages selected and upon initial sync completion, only 107 documents are indexed.