langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

[Cloud] Missing added pages when trying to add a new page to Knowledge Base with Notion integration #4921

Closed davidphamle closed 3 months ago

davidphamle commented 3 months ago

Dify version

0.6.9

Cloud or Self Hosted

Cloud

Steps to reproduce

  1. Go to Knowledge on Dify Cloud
  2. Select a Knowledge Base that has Notion integration
  3. Click on Add Page
  4. Select a page and click on Save
  5. Click on Add Page
  6. Unselect the previous page and select another page, click on Save

✔️ Expected Behavior

The new page is added and the previous page is still in the knowledge base

❌ Actual Behavior

The new page is added, but the previous page disappears from the knowledge base

dosubot[bot] commented 3 months ago

Hey @davidphamle, I'm here to help you out with your bug report and any questions you might have while waiting for a human maintainer. Let's get this sorted out!

To resolve the issue in Dify Cloud version 0.6.9 where adding a new page to the Knowledge Base with Notion integration causes the previously added page to disappear, you need to ensure that the _load_data_as_documents method in the NotionExtractor class correctly handles the addition of new pages without overwriting the existing ones.

Here are the steps to address this:

  1. Ensure Proper Handling in _load_data_as_documents: Verify that the _load_data_as_documents method correctly appends new pages to the docs list without overwriting existing ones.

    def _load_data_as_documents(self, notion_obj_id: str, notion_page_type: str) -> list[Document]:
        # Collect every document produced for this Notion object.
        docs = []
        if notion_page_type == 'database':
            # A database returns a list of documents; extend() adds them to docs.
            page_text_documents = self._get_notion_database_data(notion_obj_id)
            docs.extend(page_text_documents)
        elif notion_page_type == 'page':
            # A single page's block texts are joined into one document.
            page_text_list = self._get_notion_block_data(notion_obj_id)
            docs.append(Document(page_content='\n'.join(page_text_list)))
        else:
            raise ValueError("notion page type not supported")
        return docs

  2. Check DocumentModel Updates: Ensure that the DocumentModel is being correctly updated and committed to the database without overwriting existing entries.

  3. Review sync_data_source Method: In the NotionOAuth class, ensure that the sync_data_source method correctly handles the addition of new pages without overwriting existing ones.

    def sync_data_source(self, binding_id: str):
        # Look up the active Notion binding for the current tenant.
        data_source_binding = DataSourceBinding.query.filter(
            db.and_(
                DataSourceBinding.tenant_id == current_user.current_tenant_id,
                DataSourceBinding.provider == 'notion',
                DataSourceBinding.id == binding_id,
                DataSourceBinding.disabled == False
            )
        ).first()
        if data_source_binding:
            # Fetch the pages the integration is currently authorized to read.
            pages = self.get_authorized_pages(data_source_binding.access_token)
            source_info = data_source_binding.source_info
            # Note: 'pages' is replaced with the freshly fetched list, not merged.
            new_source_info = {
                'workspace_name': source_info['workspace_name'],
                'workspace_icon': source_info['workspace_icon'],
                'workspace_id': source_info['workspace_id'],
                'pages': pages,
                'total': len(pages)
            }
            data_source_binding.source_info = new_source_info
            data_source_binding.disabled = False
            db.session.commit()
        else:
            raise ValueError('Data source binding not found')

  4. Debug Database Operations: If the issue persists, debug the database operations to ensure that the DocumentModel is not being overwritten or incorrectly updated.

By following these steps, you should be able to resolve the issue where adding a new page causes the previously added page to disappear [1][2].
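
As a minimal sketch of the "add without overwriting" behavior the steps above describe, the example below merges a new selection into the pages already in a knowledge base instead of replacing them. `NotionPage` and `merge_selected_pages` are hypothetical names used for illustration only; they are not part of Dify's codebase, and the actual fix may look different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NotionPage:
    """Hypothetical stand-in for a synced Notion page record."""
    page_id: str
    title: str


def merge_selected_pages(existing, newly_selected):
    """Return the existing pages plus any newly selected ones, keyed by page_id.

    Pages that are already present are kept even when they are not part
    of the new selection, so adding a page never drops an earlier one.
    """
    merged = {page.page_id: page for page in existing}
    for page in newly_selected:
        # setdefault keeps the existing record when the id is already known.
        merged.setdefault(page.page_id, page)
    return list(merged.values())


existing = [NotionPage("p1", "Roadmap")]
newly_selected = [NotionPage("p2", "Meeting notes")]
result = merge_selected_pages(existing, newly_selected)
print([page.page_id for page in result])  # ['p1', 'p2']
```

The behavior reported in this issue corresponds to assigning the new selection directly in place of the existing list, which is what drops the earlier page.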

To continue talking to Dosu, mention @dosu.

davidphamle commented 3 months ago

Any human maintainer here? @perzeuss

perzeuss commented 3 months ago

> Any human maintainer here? @perzeuss

A maintainer has already been assigned, but the team is very busy. It could take a while until they have time for this.

The problem is that Dify does not remove Notion pages from the knowledge base when you deselect them in the sync settings, right? But you can manually remove them from the knowledge base, and the Notion sync then stops for that page? Or does Dify still sync the removed page when you make changes?

davidphamle commented 3 months ago

  1. The problem is that whenever I try to add a new page, all previously added pages are removed.
  2. If I try to select all pages and resync, it stops with an error of Batch limit exceed 20 ... In the end, I am unable to have more than 20 pages in my knowledge base connected to Notion.

davidphamle commented 3 months ago

Please respond @perzeuss

perzeuss commented 3 months ago

> Please respond @perzeuss

The problem is clear, but someone has to have time to look at it.

iamjoel commented 3 months ago

@zxhlyh will fix it.