Closed kokodak closed 1 month ago
The new method FindDocInfosByKeys
was introduced to the DB
struct in database.go
, enabling the retrieval of multiple documents based on given keys. A corresponding test function, RunFindDocInfosByKeysTest
, was also added to verify the functionality. These enhancements aim to improve the performance of the GetDocuments API by facilitating efficient bulk data queries.
File | Change Summary |
---|---|
server/backend/database/memory/database.go |
Added FindDocInfosByKeys method to retrieve documents based on given keys. |
server/backend/database/testcases/testcases.go |
Added RunFindDocInfosByKeysTest to test the FindDocInfosByKeys method by creating documents with specified keys and verifying the retrieval process. |
server/documents/documents.go |
Revised GetDocumentSummary and GetDocumentSummaries to use the new FindDocInfosByKeys method for improved bulk retrieval efficiency. |
server/rpc/admin_server.go |
Updated GetDocuments function to include a new parameter for the include_snapshot flag to enhance data retrieval options. |
api/yorkie/v1/admin.proto |
Added include_snapshot field to GetDocumentsRequest for optional snapshot inclusion in responses. |
Objective | Addressed | Explanation |
---|---|---|
Implement DB Query for GetDocuments API to improve performance (#921) | β |
In the realm of keys and docs they spin,
Where queries dance and tests begin,
Performance soared, the code refined,
InFindDocInfosByKeys
, success we find.
ππ
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?
Although the logic you mentioned is related to CRDT, it seems okay to understand it roughly and focus on query optimization.
Do you think it is possible to implement the bulk operation of BuildDocumentForServerSeq
as well, perhaps using $or
?
Although the logic you mentioned is related to CRDT, it seems okay to understand it roughly and focus on query optimization. Do you think it is possible to implement the bulk operation of
BuildDocumentForServerSeq
as well, perhaps using$or
?
@sejongk I'll give it a try. Is it okay if I ask questions if I encounter any issues during the implementation?
@sejongk I'll give it a try. Is it okay if I ask questions if I encounter any issues during the implementation?
Sure. If you have any suggestions about this, please let me know.
Although the logic you mentioned is related to CRDT, it seems okay to understand it roughly and focus on query optimization. Do you think it is possible to implement the bulk operation of BuildDocumentForServerSeq as well, perhaps using $or?
@sejongk I'll give it a try. Is it okay if I ask questions if I encounter any issues during the implementation?
@kokodak @sejongk
DocumentSummaries, which is in the response from GetDocuments API
, contains both time-related metadata
about Document and its content, snapshot
. Unlike metadata, retrieving snapshot
requires loading the document into memory, which can be relatively resource-intensive.
Document List Page in CodePair, which uses this API, only uses the time-related metadata
and not snapshot
.
https://www.figma.com/design/OYc1Cr0nvFuBnWZxhscfDk/Code-Pair?node-id=42-101&t=lCXENp1HuDnFAkwq-0
Therefore, how about adding an option(include snapshot
) in the API request to specify whether snapshot should be included.
Although the logic you mentioned is related to CRDT, it seems okay to understand it roughly and focus on query optimization. Do you think it is possible to implement the bulk operation of BuildDocumentForServerSeq as well, perhaps using $or? @sejongk I'll give it a try. Is it okay if I ask questions if I encounter any issues during the implementation?
@kokodak @sejongk
DocumentSummaries, which is in the response from
GetDocuments API
, contains bothtime-related metadata
about Document and its content,snapshot
. Unlike metadata, retrievingsnapshot
requires loading the document into memory, which can be relatively resource-intensive.Document List Page in CodePair, which uses this API, only uses the
time-related metadata
and notsnapshot
.https://www.figma.com/design/OYc1Cr0nvFuBnWZxhscfDk/Code-Pair?node-id=42-101&t=lCXENp1HuDnFAkwq-0
Therefore, how about adding an option(
include snapshot
) in the API request to specify whether snapshot should be included.
Thanks for your suggestion. I believe this suggested method is somewhat related to https://github.com/yorkie-team/yorkie/pull/597.
I have reviewed all the comments provided.
Currently, I have completed the implementation of bulk query methods for DB.FindClosestSnapshotInfo()
and DB.FindChangesBetweenServerSeqs()
, which are used in BuildDocumentForServerSeq()
.
However, I am facing some issues and need help with the following:
Although the bulk query operations are implemented, I am having difficulty writing test cases. Creating good test scenarios is challenging. Could I get some help with this?
I generally understand the context of @hackerwins comment, but I am a bit unclear about the exact meaning of "snapshot" since the term is used in several places in the code.
If the request value for include snapshot
is false
, does it mean that DB.FindClosestSnapshotInfo()
should be called with includeSnapshot
set to false
, or does it mean that packs.BuildDocumentForServerSeq()
should not be executed at all? (I am inclined to believe it's the latter.)
point 1
, to handle cases where include snapshot
is true
?I also agree that passing only the minimal information needed to render the screen is a good idea. However, if it turns out that snapshots will never be used in the GetDocuments API
, we might consider configuring the code to exclude snapshots without adding an option to the API request. What are your thoughts on this? Should we still include the option in the request for flexibility?
DocumentSummaries, which is in the response from
GetDocuments API
, contains bothtime-related metadata
about Document and its content,snapshot
. Unlike metadata, retrievingsnapshot
requires loading the document into memory, which can be relatively resource-intensive.Document List Page in CodePair, which uses this API, only uses the
time-related metadata
and notsnapshot
.https://www.figma.com/design/OYc1Cr0nvFuBnWZxhscfDk/Code-Pair?node-id=42-101&t=lCXENp1HuDnFAkwq-0
Therefore, how about adding an option(
include snapshot
) in the API request to specify whether snapshot should be included.
Based on the discussions with @hackerwins and @sejongk regarding the comment ideas above, we have decided to implement the option to include or exclude snapshots in the API request.
As a result, the GetDocuments API request specification has changed, which can be reviewed in this commit.
Consequently, by adding the include_snapshot
field with a value of false
in the CodePair code, we can expect performance improvements in the GetDocuments API.
What this PR does / why we need it:
This PR implements a bulk retrieval operation for the GetDocuments API to enhance performance.
The specific tasks accomplished include:
FindDocInfosByKeys()
method to the Database interfaceFindDocInfosByKeys()
logic inmongo.Client
andmemory.DB
respectivelyWhile the query to retrieve DocInfos has been reduced from N times to once when calling the GetDocuments API, there still remains an issue where
packs.BuildDocumentForServerSeq()
is called N times.However, this logic seems to be related to CRDT or logical clock functionalities, which I do not fully understand yet, so I could not work on it. Therefore, I did not remove the TODO comment regarding the N+1 issue.
Which issue(s) this PR fixes:
Fixes #921
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
Additional documentation:
Checklist:
Summary by CodeRabbit
New Features
Bug Fixes
Documentation