v6d-io / v6d

vineyard (v6d): an in-memory immutable data manager. (Project under CNCF, TAG-Storage)
https://v6d.io
Apache License 2.0

The graph's incremental update does not seem to support updating vertex properties #1601

Open songqing opened 1 year ago

songqing commented 1 year ago

Describe your problem

#1563 added support for incremental updates of graph data. However, it seems that vertex properties cannot be updated. For example:

The full data is:

vid  value
1    2.0
2    3.0

The incremental data is:

vid  value
1    4.0

After the incremental update, the value of vid 1 is 2.0, not 4.0.

So, can we support updating vertex properties?
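To make the two possible semantics concrete, here is a minimal C++ sketch (not vineyard code) that applies the incremental rows from the example above to the full data under two policies: keeping the original value for a duplicate vid, which matches the behavior reported here, and overwriting it, which is what this issue asks for. `MergePolicy`, `VertexValues` and `Merge` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical names for illustration only; this is not vineyard's API.
enum class MergePolicy { kKeepOriginal, kOverwrite };

using VertexValues = std::map<int64_t, double>;  // vid -> value

// Returns `full` with the incremental (vid, value) rows applied.
VertexValues Merge(const VertexValues& full,
                   const std::vector<std::pair<int64_t, double>>& inc,
                   MergePolicy policy) {
  VertexValues result = full;
  for (const auto& [vid, value] : inc) {
    // A new vid is always inserted; try_emplace leaves existing vids untouched.
    auto [it, inserted] = result.try_emplace(vid, value);
    if (!inserted && policy == MergePolicy::kOverwrite) {
      it->second = value;  // duplicate vid: take the incremental value
    }
    // kKeepOriginal: a duplicate vid keeps its original value.
  }
  // Both policies only add or update vertices; none are removed.
  assert(result.size() >= full.size());
  return result;
}
```

With the example data, `Merge(full, {{1, 4.0}}, MergePolicy::kKeepOriginal)` leaves vid 1 at 2.0, while `MergePolicy::kOverwrite` yields 4.0; vid 2 stays at 3.0 either way.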

dashanji commented 1 year ago

Thanks @songqing.

Fixed in https://github.com/v6d-io/v6d/pull/1600/.

songqing commented 1 year ago

Sorry, that's a mistake: #1600 is a different small fix, and this issue is still unresolved.

dashanji commented 1 year ago

Oh, my fault. Sorry for the noise.

Reopened.

sighingnow commented 1 year ago

Technically we can, but vineyard objects are defined as immutable (to keep concurrency control simple). The incremental update APIs are likewise designed for bulk data loading. We currently support only additions, which keeps multi-versioned immutable objects simple.
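The reasoning about immutability can be illustrated with a small sketch, assuming a hypothetical `VertexVersion` type rather than vineyard's actual object model: every incremental load creates a new immutable version that points to the previous one and only adds vertices. Because no existing version is ever modified, a reader holding an older version keeps seeing the same data without any locking, which is the simplification an add-only design buys.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <unordered_map>
#include <utility>

// Hypothetical illustration of add-only, multi-versioned immutable data;
// this is not vineyard's actual object model.
class VertexVersion {
 public:
  using Rows = std::unordered_map<int64_t, double>;  // vid -> value

  // Creates the first version from a full load.
  static std::shared_ptr<const VertexVersion> FromFull(Rows rows) {
    return std::shared_ptr<const VertexVersion>(
        new VertexVersion(nullptr, std::move(rows)));
  }

  // Creates a new version on top of `base` that only adds vertices.
  // Vids already visible through `base` are skipped, so neither older
  // versions nor existing values are ever modified.
  static std::shared_ptr<const VertexVersion> AddVertices(
      std::shared_ptr<const VertexVersion> base, const Rows& inc) {
    assert(base != nullptr);
    Rows added;
    for (const auto& [vid, value] : inc) {
      if (!base->Lookup(vid)) added.emplace(vid, value);
    }
    return std::shared_ptr<const VertexVersion>(
        new VertexVersion(std::move(base), std::move(added)));
  }

  // Searches this version, then its ancestors, from newest to oldest.
  std::optional<double> Lookup(int64_t vid) const {
    for (const VertexVersion* v = this; v != nullptr; v = v->base_.get()) {
      auto it = v->rows_.find(vid);
      if (it != v->rows_.end()) return it->second;
    }
    return std::nullopt;
  }

 private:
  VertexVersion(std::shared_ptr<const VertexVersion> base, Rows rows)
      : base_(std::move(base)), rows_(std::move(rows)) {}

  std::shared_ptr<const VertexVersion> base_;  // previous version, or null
  Rows rows_;                                  // vertices added here
};
```

Since `Lookup` already searches from the newest version, dropping the `Lookup` check in `AddVertices` would turn this sketch into last-write-wins; the cost is that a vid could then appear in several versions, so every read must resolve it through the chain.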

For continuous incremental graph updates, I'd suggest GART, a graph store that supports streaming updates and is better suited to cases like yours, such as updating properties (by updating records in tables). GART is also built on vineyard.

songqing commented 1 year ago

OK, I see, thanks for your reply. Here is our scenario: the graph data is updated daily, and for now we can only reload the full data every day. If incremental updates could also modify existing data, we could load the full data once and then load only the incremental data on the following days, which would make data importing more efficient and use fewer resources. Moreover, this may need only a small change to the current incremental update implementation, whereas with GART the query performance would be somewhat worse in this scenario.

sighingnow commented 1 year ago

It can be implemented by

As a first step, we could support only the vertex part or the edge part.

sighingnow commented 1 year ago

I may not have enough bandwidth for Vineyard in the next two months. Would you, @songqing (or @SighingSnow), like to implement this feature?

songqing commented 1 year ago

OK, thanks. It's not an urgent issue; I'll give it a try later.

SighingSnow commented 1 year ago

Hi, could you please check this code block: https://github.com/v6d-io/v6d/blob/main/modules/graph/loader/basic_ev_fragment_loader_impl.h#L344~L406. It deliberately keeps the original data: we check the incrementally added vertices, and if one duplicates an existing vertex, we use the original table data. Previously, users were not expected to add duplicates, so when a duplicate did appear, the original data was used.

So if this behavior is needed, you can revise the code above to update the table data instead.
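One way to picture revising the loader to update the table data is the sketch below. It assumes a simplified layout, not vineyard's actual Arrow-based one: a hypothetical `PropertyTable` holding a vertex map from original id (oid) to row index plus a single `double` property column. Incremental rows with a new oid are appended and get the next row index, while rows with an existing oid overwrite the value in that row instead of being ignored. Note that the vertex map itself must recognize duplicates and reuse their existing index rather than allocating a new one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for a vertex map and a property table;
// vineyard's real loader works on Arrow tables and its own vertex map.
struct PropertyTable {
  std::unordered_map<int64_t, std::size_t> vertex_map;  // oid -> row index
  std::vector<int64_t> oids;                            // row -> oid
  std::vector<double> values;                           // row -> property
};

struct ApplyStats {
  std::size_t appended = 0;
  std::size_t updated = 0;
};

// Applies an incremental batch: new oids are appended as new rows, and
// duplicate oids overwrite the property of their existing row instead of
// keeping the original value.
ApplyStats ApplyIncrement(PropertyTable& table,
                          const std::vector<std::pair<int64_t, double>>& inc) {
  ApplyStats stats;
  for (const auto& [oid, value] : inc) {
    // Reserve the next row index only if the oid is not mapped yet.
    auto [it, inserted] = table.vertex_map.try_emplace(oid, table.oids.size());
    if (inserted) {
      table.oids.push_back(oid);  // new vertex gets the next row
      table.values.push_back(value);
      ++stats.appended;
    } else {
      table.values[it->second] = value;  // existing vertex: update in place
      ++stats.updated;
    }
  }
  // Every row has exactly one property value and one vertex-map entry.
  assert(table.oids.size() == table.values.size());
  assert(table.vertex_map.size() == table.oids.size());
  return stats;
}
```

Applying `{{1, 2.0}, {2, 3.0}}` and then `{{1, 4.0}}` to an empty table leaves two rows with vid 1 now at 4.0; the second call reports one update and no appends.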

@siyuan0322, could you please evaluate this issue?

siyuan0322 commented 1 year ago

Yeah, it seems like a good fit here.

songqing commented 1 year ago

Yes, given the current implementation, only a small change is needed to solve this issue. Besides the code you mentioned, https://github.com/v6d-io/v6d/blob/main/modules/graph/vertex_map/arrow_vertex_map_impl.h#L487~L500 may also need to change.