the-data-lab / GraphOne

"GraphOne: A Data Store for Real-time Analytics on Evolving Graphs", Usenix FAST'19
GNU General Public License v3.0

Is GraphOne supposed to be thread safe? #10

Closed. whatsthecraic closed this issue 4 years ago

whatsthecraic commented 4 years ago

Hi @pradeep-k ,

as per the title, are concurrent updates from multiple threads supported?

pradeep-k commented 4 years ago

GraphOne is thread safe.
GraphOne can accept concurrent updates from multiple threads. There are two ways to do it:

  1. Just call batch_edge() from multiple threads. It will create a global order of updates in the single edge-log structure (see the sketch after this answer).

  2. There are other versions of batch_edge() that allow batching per thread before sending the updates to the edge log.

Finally, updates work concurrently while analytics is running. Let me know if you would like to know how to achieve that.
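To make option 1 concrete, here is a minimal, self-contained sketch of the pattern described above. It is not GraphOne's actual code: the edge struct, the log capacity, and the batch_edge() signature are simplified assumptions. The point is that an atomic counter claims a unique slot per update in one shared edge log, which is what produces the global order and makes the call safe from multiple threads.

```cpp
#include <atomic>
#include <cstdint>
#include <cstdio>
#include <thread>
#include <vector>

struct edge_t { uint32_t src; uint32_t dst; };   // simplified edge record (assumption)

static constexpr size_t LOG_CAP = 1 << 20;
static edge_t edge_log[LOG_CAP];                 // the single shared edge log
static std::atomic<uint64_t> log_head{0};        // next free slot

// Thread-safe append: fetch_add claims a unique slot, so concurrent writers
// never collide and the slot index defines the global order of updates.
void batch_edge(edge_t e) {
    uint64_t slot = log_head.fetch_add(1, std::memory_order_relaxed);
    edge_log[slot % LOG_CAP] = e;
}

int main() {
    std::vector<std::thread> workers;
    for (uint32_t t = 0; t < 4; ++t) {
        workers.emplace_back([t] {
            for (uint32_t i = 0; i < 1000; ++i) {
                batch_edge({t, i});              // concurrent updates from each thread
            }
        });
    }
    for (auto& w : workers) w.join();
    std::printf("logged %llu edges\n", (unsigned long long)log_head.load());
    return 0;
}
```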

whatsthecraic commented 4 years ago

Hi @pradeep-k , in the experiments from the paper, does the driver execute updates from multiple threads? Or does it perform all insertions/deletions from a single thread?

pradeep-k commented 4 years ago

The FAST'19 version only performs updates from a single thread, but the ACM TOS paper does insertions from multiple threads.

whatsthecraic commented 4 years ago

Has the TOS paper already been published? Could you send me a copy? (dleo at cwi dot nl) In the FAST paper, was the setup of the alternative libraries/databases also sequential?

pradeep-k commented 4 years ago

The file is available at https://pradeep-k.github.io/files/GraphOne.pdf.

I did not understand this question: "was the setup of the alternative libraries/databases also sequential?"

whatsthecraic commented 4 years ago

Hi, thanks for the paper! I was wondering whether, for the other systems as well, the driver was sequential, performing all updates with a single thread, or whether it used multiple threads/connections?

pradeep-k commented 4 years ago

In the majority of cases, ingestion is done using a single thread, unless otherwise mentioned. The following text, reproduced from the ACM TOS version, will help you understand why we chose this:

Table 9 shows that parsing the raw text takes most of the time for the Kron-28 graph. For the LANL graph, parsing and interacting with the hashmap take most of the time. The numbers in Table 9 are for a single-threaded logging phase. We can confirm that they do show sub-linear scaling with increasing thread count, as the hashmap can support parallel ingestion and retrieval. Specifically, with 8 threads for 8 LANL text files, the logging rate is close to 5.55 million edges per second, a speedup of 4.44×. The sub-linear scaling is due to the use of atomic instructions while claiming the spot in the edge log.
However, we feel that this is not the ideal setup for parallel ingestion in the real world. For example, the actual data sources for GraphOne may not necessarily be files; rather, they could be database log changes, Kafka, etc. In the case of Kafka, the data is already sharded using a shared-nothing partitioning technique; therefore, we believe logging data from different shards to the same edge log to create a global order is not worthwhile, and separate edge logs would be the best way forward. We confirm that batching into independent edge logs shows linear scaling as per our testing.