nick-ak96 / 3storage

Here we implement and discuss 3storage, Erlang storage of triples.

Architecture of 3store #2

iztok-savnik opened this issue 5 years ago (status: Open)

iztok-savnik commented 5 years ago

Here we discuss the architecture of 3store... if you have a suggestion for a different name, let me know...

iztok-savnik commented 5 years ago

Here are the initial ideas of the triple storage manager:

  - File of pages
  - ISAM tree
  - Server

iztok-savnik commented 5 years ago

There are many details that will have to be carefully designed.

  1. Since we will deal with big data, we need cursor-based access. Experience with mnesia shows you cannot use get-all-answers access methods (see the sketch after this list).
  2. We have to support server-like behavior: multiple processes can access multiple triple tables. (As you see, the terminology has to be defined first so that we can see exactly what we are talking about.)
  3. At some point, sessions with 3store will have to be defined to support cursors.
  4. At this point, I would forget about page caches. This will come at the "distributed epsilon" level.
  5. Since a lot of data will flow through the storage manager, copying in memory and long lists of function parameters should be avoided.
  6. Triple-pattern access methods can be implemented directly on the 3store in the first phase. In the second phase, the "distributed epsilon" layer is added.
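
A minimal sketch of the cursor-based access mentioned in item 1, assuming a hypothetical module and message protocol (none of these names exist in the code base yet): a cursor is a process that hands out answers in bounded batches instead of materializing all answers at once.

```erlang
%% Hypothetical cursor sketch (illustrative only): answers are served
%% in batches of at most N triples instead of "get all answers".
-module(tp_cursor_sketch).
-export([open/1, next/2, close/1]).

%% Open a cursor over a list of triples (stands in for a real scan).
open(Triples) ->
    spawn(fun() -> loop(Triples) end).

%% Ask the cursor for the next batch of at most N triples.
%% Returns a list of triples, or the atom eof when exhausted.
next(Cursor, N) ->
    Cursor ! {next, N, self()},
    receive
        {Cursor, Batch} -> Batch
    after 5000 ->
        {error, timeout}
    end.

close(Cursor) ->
    Cursor ! close,
    ok.

loop([]) ->
    receive
        {next, _N, From} -> From ! {self(), eof}, loop([]);
        close -> ok
    end;
loop(Triples) ->
    receive
        {next, N, From} ->
            {Batch, Rest} = split(N, Triples),
            From ! {self(), Batch},
            loop(Rest);
        close -> ok
    end.

split(N, L) when length(L) =< N -> {L, []};
split(N, L) -> lists:split(N, L).
```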
iztok-savnik commented 5 years ago

I refined the text on architecture:

  - Page-level file server
  - ISAM index
  - TP access method

iztok-savnik commented 5 years ago

Some more comments on the proposed architecture.


nick-ak96 commented 5 years ago

@iztok-savnik I suggest that along with the issues we start a wiki in this repo (I see it is currently disabled), where we will put the final results of our discussions here in the issues. That will, of course, later on serve as documentation for the software. Also, it will help us not to get lost in long discussions and to make a cleaner implementation.


Currently, I see you defined 3 levels for the storage manager:

  1. page level
  2. indexes
  3. triple-pattern access methods

The last one, I think, is already a higher level. We must take it into account in order to design the lower levels, but first I think we should focus on designing the lower-level architecture.

Page level

The page-level does not know anything about the contents of pages.

What happens during a record insertion? Should we fetch the appropriate page for insertion from the index, modify it in memory and then just blindly overwrite it? I probably do not understand completely what you meant here.

You mentioned read/write operations with chunks. I think it can be implemented as an intermediate module between the page level and the access level that will also do caching (or we can add caching later). In the literature, this is called a buffer manager.
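
To make the idea concrete, here is a purely illustrative sketch of such a buffer module, assuming the page level exposes a read function that the caller passes in (all names are made up):

```erlang
%% Rough sketch of the buffer-manager idea: a cache of pages sitting
%% between the access level and the page level. A real buffer manager
%% would add eviction (e.g. LRU) and write-back of dirty pages.
-module(buffer_sketch).
-export([new/0, fetch/3]).

%% The buffer is just a map PageId => PageBinary here.
new() -> #{}.

%% fetch(Buffer, PageId, ReadFun) -> {PageBinary, NewBuffer}
%% ReadFun(PageId) is the page-level read operation supplied by the caller.
fetch(Buffer, PageId, ReadFun) ->
    case maps:find(PageId, Buffer) of
        {ok, Page} ->
            {Page, Buffer};                        %% cache hit
        error ->
            Page = ReadFun(PageId),                %% cache miss: go to page level
            {Page, maps:put(PageId, Page, Buffer)}
    end.
```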

ISAM index

I agree with your choice of index structure at this point. But do you think this index structure will be sufficient for 3store in general? It is fine if the data is more or less static, but if it becomes dynamic a B+-tree will be a better option. We should probably think of supporting both of them, but that should be in some future phase.


I suggest that we start with some definitions and lists of processes and methods that the page level should support.

I will start here, but I think we should put them into the wiki and discuss here if needed.

Processes

  1. Management of requests. Requests come from the buffer manager; they are accepted and put into a request queue. A pool of workers processes requests from the queue in parallel.
  2. Write request. The request is a transaction.
  3. Read request. If any write requests are executing, the request is paused.

Methods (a small interface sketch follows the list)

  1. Read pages
  2. Write pages
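
As a starting point for discussion, here is a minimal sketch of how these two methods could be exposed by a page-level process; the message format and names are placeholders, not a decided interface.

```erlang
%% Hypothetical page-level client API (names and message shapes are
%% only placeholders for the discussion above).
-module(page_level_sketch).
-export([read_pages/2, write_pages/2]).

%% Read a list of pages identified by their page ids.
read_pages(PageServer, PageIds) ->
    request(PageServer, {read_pages, PageIds}).

%% Write a list of {PageId, PageBinary} pairs as one request,
%% i.e. one unit of work in the request queue.
write_pages(PageServer, Pages) ->
    request(PageServer, {write_pages, Pages}).

request(PageServer, Request) ->
    Ref = make_ref(),
    PageServer ! {request, Ref, self(), Request},
    receive
        {reply, Ref, Reply} -> Reply
    after 5000 ->
        {error, timeout}
    end.
```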
iztok-savnik commented 5 years ago

I suggest that along with the issues we start a wiki in this repo (I see it is currently disabled), where we will put the final results of our discussions here in the issues. That will, of course, later on serve as documentation for the software. Also, it will help us not to get lost in long discussions and to make a cleaner implementation.

Totally agree.

Currently, I see you defined 3 levels for the storage manager: page level, indexes, triple-pattern access methods.

Yes.

The last one, I think, is already a higher level. We must take it into account in order to design the lower levels, but first I think we should focus on designing the lower-level architecture.

Totally agree.

What happens during a record insertion? Should we fetch the appropriate page for insertion from the index, modify it in memory and then just blindly overwrite it? I probably do not understand completely what you meant here.

Right, TP-AM would really be a triple-level storage manager. The insertion operation would then be a part of the triple-level storage manager (TSM). Therefore, I suggest that the 3rd level is the TSM.

There are details on how the messages will flow. I will present the triple-pattern query node from the side of the query evaluation sub-system in more detail.

The current interface of tp_query_node.erl includes the implementation of a query node: it accesses the database (it should be general and accept various interfaces to storage managers or DBMSs) and returns the result to the parent query node via streams. Documentation is at https://big3store.github.io/big3store/ (tp_query_node).

You mentioned read/write operations with chunks. I think it can be implemented as an intermediate module between the page level and the access level that will also do caching (or we can add caching later). In the literature, this is called a buffer manager.

At the level of query evaluation (implemented with query nodes, i.e. processes, linked in a tree) we already have "chunks": a chunk is the maximum number of pages transferred for one tp_query_node request to the DBMS before other requests get their turn (this is again a concept from query evaluation, in particular the unit of network transfer in our query evaluation system). Something like this... I will go into more detail here later.

Obviously, the concepts defined on the level of query evaluation will have to be well synchronised with the storage manager that we design.

Caching can be implemented without much effort in the page-level storage manager. ??

I agree with your choice of index structure at this point. But do you think this index structure will be sufficient for 3store in general? It is fine if the data is more or less static, but if it becomes dynamic a B+-tree will be a better option. We should probably think of supporting both of them, but that should be in some future phase.

The ISAM implementation can be seen as a part of a B+-tree implementation. Therefore, initially we have only bulk loading and efficient retrieval, but later we can add balancing and insertions, if needed.
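
To illustrate the bulk-loading idea (a toy sketch only, not the intended implementation): sorted <key, pointer> pairs are packed into fixed-fanout pages, and the first key of each page is promoted one level up until a single root page remains. There is no balancing and no insertion, only the build step.

```erlang
%% Toy sketch of ISAM bulk loading. Every upper-level entry is
%% {FirstKeyOfLowerPage, LowerPage}; recursion stops when one page
%% (the root) is enough to hold all entries of the current level.
-module(isam_bulk_sketch).
-export([build/2]).

build(SortedPairs, FanOut) when FanOut > 1 ->
    case chunk(SortedPairs, FanOut) of
        [OnlyPage] ->
            {page, OnlyPage};
        Pages ->
            Upper = [{Key, {page, Page}} || [{Key, _} | _] = Page <- Pages],
            build(Upper, FanOut)
    end.

%% Split a sorted list into pages of at most N entries.
chunk(L, N) when length(L) =< N -> [L];
chunk(L, N) ->
    {Page, Rest} = lists:split(N, L),
    [Page | chunk(Rest, N)].
```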

One of the ideas behind the architecture is to initially implement as simple a storage manager as possible, but we should see the complete architecture together with the links to tp_query_node and the b3s implementation of streams.

iztok-savnik commented 5 years ago

I tried to enable the wiki but got the following: "Upgrade to GitHub Pro or make this repository public to enable Wikis." I do not know much about GitHub ... should we make it public?

nick-ak96 commented 5 years ago

Well, if there are no problems regarding license agreements, I think we can make it public. Do we need to check this with Kiyoshi?

iztok-savnik commented 5 years ago

As you have proposed, we can now design an Erlang process for the page-level storage manager... Can you specify the basic functionality?

nick-ak96 commented 5 years ago

As I see it, the general picture of the page level should look something like this.

Pages are organized into page files. Since pages are of fixed size, page files will also hold a fixed (limited) number of pages. That will allow us to locate pages in page files using an offset (the page ID). The server should be able to handle multiple page files, which will allow us to distribute data over several disks. All page files can be maintained by a pool of workers managed by a master worker. Each worker manages a portion of the page files and accepts requests to read from and write to pages. The master worker will route requests between the buffer level and the workers, and will also manage transactions. In this way we will be able to do transactions faster.
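
If we go with fixed-size pages and page-id-as-offset addressing, locating a page is trivial; here is a hedged sketch using the standard file module (the page size and the file name below are assumptions, not decisions):

```erlang
%% Minimal sketch of locating a page by its id inside a page file,
%% assuming fixed-size pages. Example usage:
%%   {ok, Fd} = file:open("pages.dat", [read, write, binary, raw]).
-module(page_file_sketch).
-export([read_page/2, write_page/3]).

-define(PAGE_SIZE, 4096).  %% assumed page size in bytes

%% The page id is an index; the byte offset is simply PageId * PAGE_SIZE.
read_page(Fd, PageId) ->
    file:pread(Fd, PageId * ?PAGE_SIZE, ?PAGE_SIZE).

write_page(Fd, PageId, PageBin) when byte_size(PageBin) =:= ?PAGE_SIZE ->
    file:pwrite(Fd, PageId * ?PAGE_SIZE, PageBin).
```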


If the above text seems reasonable, then we can discuss the following questions:

  1. Since the page level does not know about the contents of the pages, does it mean that at this level we read and write whole pages, i.e. we cannot edit a single record on a page? The ISAM tree is basically a collection of pages, so I think we need the ability to write records into pages as well.
  2. How do we manage page files? I assume that we want to handle multiple page files at a time for distribution purposes. I would then create some header structure for such a file that describes its contents: empty pages, offsets from other files.
  3. Since the page size is fixed, do we also fix the size of records or are they of variable length?

iztok-savnik commented 5 years ago

I had somewhat different ideas about the SM.

The page level works with one data file! If multiple TP-AMs need access to a particular file, then they should work with one (the same) page-level process.

I see that you still have a "sequential view": the page-level storage manager is a module that can be spawned with given parameters by some other module (there are no dispatchers on the page level).


Pages are organized into page files. Since pages are of fixed size, page files will also hold a fixed (limited) number of pages. That will allow us to locate pages in page files using an offset (the page ID). The server should be able to handle multiple page files, which will allow us to distribute data over several disks.

One process is created for one data file (composed of pages). I will also check this once again...

All page files can be maintained by a pool of workers managed by a master worker. Each worker manages a portion of the page files and accepts requests to read from and write to pages. The master worker will route requests between the buffer level and the workers, and will also manage transactions. In this way we will be able to do transactions faster.

My initial idea was that one page-level process handles one file. We should also store the mapping from file names to existing processes. Do we need a master worker?

Since the page level does not know about the contents of the pages, does it mean that at this level we read and write whole pages, i.e. we cannot edit a single record on a page? The ISAM tree is basically a collection of pages, so I think we need the ability to write records into pages as well.

On the ISAM level (below it is the page-level file) we have pages containing <key, pointer> pairs. We need to be able to handle these <key, pointer> pairs.
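
For illustration only, assuming fixed-length integer keys and pointers (the actual field widths are still open), the <key, pointer> pairs of an index page could be packed into and unpacked from a page binary like this:

```erlang
%% Toy sketch of fixed-length <key, pointer> records inside a page binary
%% (8-byte unsigned key, 8-byte unsigned pointer; the widths are assumptions).
-module(isam_page_sketch).
-export([pack/1, unpack/1]).

pack(Pairs) ->
    << <<Key:64, Ptr:64>> || {Key, Ptr} <- Pairs >>.

unpack(PageBin) ->
    [{Key, Ptr} || <<Key:64, Ptr:64>> <= PageBin].
```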

How do we manage page files? I assume that we want to handle multiple page files at a time for distribution purposes. I would then create some header structure for such a file that describes its contents: empty pages, offsets from other files.

This is a good point. We would need to handle multiple disks. Let's rethink the page-level process design from this point of view. I think the existing design should work... a header may be a good idea.

Since the page size is fixed, do we also fix the size of records or are they of variable length?

Data records are handled at the TP access-method level. ISAM records (the pairs) should probably have a fixed length for a given key (currently all components of triples are integer-encoded; we may rethink this).


I will do one more integration of the ideas from my side and try to produce a more stable text about the page level and the other levels by the evening.

I will also upload the papers on storage managers ...

nick-ak96 commented 5 years ago

One process is created for one data file

The whole idea about workers came to me because I was thinking that an Erlang process is a system process. So my intention was that, since the creation and handling of system processes carries some penalties, one page file per process would not be efficient enough.

I now realized that an Erlang process is a process inside the Erlang VM and that they are much lighter and cheaper than system processes.

So, I agree, one process per page file should be OK.

We should also store the mapping from file names to existing processes. Do we need a master worker?

We could just register processes under the same names as the files; there is no need to store this data. We do need to keep track of all the created processes. That is one of the reasons why I thought about the master worker. Also, how would you handle transactions? If you need to write data to several page files, both processes that handle these files must be aware that they must hold locks at the same time. I might be wrong here; I will have a look at transactions again.

iztok-savnik commented 5 years ago

I now realized that an Erlang process is a process inside the Erlang VM and that they are much lighter and cheaper than system processes.

Yes, they write somewhere that they can spawn more than 1M processes in a second. Also, communication is very fast...

We could just register processes under the same names as the files; there is no need to store this data. We do need to keep track of all the created processes. That is one of the reasons why I thought about the master worker.

This is a good solution. You can query whether the process exists and then access it through its name.

I was about to suggest using ETS tables. One possible problem that bothered me is the critical section created around the table. However, careful access rules could provide safe access.

The same can be done with processes. You ask whether process A exists. It does not exist, so you create it. But somebody else has just created A, so you get an error when trying to create A. After A has been created in whatever way, I would again obtain a pid from the static process name.
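
A small sketch of this "look up or create" step, relying on the fact that register/2 raises a badarg error when the name is already taken (the module and function names are placeholders):

```erlang
%% Look up a registered page-level process by name, or start and register
%% it; the badarg catch covers the race where somebody registered the
%% name between our whereis/1 check and our register/2 call.
-module(page_proc_sketch).
-export([get_or_start/2]).

get_or_start(Name, StartFun) when is_atom(Name) ->
    case whereis(Name) of
        undefined ->
            Pid = StartFun(),
            try register(Name, Pid) of
                true -> Pid
            catch
                error:badarg ->
                    %% Lost the race: discard our process and use the
                    %% one that was registered first.
                    exit(Pid, kill),
                    whereis(Name)
            end;
        Pid ->
            Pid
    end.
```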

Also, how would you handle transactions? If you need to write data to several page files, both processes that handle these files must be aware that they must hold locks at the same time. I might be wrong here; I will have a look at transactions again.

I would skip transactions for the Master's thesis. In my opinion, this could become very complex. Still, it makes sense to think about transactions during the design so that they can be added more easily later.


One more thing that is bothering me is the additional data transfer between the file-level AM and the index-level AM. It is probably better to design the first version without worrying much about the cost of data transfer (from one process to another). Later we can try to optimize this.

iztok-savnik commented 5 years ago

Here is the new sketch of the page file server and the ISAM index.



Page file server

ISAM index