orientechnologies / orientdb

OrientDB is the most versatile DBMS supporting Graph, Document, Reactive, Full-Text and Geospatial models in one Multi-Model product. OrientDB can run distributed (Multi-Master), supports SQL, ACID Transactions, Full-Text indexing and Reactive Queries.
https://orientdb.dev
Apache License 2.0

Improve query scan at storage level #5081

Closed. lvca closed this issue 9 years ago.

lvca commented 9 years ago

We need to move the query scan to a lower level. The main problem is that the DB layer and the STORAGE layer were born decoupled. Without changing too much, we could provide a new API at the storage level, like:

void scanCluster( String clusterName, Callable iCallback );

So in the case of a scan, the DB layer can pass a callback that invokes the right hooks (only if needed, thanks to #5080) and, in the case of a query, evaluates the SQL conditions. I hope this will improve the scan speed by 2x because:
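
A minimal sketch of what such a callback-based scan could look like from the DB layer's point of view; the interface and method names below (RecordScanCallback, StorageScanApi, queryScan) are illustrative assumptions, not the actual OrientDB types:

    // Sketch of a callback-based storage scan (hypothetical names, not the real OrientDB API).
    public class ScanSketch {

      // Invoked once per record; return false to stop the scan early (e.g. when a LIMIT is reached).
      interface RecordScanCallback {
        boolean onRecord(long clusterPosition, byte[] content);
      }

      // The storage-level entry point proposed above.
      interface StorageScanApi {
        void scanCluster(String clusterName, RecordScanCallback callback);
      }

      // The DB layer passes a callback that fires hooks (only if registered, see #5080)
      // and evaluates the SQL conditions, instead of calling the storage once per record.
      static void queryScan(StorageScanApi storage, String clusterName) {
        storage.scanCluster(clusterName, (position, content) -> {
          // ... fire hooks, evaluate the WHERE clause, collect matching records ...
          return true; // keep scanning
        });
      }
    }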

andrii0lomakin commented 9 years ago

There is a downside to this, because we use a write lock for transactions: all data-modification operations on a graph database may be delayed for a considerable time in production, significantly decreasing the database's write speed.

lvca commented 9 years ago

In what case would you execute a scan in the middle of a tx lock?

andrii0lomakin commented 9 years ago

Sure, it is not a case of executing a scan in the middle of a tx lock. The full scan is executed by other threads, and not only the full scan. What is the typical distribution of reads and writes, 80% vs 20%, right? So suppose we have an 8-core CPU and accordingly 8 threads: roughly 6 threads will perform reads and 2 threads writes. Because in a graph database writes are done in a serialized way, we effectively have 6 reading threads and one writing thread which waits until the reading threads complete their operations; of course, if at least one reading thread is running a query, writes will be delayed. I am not even taking the full scan into account here, merely SQL query performance, i.e. roughly how long an SQL query takes to execute.

Suppose we fetch on average 10 records per query and we execute queries one by one rather than in parallel, which is a very favorable situation for data-modification operations; usually the situation is much worse. If each record is fetched in 0.01 ms, a query takes 0.1 ms. So the writing thread spends 0.1 ms (waiting) + 0.01 ms (write) = 0.11 ms per operation, about 9,000 op/s, instead of 0.01 + 0.01 = 0.02 ms, or 50,000 op/s. The slowdown is 50,000 / 9,000 ≈ 5.5 times.

I think the case where users issue queries in parallel with data writes is very typical, so I will write a benchmark to measure whether there is a slowdown in a production-like case where reads and writes are mixed, and I will publish the speed-up of read queries and the slowdown of write queries after these changes, if such a slowdown exists.
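
As a back-of-the-envelope check, the arithmetic above can be reproduced directly; the timings (0.01 ms per record read or write, 10 records per query) are the assumptions made in this comment, not measurements:

    // Reproduces the write-slowdown estimate from the comment above (assumed figures, not benchmarks).
    public class WriteSlowdownEstimate {
      public static void main(String[] args) {
        double fetchMs = 0.01;                  // assumed time to fetch one record
        double writeMs = 0.01;                  // assumed time to write one record
        double queryMs = 10 * fetchMs;          // a query fetching ~10 records holds the lock ~0.1 ms

        double writeBehindReadMs  = fetchMs + writeMs;   // waits for a single record read: 0.02 ms
        double writeBehindQueryMs = queryMs + writeMs;   // waits for a whole query:        0.11 ms

        System.out.printf("writes behind a single read: %.0f op/s%n", 1000 / writeBehindReadMs);   // ~50,000
        System.out.printf("writes behind a query:       %.0f op/s%n", 1000 / writeBehindQueryMs);  // ~9,000
        System.out.printf("slowdown:                    %.1fx%n", writeBehindQueryMs / writeBehindReadMs); // 5.5x
      }
    }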

lvca commented 9 years ago

Andrey, in case writes happen during scans everything will be slower, but this was also true for the previous architecture, where the scan called the storage X times to read records one at a time. Each read involves 5 locks (stateLock, lockManager, dataLock, atomicOperationsManager and cluster), so that is 5 * X lock/unlock operations. Now you have just 4 lock/unlock operations for the entire scan: no RIDs are locked, because no dirty records are possible here (updateRecord acquires an exclusive lock on the cluster).

So now the lock could be held longer in the case of a scan and writes could wait more, but IMHO this is acceptable. We could also support a batched scan, where we re-acquire the lock every Y scanned records, in order to avoid this problem. WDYT?
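
A rough sketch of the batched-scan idea, assuming a read/write lock protecting the storage; the class, lock and batching mechanics below are illustrative, not the actual OrientDB storage code:

    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // Illustrative batched scan: the shared lock is released and re-acquired every
    // batchSize records, so concurrent writers are not starved for the whole scan.
    public class BatchedScanSketch {
      private final ReentrantReadWriteLock storageLock = new ReentrantReadWriteLock();

      void scanCluster(long firstPosition, long lastPosition, long batchSize) {
        if (batchSize <= 0)                              // 0 = no batching: one lock for the whole scan
          batchSize = lastPosition - firstPosition + 1;

        long position = firstPosition;
        while (position <= lastPosition) {
          long batchEnd = Math.min(position + batchSize - 1, lastPosition);
          storageLock.readLock().lock();                 // one acquisition per batch, not per record
          try {
            for (; position <= batchEnd; position++) {
              // read the record at `position` and hand it to the scan callback
            }
          } finally {
            storageLock.readLock().unlock();             // give waiting writers a chance between batches
          }
        }
      }
    }

A larger batch means fewer lock round-trips (faster scan) but longer waits for writers, which is exactly the trade-off discussed above.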

andrii0lomakin commented 9 years ago

Luca, according to your information the query speed is 4 seconds, and according to Luigi the average query speed is now 100 ms; that means in production we degrade write performance from 20,000 rec/s to 100 rec/s. Suppose someone wrote a benchmark for a mix of writes and SQL queries and published that OrientDB has a write speed of 100 rec/s. I think that would not be good for anyone. There is a simpler example: in production some of our customers do not use indexes, and of course they will notice such a write-speed downgrade. And of course, if every write has to wait 4 seconds to complete (0.25 op/s), that will be something.

So I think we definitely need a batched scan. Even if developers do not test production performance, we should take care of our customers ourselves and prevent them from having problems in production.

lvca commented 9 years ago

+1 for batched scan, I can take care of it in the next 48h

lvca commented 9 years ago

Fixed; unit tests are still missing.

Added this new setting:

 QUERY_SCAN_BATCH_SIZE("query.scanBatchSize",
      "Scan clusters at blocks of records. This avoids to keep the lock on storage and cluster for too much time. High values means faster execution, but also a lower concurrency level. Set 0 to disable batch scan. This is suggested for readonly databases",
      Long.class, 100000),
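
Assuming this follows the usual OGlobalConfiguration pattern, the batch size could then be tuned programmatically before opening a database (the value 50000 below is just an arbitrary example), or presumably via the matching JVM property -Dquery.scanBatchSize=50000 like other OrientDB global settings:

    import com.orientechnologies.orient.core.config.OGlobalConfiguration;

    public class ScanBatchSizeExample {
      public static void main(String[] args) {
        // Override the scan batch size before any database is opened (arbitrary example value).
        OGlobalConfiguration.QUERY_SCAN_BATCH_SIZE.setValue(50000);

        System.out.println("scan batch size: "
            + OGlobalConfiguration.QUERY_SCAN_BATCH_SIZE.getValueAsLong());
      }
    }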

smolinari commented 9 years ago

"Scan clusters in blocks of records. This setting reduces the lock time on the cluster during scans. A high value mean a faster execution, but also a lower concurrency level. Set to 0 to disable batch scanning. Disabling batch scanning is suggested for read-only databases only."

Scott

lvca commented 9 years ago

Thanks, changed. @smolinari Feel free to review my bad English in the entire OGlobalConfiguration class with a PR ;-)

smolinari commented 9 years ago

You asked for it! LOL! :smile:

https://github.com/orientechnologies/orientdb/pull/5142

Scott

lvca commented 9 years ago

You rock man, thank you.