Performance of select query using index is extremely low

mrft commented 10 years ago

create property V.name string
create index vname on V( name ) notunique

select from V where name = 'HEY BO DIDDLEY'

=> Query executed in 0.069 sec. Returned 1 record(s)

explain select from V where name = 'HEY BO DIDDLEY'

returns

             "limit": -1,
            "compositeIndexUsed": 1,
            "involvedIndexes": [
                "vname"
            ],
            "fullySortedByIndex": false,
            "indexIsUsedInOrderBy": false,
            "current": "#9:1",
            "evaluated": 1,
            "fetchingFromTargetElapsed": 0,
            "documentReads": 1,
            "user": "#5:0",
            "documentAnalyzedCompatibleClass": 1,
            "recordReads": 1,
            "elapsed": 1.978317,
            "resultType": "collection",
            "resultSize": 1,
            "@fieldTypes": "compositeIndexUsed=l,involvedIndexes=e,evaluated=l,fetchingFromTargetElapsed=l,documentReads=l,documentAnalyzedCompatibleClass=l,recordReads=l,elapsed=f"
        }
    ],
    "notification": "Query executed in 0.068 sec. Returned 1 record(s)"

which indicates the index is being used.

69 milliseconds, for fetching 1 single record, you could beat that performnce by a serial search. This is a severe contrast with OrientDB boasting about its speed on the website...

Oh, and I am running OrientDB locally (so no network latency), and ran this query through the studio (but that doesn't really matter). Version 1.7.9 or 2.0-M2 doesn't matter also

lvca commented 10 years ago

Is this a benchmark? Have you tried with 1 or 100 records?

andrii0lomakin commented 10 years ago

How many records were into the index ? Are you able to provide data of benchmark ?

andrii0lomakin commented 10 years ago

Did you make cache hot before benchmark ?

mrft commented 10 years ago

It is just the GratefulDeadConcerts database.

Doing the same query a few times from the console seems faster: Query executed in 0.005 sec(s).

But still 5 milliseconds for 1 single record is extremely slow in my opinion, don't you think?

Here is some tests using a node script that calls the restful API and uses Oriento:

1000 X query [select from V where name = 'HEY BO DIDDLEY'] through REST took 8354 milliseconds. That's 119.70313622216902 per second, and 8.354 ms per event.
Start 1000 queries with BINARY protocol and query = select from V where name = 'HEY BO DIDDLEY'
1000 X query [select from V where name = 'HEY BO DIDDLEY'] through BINARY took 3222 milliseconds. That's 310.36623215394167 per second, and 3.222 ms per event.

lvca commented 10 years ago

Can you try to get 1 and 50 records? I bet the time is pretty close. If you get 1 record you've a fixed cost in time.

mrft commented 10 years ago

WONTFIX + INVALID ?

Fixed cost in time? Then can you please explain these results to me? Or tell me what I am doing wrong?

Some more results from SNAPSHOT-2.0 from last tuesday I think...

1000 X query [select from V where name <> 'HEY BO DIDDLEY' limit 50] through REST took 205901 milliseconds. That's 4.856702978615937 per second, and 205.901 ms per event.
1000 X query [select from V where name <> 'HEY BO DIDDLEY' limit 50] through BINARY took 216148 milliseconds. That's 4.62645964801895 per second, and 216.148 ms per event.

1000 X query [select from V where name in  [ 'HEY BO DIDDLEY','IM A MAN','NOT FADE AWAY','BERTHA','GOING DOWN THE ROAD FEELING BAD','MONA','Bo_Diddley','Garcia','Spencer_Davis' ]] through REST took 51524 milliseconds. That's 19.408431022436147 per second, and 51.524 ms per event.

And can it get any more basic than this?

 
1000 X query [select from V limit 50] through REST took 191517 milliseconds. That's 5.221468590255696 per second, and 191.517 ms per event.

lvca commented 10 years ago

@mrft I closed the issue because you didn't respond at my first question about executing the query with 1 and 50 returning items.

lvca commented 10 years ago

@mrft In your last comment you don't write how you tested REST and Binary protocols. Can you execute just 2 queries through console or CURL that retrieve 1 and then 50 items?

mrft commented 10 years ago

Using console

connect remote:localhost/GratefulDeadConcerts admin admin

select from V limit 50

(I did it > 5 times to let the memory caching kick in):

50 item(s) found. Query executed in 0.124 sec(s).

lvca commented 10 years ago

Ok, so it's 0,00248 per record.

About the index, have you used a UNIQUE_HASH_INDEX?

mrft commented 10 years ago

NOTUNIQUE as you can see in my first post (since not all V in GratefulDeadConcerts have a name I think so you have multiple NULL values...)

2.48 ms per record, sounds ok if you put it like that, but it should be 2.48 ms for the 50 small records (or maybe 5ms or even 10ms but not 124ms). Try a similar query on MySQL on Postgres, and you'll know what I mean.

Do you knwo what is causing this? ToJSON couldn't take that long...

andrii0lomakin commented 10 years ago

Guys it is not even 2 ms per record we can get about 5 000 records per second which is 0.2 ms per record. I will try to provide Java based benchmark at first, I suppose no one use console as interface to interact with application. And then we will discuss it's results.

andrii0lomakin commented 10 years ago

Ok so for one test I got 100 rec/ms . 0.01 ms for record. I will check it on this database.

mrft commented 10 years ago

Some more tests, this time I wrote a very simple java application, and started fetching all the records

    public static void main( String[ ] args) {
        for ( int i = 0; i < 50; i++ ) {
            testDocument_GratefulDeadConcerts();
        }       
    }
    public static void testDocument_GratefulDeadConcerts() {
        ODatabaseDocumentTx db = new ODatabaseDocumentTx("remote:localhost/GratefulDeadConcerts").open( "admin", "admin" );
        String className = "V";
        int clid = db.getClusterIdByName( "V" );
        System.out.println( "clusterid = " + clid );

        int nrOfRecords = 0;
        long t = System.nanoTime();
        long totalFetchTime = 0;
        long totalToJSONTime = 0;
        try {
            for ( ODocument record : db.browseCluster( className ) ) {
                nrOfRecords++;

                long currentNanoTime = System.nanoTime();
                totalFetchTime += currentNanoTime - t;

                t = System.nanoTime();
                String json = record.toJSON();
                currentNanoTime = System.nanoTime();
                totalToJSONTime += currentNanoTime - t;

                t = System.nanoTime();
            }
            double totalFetchTimeInMilliseconds = ((double)totalFetchTime / 1000000);
            double totalToJSONTimeInMilliseconds = ((double)totalToJSONTime / 1000000);
            System.out.println( "Fetching and toJSON() on " + nrOfRecords + " records took " + totalFetchTimeInMilliseconds + " milliseconds and " + totalToJSONTimeInMilliseconds + " milliseconds respectively." );
            System.out.println( "That is " + (totalFetchTimeInMilliseconds / nrOfRecords) + " milliseconds and " + ( totalToJSONTimeInMilliseconds / nrOfRecords ) + " milliseconds per record" +
                                "\n" +
                                " and " + ( 1000.0 / (totalFetchTimeInMilliseconds / nrOfRecords) ) + " records per second" +
                                " and " + (1000.0 / (totalToJSONTimeInMilliseconds / nrOfRecords)) + " records per second respectively." );
            db.rollback();
        }
        finally {
            System.out.println( "Number of records = " + nrOfRecords );
            db.close();
        }
    }
    public static long printDurationToSystemOut(long startSystemNanoTime, long currentSystemNanoTime, String description) {
        long durationInNanoSeconds = currentSystemNanoTime - startSystemNanoTime;
        double durationInMilliseconds = (double)durationInNanoSeconds / 1000000;
        System.out.println(description + " took " + durationInMilliseconds + " ms.");
        return durationInNanoSeconds;
    }

gives me the following disappointing final results on the first run

Fetching and toJSON() on 809 records took 544.10086 milliseconds and 1318.839576 milliseconds respectively.
That is 0.6725597775030903 milliseconds and 1.630209611866502 milliseconds per record 
    and 1486.8566831524581 records per second and 613.4180492624222 records per second respectively.

which is slow, an I don't understand because the server keeps running so I would presume the 'cache' should still be hot (or is it completely thrown away on each 'session'?) and could that be the reason why everything on the Restful API stays so slow, because it is like a whole new 'cold' session each time?

On the last run, I get closer to the 5 000 reords per second that laa is talking about:

Fetching and toJSON() on 809 records took 143.26815 milliseconds and 175.4015 milliseconds respectively.
That is 0.17709289245982693 milliseconds and 0.21681273176761434 milliseconds per record
 and 5646.754006385928 records per second and 4612.275265604912 records per second respectively.

but mind you that I am not even doing a query, just selecting ALL records in the database. Also toJSON() seems very slow, if you see that it can only stringify < 5000 records per second, and considering stringifying a larger object in nodejs can do 175438 javascript objects per second. I can imagine it will be slower, but 35 times slower seems like way too much, so I guess some performance improvements could and should be made there.

And the following are just some simple JSON parsing and stringifying tests on nodejs, just to get an idea about what these figures mean...

Parsing 10000 JSON objects took  244 milliseconds. That's 40983.60655737705 per second, and 0.0244 ms per event.
Stringifying 10000 JSON objects took  57 milliseconds. That's 175438.59649122806 per second, and 0.0057 ms per event.

phpnode commented 10 years ago

@mrft curious about this because Oriento has some benchmarks as part of its test suite and I've never seen such low numbers, I wonder if this is because we're using RECORD_LOAD and thus avoid all the SQL overhead. Would you mind running the Oriento tests and pasting the results of this one in particular: https://github.com/codemix/oriento/blob/master/test/bugs/27-slow.js ?

andrii0lomakin commented 10 years ago

@mrft you load all records by network on client side to do query and claim that this is slow ? Simple does not mean efficient.

All queries are done on server side not on client side. Also System.nanoTime has:

granularity which you did not take in account when you wrote this benchmark.
System.nanoTime is slow itself it took about 100 CPU cycles to execute it.

https://blogs.oracle.com/dholmes/entry/inside_the_hotspot_vm_clocks

What I propose. I will write benchmark and you will review it. Then you will run benchmark. And then we will decide whether this issue should be fixed or not.

BTW Which H/W did you use to run this your test ?

mrft commented 10 years ago

@phpnode: I guess these are the results you are looking for:

  Bug #27: Slow compared to Restful API
Binary Protocol Took  556ms, 8993 documents per second
    ✓ should load a lot of records quickly, using the binary raw command interface (557ms)
Rest Protocol Took  593ms, 8432 documents per second
    ✓ should load a lot of records quickly, using the rest raw command interface (593ms)
Binary DB Api Took  292ms, 17123 documents per second
    ✓ should load a lot of records quickly (292ms)
Binary Record Load Took  985ms, 5076 documents per second
    ✓ should load a lot of records, one at a time, using binary (986ms)
Rest Record Load Took  7575ms, 660 documents per second
    ✓ should load a lot of records, one at a time, using rest (7575ms)

@laa:

Yes, all records, which means it should be able to start sending results right away since there cannot be any "I have to do difficult logic to find out which records you mean".
I am talking against my local machine (core i5 running on Ubuntu 14.04 with 8MB of RAM and an SSD), but even when assuming a 1000Mb/second link we should be able to send more than 100MB/second, so that shouldn't be an issue either.
I am measuring the time it takes to fetch 1 single record (and then adding all these times to get an average, so I am not measuring the time of the nanoTime call or any other code) after the query has been executed, and the time it takes to JSON.stringify it. I didn't measure how long it took to run the query. If I compare this to doing a select * from table limit 1000 on a local postgres database I get the following results (using Squirrel)
```
Query 1 of 1, Rows read: 1,000, Elapsed time (seconds) - Total: 0.088, SQL query: 0.005, Reading results: 0.083
Query 1 of 1, Rows read: 1,000, Elapsed time (seconds) - Total: 0.067, SQL query: 0.009, Reading results: 0.058
Query 1 of 1, Rows read: 1,000, Elapsed time (seconds) - Total: 0.041, SQL query: 0.004, Reading results: 0.037
Query 1 of 1, Rows read: 1,000, Elapsed time (seconds) - Total: 0.032, SQL query: 0.005, Reading results: 0.027
Query 1 of 1, Rows read: 1,000, Elapsed time (seconds) - Total: 0.028, SQL query: 0.003, Reading results: 0.025
```
and if I compare this to 0.544 or later on 0.143 for fetching about 800 records from OrientDB, I can only conclude that it is way slower. About 5 times worse, that is very signifcant. All of this was done on the same laptop, with no phyical network between client and server.
When I disconnect from Postgres and I reconnect, the first query I run immediately gives me results like below, and I don't have to wait again for the cache to warm up
```
Rows read: 1,000, Elapsed time (seconds) - Total: 0.03, SQL query: 0.004, Reading results: 0.026
```
Since I was trying to measure how long it took to execute something, maybe you should take a look at what I quoted below from https://docs.oracle.com/javase/6/docs/api/java/lang/System.html#nanoTime%28%29
```
For example, to measure how long some code takes to execute:
long startTime = System.nanoTime();
// ... the code being measured ...
long estimatedTime = System.nanoTime() - startTime;
```
Oh, and a 2 GHz CPU (two gigahertz) goes through 2,000,000,000 cycles every second, and you are worrying about 100 CPU cycles disrupting my measurements?

What more can I say? It feels like you don't want to believe these numbers, because you don't like what I am saying.

You can try to write a benchmark that will prove that my numbers are wrong, or you could use your time wisely and make the restful API, ànd the console, ànd the Java API all work faster, and check if you are on par with other modern day database systems.

andrii0lomakin commented 10 years ago

@mrft I have doubts about benchmarks because you are not only one user who compares DBs performance. Could you wait a bit I finish important issue today and we try to discover what is going on.

phpnode commented 10 years ago

@mrft thanks, so the Oriento results don't reflect what you're seeing in java. What kind of perf do you see if you do the following in java (pseudo-code)

// prep
var num = 10000;
for (i = 0; i < num; i++) {
   db.someCluster.insert({foo: i});
}

// bench
var startTime = System.nanoTime();
for (i = 0; i < num; i++) {
  doc = db.record.load('#10:' + i);
}
var stopTime = System.nanoTime();

console.log(stopTime - startTime);

mrft commented 10 years ago

@phpnode

This is my code, so please correct me if you meant something else (or if it can be done more efficiently)

        ODatabaseDocumentTx db = new ODatabaseDocumentTx( "remote:localhost/GratefulDeadConcerts" ).open( "admin", "admin" );
        try {
            // prep
            int num = 10000;
            long startTime = System.nanoTime();
            ODocument doc = null;
            for ( int i = 0; i < num; i++ ) {
                 doc = new ODocument( "Test" );
                doc.field( "foo", i );
                // SAVE THE DOCUMENT
//              StringBuilder sb = new StringBuilder();
                doc.save();
//              .getIdentity().toString( sb );
//              System.out.println( sb.toString() );
            }

            long stopTime = System.nanoTime();
            Utils.printDurationToSystemOut( startTime, stopTime, num, "Creating and saving document " );

            // bench
            String clusterId = Integer.toString( doc.getIdentity().getClusterId() );
            startTime = System.nanoTime();
            for ( int i = 0; i < num; i++ ) {
                Object loadedDoc = db.command( new OSQLSynchQuery( "select from " + "#" + clusterId + ":" + i ) ).execute();
            }
            stopTime = System.nanoTime();
            long nrOfNanoSecondsPassed = stopTime - startTime;
            double nrOfMilliSecondsPassed = (double)nrOfNanoSecondsPassed / 1000000;

            Utils.printDurationToSystemOut( startTime, stopTime, num, "Getting document " );
            System.out.println( nrOfNanoSecondsPassed );
        }
        finally {
            db.rollback();
            db.close();
        }

Will return:

Creating and saving document  10000 times took 5536.996626 ms. That's 1806.0332478880582 per second, and 0.5536996626 ms per event.
Getting document  10000 times took 3319.131759 ms. That's 3012.8361047688077 per second, and 0.33191317589999997 ms per event.
3319131759

One other question: you say Oriento results don't reflect what I am seeing in Java, but the only test returning a reasonable number of records per second seems to be 'Binary DB Api Took 292ms, 17123 documents per second'. I don't really know what the difference between all the tests is, and what exactly you are doing (I could check your code, but I haven't yet :) ).

kowalot commented 10 years ago

@mrft I recreated your bench using .NET client connected to remote localhost OrientDB.

using (var db = CreateTestDb(MethodInfo.GetCurrentMethod()))
            {
                db.RegisterAllTypes();

                var num = 10000;
                var sw = new Stopwatch();
                sw.Start();
                Actor actor = null;
                for (int i = 0; i < num; i++)
                {
                    actor = new Actor() {  Age = i };
                    actor.Save();
                }
                sw.Stop();
                Debug.WriteLine("Creating and saving document {0} times took {1} ms. That's {2} per second, and {3} ms per event.", num, sw.ElapsedMilliseconds, (num * 1000) / (sw.ElapsedMilliseconds), (double)sw.ElapsedMilliseconds / num);
                sw.Restart();
                for (int i = 0; i < num; i++)
                {
                    var act = db.LoadRecord<Actor>(new ORID(actor.RID.ClusterId, i));
                    Assert.IsTrue(act != null);
                }
                sw.Stop();
                Debug.WriteLine("Getting document  {0} times took {1} ms. That's {2} per second, and {3} ms per event.", num, sw.ElapsedMilliseconds, (num * 1000) / (sw.ElapsedMilliseconds), (double)sw.ElapsedMilliseconds / num);

            }

Results on my 6-core home pc are following

Creating and saving document 10000 times took 1762 ms. That's 5675 per second, and 0.1762 ms per event.
Getting document  10000 times took 1961 ms. That's 5099 per second, and 0.1961 ms per event.

My two cents to the discussion:

for me numbers are quite ok considering that test is done in one thread using remote connection.
It doesn't saturate even one core during a test
serious stress test should use many client threads, because it's typical scenario for databases. It can be done also for batch insert stress test. In such scenario I can easily reach 20 000 inserts per second on my home computer (remote protocol(!)) and I have reached on decent desktop PC up to 40 000 per second not mentioning about real servers . This is the scenario where MongoDB would have serious problems, which I know from my experience.

phpnode commented 10 years ago

@mrft in the fastest example, it is simply loading 5000 records in one go via a single select query

andrii0lomakin commented 10 years ago

Ok. Here my test.

@Test
public class IndexTest {
  public volatile long c = 43;

  public void testIndexQueryEmbedded() throws Exception {
    ODatabaseDocumentTx databaseDocumentTx = new ODatabaseDocumentTx("plocal:/home/andrey/Development/releases/orientdb-community-2.0-SNAPSHOT/databases/GratefulDeadConcerts");
    databaseDocumentTx.open("admin", "admin");

    int counter = 0;

    // make JIT do it's job
    for (int i = 0; i < 1000000; i++) {
      List<ODocument> result = databaseDocumentTx
          .query(new OSQLSynchQuery<ODocument>("select from V where name = 'HEY BO DIDDLEY'"));
      counter += result.size();
      databaseDocumentTx.getLocalCache().clear();
    }

    long st = System.currentTimeMillis();
    // make pause, consume CPU, we need to have volatile here to do not circuit cycle
    for (int k = 0; k < 500; k++) {
      long c1 = c;
      for (int i = 0; i < 100000000; i++) {
        c1 += c1 * 31 + i * 51;
      }
      c = c1;
    }

    System.out.println("Pause : " + (System.currentTimeMillis() - st + " ms."));

    // benchmark
    long start = System.nanoTime();
    for (int i = 0; i < 100; i++) {

      List<ODocument> result = databaseDocumentTx
          .query(new OSQLSynchQuery<ODocument>("select from V where name = 'HEY BO DIDDLEY'"));
      counter += result.size();
      databaseDocumentTx.getLocalCache().clear();
    }
    long end = System.nanoTime();

    System.out.println("Time : " + ((end - start) / (100)) + " ns. counter : " + counter);
  }

}

for embedded storage I have:

Pause : 68921 ms. Time : 29142 ns. ( 0.03 ms for query)

For remote:

Time : 97171 ns (0.1 ms for query)

Do you agree with this test ?

andrii0lomakin commented 10 years ago

Ok the same for record loading.

Remote: 103910 ns. Embeded: Time : 9721 ns

As for me we should improve remote protocol performance but are those number super slow ?

mrft commented 10 years ago

Hi,

did some more tests using your code. Seems better but I have some questions:

What exactly are these things doing? // make JIT do it's job // make pause, consume CPU, we need to have volatile here to do not circuit cycle

and why are they needed to get a decent speed?

It would be nice to have a good speed from the start (or after just a few queries) instead of having to do funny things like running the same query a gazillion times first, and then saturating the CPU for a minute. My results using oriento seem better now too.

So I am now down to

Running select from V where name = 'HEY BO DIDDLEY' 
1000 times took 792.975994 ms. That's 1261.0722235810836 per second, and 0.792975994 ms per event.

Also tried some other queries, and one thing that struck me (and I compare with similar Postgres queries just to have a reference point) is the following: when the number of results goes up, the query time seems to go up linearly in orientdb, whereas it looks logarithmic for Postgres.

Running select * from V where type = 'song' limit 100 
1000 times took 7156.845744 ms. That's 139.7263593166526 per second, and 7.156845744 ms per event.
Running select * from V where type = 'song' limit 200 
1000 times took 16034.44563 ms. That's 62.365735808728424 per second, and 16.03444563 ms per event.

Compare this to a query that is limted to 200 and then to 2000 results on postgres. As you can see the total time less than triples, as the resultset grows by a factor of 10.

Query 1 of 1, Rows read: 200, Elapsed time (seconds) - Total: 0.004, SQL query: 0.003, Reading results: 0.001
Query 1 of 1, Rows read: 2,000, Elapsed time (seconds) - Total: 0.011, SQL query: 0.006, Reading results: 0.005

I don't know what this means but it suggests that fetching the records is an expensive operation?

So the remaining questions are Can something be done about this 'it only works fast if you prepare it in exactly the right way' ? and Can you examine whether there is a performance problem with fetching large resultsets? (which is what seems to be the case over here).

lvca commented 10 years ago

@mrft Are you just executing the query or do you also browse the result set one by one and access to the fields?

mrft commented 10 years ago

@lvca I am not accessing the fields, I pretty much exactly copied laa's code. Since the result is a List, I assume that all the info is in there, and that the time it takes includes the time for getting all the field's values. Is that correct?

@laa I have another question, if I run your 'preparation'-code only once, can I run only the benchmark (latest loop in your code) for any query afterwards, and will I get a realistic figure for the performance ? (or maybe leave out the first 3-5 runs because they tend to be a bit slower, but they wont influence the average that much if you do 1000 runs). [If I have to run every single query 100000 times before it starts performing well, that would be unacceptable of course]

I am asking because then we'd have an understanding, and I could start trying out a lot of different queries that I think are relevant, and we can check if the speed makes sense (for example: select distinct(type) from V - after adding the relevant index - also seems slow).

Because until now, what we have been doing feels like 'alter the test so the function passes', whereas I feel that one should 'write the test, then make the function pass'. I also feel that it's not up to the user of Orientdb to prep the server to be performant, but the server itself should make sure that it can perform at full speed shortly after it has started (a few seconds up to a few minutes but not longer than that).

mrft commented 10 years ago

Another strange thing: After a while I get this performance (4ms) in the console for select distinct( type ) from V (after creating an index on type)

----+------+--------
#   |@CLASS|distinct
----+------+--------
0   |null  |song    
1   |null  |artist  
----+------+--------
2 item(s) found. Query executed in 0.004 sec(s).

But when running this same query using laa's benchmark code, I keep getting an average of about 30ms. Strange, why would laa's code be so much slower than the console for this query ( with and without databaseDocumentTx.getLocalCache().clear(); )?

Start the benchmark with 1000 queries (select distinct( type ) from V ).
Time : 298285221 ns. counter : 2000
Running this query (returning 2 results) 1000 times took 29828.522138 ms. That's 33.52495961327067 per second, and 29.828522138 ms per event.

Same results (30ms) for oriento in nodejs...

(I get an average of about 2ms on Postgres, which is similar to the results in the console...)

andrii0lomakin commented 10 years ago

No reason to omptimize remote storage interaction till new authentication model will be provided.

mrft commented 9 years ago

Purely informational, some new benchmark results using v2.0.

First without indexes, and then with indexes (notunique) on name, type and song_type columns

"select distinct( type ) from V" doesn't seem to use the index and needs a full table scan to find all possible values, which is slow of course.

explain select distinct( type ) from V

Profiled command '{optimizationElapsed:0,current:#9:808,evaluated:809,fetchingFromTargetElapsed:21,documentReads:809,user:#5:0,documentAnalyzedCompatibleClass:809,recordReads:809,elapsed:21.1337,resultType:collection,resultSize:2}' in 0.023000 sec(s): {"@type":"d","@version":0,"optimizationElapsed":0,"current":"#9:808","evaluated":809,"fetchingFromTargetElapsed":21,"documentReads":809,"user":"#5:0","documentAnalyzedCompatibleClass":809,"recordReads":809,"elapsed":21.1337,"resultType":"collection","resultSize":2,"@fieldTypes":"optimizationElapsed=l,current=x,evaluated=l,fetchingFromTargetElapsed=l,documentReads=l,user=x,documentAnalyzedCompatibleClass=l,recordReads=l,elapsed=f"}

WITHOUT INDEXES

Start the benchmark with 1000 queries (select from V where name = 'STANDER ON THE MOUNTAIN' ). Time : 113533730 ns. counter : 1000 Running this query (returning 1 results) 1000 times took 11353.373027 ms. That's 88.07955112739202 per second, and 11.353373027 ms per event.

Start the benchmark with 1000 queries (select from V where name = 'HEY BO DIDDLEY' ). Time : 113843996 ns. counter : 1000 Running this query (returning 1 results) 1000 times took 11384.399627 ms. That's 87.83950254419507 per second, and 11.384399627 ms per event.

Start the benchmark with 1000 queries (select * from V where song_type = 'original' ). Time : 126442780 ns. counter : 184000 Running this query (returning 184 results) 1000 times took 12644.278037 ms. That's 79.08715682095689 per second, and 12.644278037 ms per event.

Start the benchmark with 1000 queries (select * from V where song_type = 'original' limit 100 ). Time : 53345030 ns. counter : 100000 Running this query (returning 100 results) 1000 times took 5334.50304 ms. That's 187.4588865170091 per second, and 5.3345030399999995 ms per event.

Start the benchmark with 1000 queries (select * from V where type = 'song' limit 100 ). Time : 34360785 ns. counter : 100000 Running this query (returning 100 results) 1000 times took 3436.078525 ms. That's 291.0294374020454 per second, and 3.436078525 ms per event.

Start the benchmark with 1000 queries (select * from V where type = 'song' limit 500 ). Time : 122897201 ns. counter : 500000 Running this query (returning 500 results) 1000 times took 12289.720153 ms. That's 81.36881780468318 per second, and 12.289720153000001 ms per event.

Start the benchmark with 1000 queries (select name from V where type = 'song' limit 100 ). Time : 48835848 ns. counter : 100000 Running this query (returning 100 results) 1000 times took 4883.584849 ms. That's 204.76761045828204 per second, and 4.883584849 ms per event.

Start the benchmark with 1000 queries (select name from V where type = 'song' limit 500 ). Time : 176030408 ns. counter : 500000 Running this query (returning 500 results) 1000 times took 17603.040805 ms. That's 56.808366865567805 per second, and 17.603040805 ms per event.

Start the benchmark with 1000 queries (select * from V where type = 'song' ). Time : 141386616 ns. counter : 584000 Running this query (returning 584 results) 1000 times took 14138.661688 ms. That's 70.72805206512132 per second, and 14.138661688000001 ms per event.

Start the benchmark with 1000 queries (select * from V where type = 'artist' limit 100 ). Time : 86256039 ns. counter : 100000 Running this query (returning 100 results) 1000 times took 8625.60399 ms. That's 115.93391038579317 per second, and 8.62560399 ms per event.

Start the benchmark with 1000 queries (select * from V where type = 'artist' limit 200 ). Time : 130524879 ns. counter : 200000 Running this query (returning 200 results) 1000 times took 13052.487977 ms. That's 76.61374611201451 per second, and 13.052487977 ms per event.

Start the benchmark with 1000 queries (select * from V where type = 'artist' ). Time : 139536159 ns. counter : 224000 Running this query (returning 224 results) 1000 times took 13953.615931 ms. That's 71.66601151593643 per second, and 13.953615931 ms per event.

Start the benchmark with 1000 queries (select distinct( type ) from V ). Time : 138131889 ns. counter : 2000 Running this query (returning 2 results) 1000 times took 13813.18893 ms. That's 72.39457920018474 per second, and 13.81318893 ms per event.

WITH INDEXES

Start the benchmark with 1000 queries (select from V where name = 'STANDER ON THE MOUNTAIN' ). Time : 6926333 ns. counter : 1000 Running this query (returning 1 results) 1000 times took 692.633384 ms. That's 1443.7652343941886 per second, and 0.692633384 ms per event.

Start the benchmark with 1000 queries (select from V where name = 'HEY BO DIDDLEY' ). Time : 9111726 ns. counter : 1000 Running this query (returning 1 results) 1000 times took 911.172603 ms. That's 1097.486905013978 per second, and 0.911172603 ms per event.

Start the benchmark with 1000 queries (select * from V where song_type = 'original' ). Time : 55548184 ns. counter : 184000 Running this query (returning 184 results) 1000 times took 5554.818462 ms. That's 180.02388500018634 per second, and 5.554818462 ms per event.

Start the benchmark with 1000 queries (select * from V where song_type = 'original' limit 100 ). Time : 38716364 ns. counter : 100000 Running this query (returning 100 results) 1000 times took 3871.636488 ms. That's 258.28871153050255 per second, and 3.871636488 ms per event.

Start the benchmark with 1000 queries (select * from V where type = 'song' limit 100 ). Time : 38406157 ns. counter : 100000 Running this query (returning 100 results) 1000 times took 3840.615723 ms. That's 260.37491697265546 per second, and 3.840615723 ms per event.

Start the benchmark with 1000 queries (select * from V where type = 'song' limit 500 ). Time : 105795004 ns. counter : 500000 Running this query (returning 500 results) 1000 times took 10579.500426 ms. That's 94.5224216393448 per second, and 10.579500426000001 ms per event.

Start the benchmark with 1000 queries (select name from V where type = 'song' limit 100 ). Time : 47336915 ns. counter : 100000 Running this query (returning 100 results) 1000 times took 4733.691577 ms. That's 211.25161699566303 per second, and 4.733691576999999 ms per event.

Start the benchmark with 1000 queries (select name from V where type = 'song' limit 500 ). Time : 153366976 ns. counter : 500000 Running this query (returning 500 results) 1000 times took 15336.697631 ms. That's 65.20308504868117 per second, and 15.336697631 ms per event.

Start the benchmark with 1000 queries (select * from V where type = 'song' ). Time : 113562996 ns. counter : 584000 Running this query (returning 584 results) 1000 times took 11356.299655 ms. That's 88.0568521771716 per second, and 11.356299655 ms per event.

Start the benchmark with 1000 queries (select * from V where type = 'artist' limit 100 ). Time : 28642008 ns. counter : 100000 Running this query (returning 100 results) 1000 times took 2864.200856 ms. That's 349.13752570989385 per second, and 2.864200856 ms per event.

Start the benchmark with 1000 queries (select * from V where type = 'artist' limit 200 ). Time : 43286484 ns. counter : 200000 Running this query (returning 200 results) 1000 times took 4328.648443 ms. That's 231.01899199440254 per second, and 4.328648443 ms per event.

Start the benchmark with 1000 queries (select * from V where type = 'artist' ). Time : 47580440 ns. counter : 224000 Running this query (returning 224 results) 1000 times took 4758.044031 ms. That's 210.17039638236167 per second, and 4.758044031000001 ms per event.

Start the benchmark with 1000 queries (select distinct( type ) from V ). Time : 150766144 ns. counter : 2000 Running this query (returning 2 results) 1000 times took 15076.61445 ms. That's 66.32788835427174 per second, and 15.07661445 ms per event.

mrft commented 9 years ago

Some more test results on OrientDB version 2.0.5.

For the last query select name, $sungbyrecord as x_sungby from V let $sungby = outE()[ @class = 'sung_by' ], $sungbyrecord = list( (select in.name from $sungby ) ) where name='HEY BO DIDDLEY', the difference between with and without indexes is incredible, which I find peculiar, since traversing edges is being advertised as have a very low cost, so you would expect this time to always (indexes or not) be very close to a simple select name from V where name='HEY BO DIDDLEY' but it takes 10 times as long without indexes, and about the same time with indexes.

Without indexes

query	# rows /query	# queries /sec	# rows /sec	ms /query	ms /row	# queries	totalTime (ms)
select * from V where song_type = 'original'	184	71.441	13145.172	13.998	0.076	1000	13997.535
select * from V where song_type = 'original' limit 100	100	165.990	16598.958	6.024	0.060	1000	6024.475
select * from V where type = 'song' limit 100	100	257.518	25751.799	3.883	0.039	1000	3883.224
select * from V where type = 'song' limit 500	500	77.580	38790.167	12.890	0.026	1000	12889.865
select name from V where type = 'song' limit 100	100	215.258	21525.838	4.646	0.046	1000	4645.580
select name from V where type = 'song' limit 500	500	57.107	28553.438	17.511	0.035	1000	17511.026
select * from V where type = 'song'	584	68.734	40140.737	14.549	0.025	1000	14548.811
select * from V where type = 'artist' limit 100	100	109.713	10971.268	9.115	0.091	1000	9114.717
select * from V where type = 'artist' limit 200	200	75.247	15049.494	13.289	0.066	1000	13289.484
select * from V where type = 'artist'	224	70.668	15829.675	14.151	0.063	1000	14150.638
select distinct( type ) from V	2	69.532	139.063	14.382	7.191	1000	14381.942
select name, $sungbyrecord as x_sungby from V let $sungby = outE()[ @class = 'sung_by' ], $sungbyrecord = list( (select in.name from $sungby ) ) where name='HEY BO DIDDLEY'	1	7.289	7.289	137.198	137.198	1000	137198.168

With indexes (3 x notunique on name, song_type and type)

query	# rows /query	# queries /sec	# rows /sec	ms /query	ms /row	# queries	totalTime (ms)
select from V where name = 'STANDER ON THE MOUNTAIN'	1	998.787	998.787	1.001	1.001	1000	1001.215
select from V where name = 'HEY BO DIDDLEY'	1	1533.218	1533.218	0.652	0.652	1000	652.223
select * from V where song_type = 'original'	184	192.444	35409.669	5.196	0.028	1000	5196.321
select * from V where song_type = 'original' limit 100	100	267.173	26717.287	3.743	0.037	1000	3742.895
select * from V where type = 'song' limit 100	100	267.646	26764.556	3.736	0.037	1000	3736.285
select * from V where type = 'song' limit 500	500	99.432	49716.195	10.057	0.020	1000	10057.085
select name from V where type = 'song' limit 100	100	228.666	22866.639	4.373	0.044	1000	4373.183
select name from V where type = 'song' limit 500	500	66.468	33234.237	15.045	0.030	1000	15044.726
select * from V where type = 'song'	584	85.948	50193.442	11.635	0.020	1000	11634.986
select * from V where type = 'artist' limit 100	100	353.443	35344.267	2.829	0.028	1000	2829.313
select * from V where type = 'artist' limit 200	200	226.790	45358.027	4.409	0.022	1000	4409.363
select * from V where type = 'artist'	224	200.577	44929.252	4.986	0.022	1000	4985.616
select distinct( type ) from V	2	64.084	128.168	15.605	7.802	1000	15604.579
select name, $sungbyrecord as x_sungby from V let $sungby = outE()[ @class = 'sung_by' ], $sungbyrecord = list( (select in.name from $sungby ) ) where name='HEY BO DIDDLEY'	1	1006.478	1006.478	0.994	0.994	1000	993.564

mattaylor commented 9 years ago

try without the subselect. Ie..

select name, $sungbyrecord, $sungby.in.name from V let $sungby = outE()[ @class = 'sung_by' ] where name='HEY BO DIDDLEY'

Sent from my Verizon Wireless 4G LTE smartphone

mrft commented 9 years ago

That's a different query.

What you propose would be similar to

select name, $followedby.in.name, $followedby.in.performances
from V 
let $followedby = outE()[ @class = 'followed_by' ]
where name='HEY BO DIDDLEY'

which returns 2 lists, 1 for name, and one for performances

which is a completely different thing than

select name, $followedbyrecord as followedbylist
from V 
let $followedby = outE()[ @class = 'followed_by' ],
    $followedbyrecord = list( (select in.name as name, in.performances as performances from $followedby) )
where name='HEY BO DIDDLEY'

which returns 1 list with objects containing 2 properties.

I am not looking for alternative queries, I am trying to let the developers know where improvements could be made.

mrft commented 9 years ago

Another performance related question from someone else: https://groups.google.com/forum/?pli=1#!topic/orient-database/LpG4kQSyqc8

andrii0lomakin commented 9 years ago

This issue is not specific , it is more like there are list of queries which are slow, there is no concrete query it is like aggregation of slow queries so I remove milestone from this issue.

andrii0lomakin commented 9 years ago

Also problem lies not in embedded version but n remote version so I assign @tglman to this issue.

andrii0lomakin commented 4 years ago

Hi, if this issue still actual please let us know otherwise we will close it.

orientechnologies / orientdb