orientechnologies / orientdb

OrientDB is the most versatile DBMS supporting Graph, Document, Reactive, Full-Text and Geospatial models in one Multi-Model product. OrientDB can run distributed (Multi-Master), supports SQL, ACID Transactions, Full-Text indexing and Reactive Queries.
https://orientdb.dev
Apache License 2.0
4.72k stars 870 forks source link

Poor performance of commit when deleting from graph #5878

Closed joe568277 closed 2 years ago

joe568277 commented 8 years ago

Affects OrientDB release 2.1.13.

We're encountering very slow performance of commits when deleting from the graph, and have tracked this down to the callback executed to update the indexes. The following test case demonstrates this:

import com.orientechnologies.orient.core.metadata.schema.OClass;
import com.orientechnologies.orient.core.metadata.schema.OType;
import com.tinkerpop.blueprints.Direction;
import com.tinkerpop.blueprints.Vertex;
import com.tinkerpop.blueprints.impls.orient.OrientGraph;
import com.tinkerpop.blueprints.impls.orient.OrientVertexType;

import java.nio.charset.Charset;
import java.util.*;

public class DeletePerformanceTestCase {
    static String[] SURNAMES = { "SMITH", "HONDA", "SCHMIDT" };
    static final int COUNTRY_COUNT = 10;
    static final int INITIAL_CUSTOMER_COUNT = 100;
    static final int REFERRED_CUSTOMER_COUNT = 100000;

    public static void main(final String[] args) throws Exception {
    final OrientGraph g = new OrientGraph("memory:referrals");
    final Random r = new Random(0L);
    final long loadStart = System.currentTimeMillis();

    final List<Vertex> countries = new ArrayList<>();
    g.setAutoStartTx(false);
    g.commit();

    final OrientVertexType customerType = g.createVertexType("Customer", "V");
    customerType.createProperty("firstName", OType.STRING).createIndex(OClass.INDEX_TYPE.NOTUNIQUE);
    customerType.createProperty("lastName", OType.STRING).createIndex(OClass.INDEX_TYPE.NOTUNIQUE);
    customerType.createProperty("prop1", OType.STRING).createIndex(OClass.INDEX_TYPE.NOTUNIQUE);
    customerType.createProperty("prop2", OType.STRING).createIndex(OClass.INDEX_TYPE.NOTUNIQUE);
    customerType.createProperty("prop3", OType.STRING).createIndex(OClass.INDEX_TYPE.NOTUNIQUE);
    customerType.createProperty("prop4", OType.STRING).createIndex(OClass.INDEX_TYPE.NOTUNIQUE);
    customerType.createProperty("prop5", OType.STRING).createIndex(OClass.INDEX_TYPE.NOTUNIQUE);
    customerType.createProperty("prop6", OType.STRING).createIndex(OClass.INDEX_TYPE.NOTUNIQUE);

    g.setAutoStartTx(true);

    {
        final Iterator<Charset> names = Charset.availableCharsets().values().iterator();
        for (int i = 0; i < COUNTRY_COUNT; i++) {
            countries.add(g.addVertex("class:Country", "name", names.next().displayName()));
        }
    }

    final List<Vertex> customers = new ArrayList<>();

    for (int i = 0; i < INITIAL_CUSTOMER_COUNT; i++) {
        final Vertex initCustomer = g.addVertex("class:Person", "firstName", UUID.randomUUID().toString(),
                                                                "lastName", SURNAMES[r.nextInt(SURNAMES.length)],
                                                                "prop1", UUID.randomUUID().toString(),
                                                                "prop2", SURNAMES[r.nextInt(SURNAMES.length)],
                                                                "prop3", UUID.randomUUID().toString(),
                                                                "propX", UUID.randomUUID().toString());
        final Vertex country = countries.get(r.nextInt(countries.size()));
        initCustomer.addEdge("memberOf", country);
        country.addEdge("initMember", initCustomer);
        customers.add(initCustomer);
    }

    g.commit();

    long count = 0;

    for (int i = 0; i < REFERRED_CUSTOMER_COUNT; i++) {
        final Vertex customer = g.addVertex("class:Person", "firstName", UUID.randomUUID().toString(),
                                                            "lastName", SURNAMES[r.nextInt(SURNAMES.length)],
                                                            "prop1", UUID.randomUUID().toString(),
                                                            "prop2", SURNAMES[r.nextInt(SURNAMES.length)],
                                                            "prop3", UUID.randomUUID().toString(),
                                                            "propX", UUID.randomUUID().toString());
        final Vertex referrer = customers.get(r.nextInt(customers.size()));
        final Vertex country = referrer.getVertices(Direction.OUT, "memberOf").iterator().next();
        customer.addEdge("memberOf", country);
        customer.addEdge("referrer", referrer);
        customers.add(customer);
        if (++count % 1000 == 0) {
            final long start = System.currentTimeMillis();
            g.commit();
            System.out.printf("insert, %d, %d\n", customers.size(), (System.currentTimeMillis() - start));
        }
    }

    g.commit();

    System.err.println(g);
    System.err.println(System.currentTimeMillis() - loadStart);

    while (customers.size() > 0) {
        final long beforeSize = customers.size();
        for (int j = 0; j < Math.min(1000, beforeSize); j++) {
            customers.remove(r.nextInt(customers.size())).remove();
        }
        final long start = System.currentTimeMillis();
        g.commit();
        System.out.printf("delete, %d, %d\n", customers.size(), (System.currentTimeMillis() - start));
    }
    System.err.println(g);
    System.err.println(System.currentTimeMillis() - loadStart);
    }
}

The time taken to commit a batch of 1000 deletions appears to be linearly proportional to the database size, or worse: plot

andrii0lomakin commented 8 years ago

Seems related to https://github.com/orientechnologies/orientdb/issues/5866

joe568277 commented 8 years ago

I don't think #5866 is the same problem that I'm seeing. I'm specifically seeing the update of indexes on deletes taking the majority of time during a commit. The previous test case didn't demonstrate this properly. I've got a simpler test case below of inserting and deleting 10,000 vertices with a single indexed property. I've added extra logging to OAbstractPaginatedStorage to log the total time taken by the commit and the time taken by callback.run() which updates the index. The indexing time for insertion of 10,000 vertices is about 500ms, whereas deletion about 23,000ms.

Test code:

package com.example.test.orientdb.performance;

import com.orientechnologies.orient.core.metadata.schema.OClass;
import com.orientechnologies.orient.core.metadata.schema.OType;
import com.tinkerpop.blueprints.Vertex;
import com.tinkerpop.blueprints.impls.orient.OrientGraph;
import com.tinkerpop.blueprints.impls.orient.OrientVertexType;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.bridge.SLF4JBridgeHandler;

import java.util.UUID;

public class SimpleDeleteTest {
    private static final Logger logger = LoggerFactory.getLogger(SimpleDeleteTest.class);

    private static final String DBURL = "plocal:target/databases/deletetest";
    private static final int CREATE_VERTEX_COUNT = 10000;
    private static final int COMMIT_INTERVAL = 10000;

    private OrientGraph g;

    @Before
    public void init() {
        SLF4JBridgeHandler.removeHandlersForRootLogger();
        SLF4JBridgeHandler.install();

        final long preDbCreateTime = System.currentTimeMillis();
        g = new OrientGraph(DBURL);
        logger.info("Created database in {}", System.currentTimeMillis() - preDbCreateTime);

        g.setAutoStartTx(false);
        g.commit();
        final OrientVertexType deleteType = g.createVertexType("DeleteThese", "V");
        deleteType.createProperty("name", OType.STRING).createIndex(OClass.INDEX_TYPE.NOTUNIQUE);
        g.setAutoStartTx(true);

        int count = 0;
        for (int i = 0; i < CREATE_VERTEX_COUNT; i++) {
            g.addVertex("class:DeleteThese", "name", UUID.randomUUID().toString());

            if (++count % COMMIT_INTERVAL == 0) {
                final long commitStartTime = System.currentTimeMillis();
                g.commit();
                logger.info("Creation of {} vertices committed in {}", COMMIT_INTERVAL, System.currentTimeMillis() - commitStartTime);
            }
        }
    }

    @Test
    public void deleteVertices() {
        int count = 0;
        for (Vertex v : g.getVertices()) {
            v.remove();
            if (++count % COMMIT_INTERVAL == 0) {
                final long commitStartTime = System.currentTimeMillis();
                g.commit();
                logger.info("Deletion of {} vertices committed in {}", COMMIT_INTERVAL, System.currentTimeMillis() - commitStartTime);
            }
        }
    }

    @After
    public void tearDown() {
        g.drop();
    }
}

Output:

Running com.example.test.orientdb.performance.SimpleDeleteTest
08:10:21.064 [main] INFO com.orientechnologies - OrientDB auto-config DISKCACHE=4,160MB (heap=1,776MB os=7,984MB disk=24,244MB)
08:10:22.065 [main] INFO com.example.test.orientdb.performance.SimpleDeleteTest - Created database in 1281
08:10:22.502 [main] INFO com.orientechnologies.orient.core.storage.impl.local.paginated.OLocalPaginatedStorage - {db=deletetest} Starting commit
08:10:22.791 [main] INFO com.orientechnologies.orient.core.storage.impl.local.paginated.OLocalPaginatedStorage - {db=deletetest} Starting callback.run()
08:10:23.320 [main] INFO com.orientechnologies.orient.core.storage.impl.local.paginated.OLocalPaginatedStorage - {db=deletetest} Finished callback.run() in 529ms
08:10:23.385 [main] INFO com.orientechnologies.orient.core.storage.impl.local.paginated.OLocalPaginatedStorage - {db=deletetest} Finished commit in 883ms
08:10:23.387 [main] INFO com.example.test.orientdb.performance.SimpleDeleteTest - Creation of 10000 vertices committed in 886
08:10:23.585 [main] INFO com.orientechnologies.orient.core.storage.impl.local.paginated.OLocalPaginatedStorage - {db=deletetest} Starting commit
08:10:23.736 [main] INFO com.orientechnologies.orient.core.storage.impl.local.paginated.OLocalPaginatedStorage - {db=deletetest} Starting callback.run()
08:10:46.801 [main] INFO com.orientechnologies.orient.core.storage.impl.local.paginated.OLocalPaginatedStorage - {db=deletetest} Finished callback.run() in 23065ms
08:10:47.410 [main] INFO com.orientechnologies.orient.core.storage.impl.local.paginated.OLocalPaginatedStorage - {db=deletetest} Finished commit in 23825ms
08:10:47.412 [main] INFO com.example.test.orientdb.performance.SimpleDeleteTest - Deletion of 10000 vertices committed in 23827
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 26.729 sec
andrii0lomakin commented 8 years ago

@joe568277 deletion of single vertex takes 2.3 ms according to your report. Is is too slow for your case , or may be I got you wrong ?

joe568277 commented 8 years ago

@laa Yes - that's right. I need to regularly remove 1000s of vertices from large graph database, and the deletion rate is too slow with indexes on the vertices. In the test case above I had only one index - with more indexes it gets slower.

andrii0lomakin commented 8 years ago

well we may introduce interesting feature, to speed up things, but it will be only in 2.2 version

joe568277 commented 8 years ago

Thanks - please update when there's something on a 2.2 branch that I can try

andrii0lomakin commented 8 years ago

sorry moved to 3.0 because we are going to release 2.2 rc next week but there is probability that we will move it to closer version.

smolinari commented 8 years ago

@joe568277 - If I may ask, what is the use case, which calls for a constant deletion of so many vertexes at one time? Is a user sitting in front of a computer waiting for this to happen? Or is it a background task?

It is for my own curiosity. :smile:

Scott

andrii0lomakin commented 8 years ago

Will be fixed, as part of https://github.com/orientechnologies/orientdb/issues/6041