orientechnologies / orientdb

OrientDB is the most versatile DBMS supporting Graph, Document, Reactive, Full-Text and Geospatial models in one Multi-Model product. OrientDB can run distributed (Multi-Master), supports SQL, ACID Transactions, Full-Text indexing and Reactive Queries.
https://orientdb.dev
Apache License 2.0

createEdge and getVertex Performance Problem #8925

Closed ozgursucu closed 2 years ago

ozgursucu commented 5 years ago

OrientDB Version: 3.0.19

Java Version: 1.8

We are using OrientDB as a graph database to hold our product catalogue repository. Catalogs have specs and a category, connected through an edge called catalogLink: Vertex Catalog -> catalogLink -> Vertex Spec. A catalog has 50+ specs on average, depending on the category it belongs to. Specs also connect to categories via the specLink edge: Vertex Category -> specLink -> Vertex Spec. We store each spec uniquely by its name and id (e.g. id=1, name=Apple). We use the OrientDB TinkerPop Blueprints OrientGraph implementation (which relies on TinkerPop 2.6) for transactions and for edge/vertex CRUD operations.

When specs have many relations to catalogs and categories, loading a vertex takes too long. For example, for a spec with 100000 in_catalogLinks and 290 in_specLinks, the load time is almost 300 ms. Edge creation is even worse. We load all specs with a single call (by id list, which is also slow when they have many relations), create the catalog vertex with an id generated by an OrientDB sequence plus some other fields, create the edges between the loaded specs, the category and the catalog, and commit the transaction. All create operations use graph methods (i.e. addVertex, addEdge), and all of this is done in a single transaction.

When I debug the code, there are 60 transaction entries to be committed (OCommit37Request write method). It iterates over all entries and calls OMessageHelper.writeTransactionEntry(network, txEntry, serializer); which sends the whole record to the remote server, and when a vertex has that many relations this is very slow. (The application waits almost 6 seconds for a successful commit, which is certainly not acceptable.) By the way, executionMode is set to "asynchronous" in the server config.

Edge creation is done via the graph addEdge method:

OrientEdge orientEdge = OrientBaseGraph.getActiveGraph().addEdge("class:" + className, from, to, className);

We have 3 nodes and use the remote protocol. We have written a custom OrientTransactionManager over OrientGraph, which simply wraps it and is used through the legacy transaction manager (AbstractPlatformTransactionManager) methods. We create the graph factory as follows:

orientGraphFactory = new OrientGraphFactory(
        connectionConfiguration.getUrl(),
        connectionConfiguration.getUsername(),
        connectionConfiguration.getPassword())
    .setupPool(connectionConfiguration.getMinConnection(), connectionConfiguration.getMaxConnection());
orientGraphFactory.setAutoStartTx(false);

connectionConfiguration.getUrl() is simply remote:db1:port;db2:port;db3:port/dbName

We also bought the Enterprise Edition last month. Any suggestions are appreciated. Thanks.

luigidellaquila commented 5 years ago

Hi @ozgursucu

First of all, if you have an Enterprise Production Support contract in place, you should also have access to the internal issue tracker. I'd suggest using it to get dedicated support.

In general, using TinkerPop 2.6 in v3.0 is not recommended (it is deprecated). I'd suggest using the Multi-Model API instead, or the new TinkerPop 3 API.

That said, six seconds for a single transaction is definitely too much, so we have to investigate it. Any chance of a reproducer so we can debug the exact use case?

Thanks

Luigi

ozgursucu commented 5 years ago

Hi @luigidellaquila, I will also open a case in the internal issue tracker. How can I use the TinkerPop 3 API when our OrientDB installation is not orientdb-tp3? And in the Multi-Model API, how are graph operations (i.e. opening a pool, getting a transaction, etc.) performed without the OrientGraph implementation?

luigidellaquila commented 5 years ago

Hi @ozgursucu

For the Multi-Model API, I'd suggest checking the following:

http://orientdb.com/docs/3.0.x/java/Java-MultiModel-API.html
http://orientdb.com/docs/3.0.x/java/Document-API-Database.html
http://orientdb.com/docs/3.0.x/java/Java-MultiModel-Data-API.html
http://orientdb.com/docs/3.0.x/java/Java-Query-API.html
http://orientdb.com/docs/3.0.x/java/Java-Schema-Api.html

To use TinkerPop 3 you need to use orientdb-tp3, of course.

Thanks

Luigi

ozgursucu commented 5 years ago

And what about the Maven dependency? Are there dedicated JAR files for the Enterprise Edition? I can fetch this dependency from the official Maven Central repository:

<dependency>
    <groupId>com.orientechnologies</groupId>
    <artifactId>orientdb-core</artifactId>
    <version>3.0.21</version>
</dependency>

I will first try the Multi-Model API and let you know whether performance improves. Thanks.

luigidellaquila commented 5 years ago

Hi @ozgursucu

The Enterprise Edition is closed-source and is not on Maven Central. For now you have to use the JAR file directly.

Thanks

Luigi

ozgursucu commented 5 years ago

Hi @luigidellaquila, what about the com.tinkerpop.blueprints.impls.orient.OrientGraphFactoryV2 class you have implemented? I have already tried it and nothing changed. I suppose the OrientGraph implementation relies on TinkerPop 2.6, and I need this implementation for graph operations because I use it from a TransactionManager and add the Transactional annotation to the CRUD methods (so there is no need to duplicate code for beginning, committing and closing transactions). `package com.gg.papi.foundation.catalogv2.dao.base.manager;

import com.gg.papi.foundation.catalogv2.dao.base.factory.OrientDBFactory;
import com.tinkerpop.blueprints.impls.orient.OrientBaseGraph;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.transaction.TransactionDefinition;
import org.springframework.transaction.TransactionException;
import org.springframework.transaction.support.AbstractPlatformTransactionManager;
import org.springframework.transaction.support.DefaultTransactionStatus;
import org.springframework.transaction.support.ResourceTransactionManager;
import org.springframework.transaction.support.TransactionSynchronizationManager;

public class OrientTransactionManager extends AbstractPlatformTransactionManager implements ResourceTransactionManager {

private static final long serialVersionUID = 1L;

/**
 * The logger.
 */
private static final Logger log = LoggerFactory.getLogger(OrientTransactionManager.class);

/**
 * The database factory.
 */
private OrientDBFactory dbf;

/**
 * Instantiates a new {@link OrientTransactionManager}.
 *
 * @param dbf the dbf
 */
public OrientTransactionManager(OrientDBFactory dbf) {

    super();
    this.dbf = dbf;
}

/* (non-Javadoc)
 * @see org.springframework.transaction.support.AbstractPlatformTransactionManager#doGetTransaction()
 */
@Override
protected Object doGetTransaction() throws TransactionException {

    TransactionHolder transactionHolder = (TransactionHolder) TransactionSynchronizationManager.getResource(dbf);

    if (transactionHolder == null){
        OrientBaseGraph orientGraph = dbf.getTxActiveGraph();
        transactionHolder = new TransactionHolder(orientGraph);
        TransactionSynchronizationManager.bindResource(dbf, transactionHolder);
    }
    return transactionHolder;
}

/* (non-Javadoc)
 * @see org.springframework.transaction.support.AbstractPlatformTransactionManager#doBegin(java.lang.Object, org.springframework.transaction.TransactionDefinition)
 */
@Override
protected void doBegin(Object transaction, TransactionDefinition definition) throws TransactionException {
    TransactionHolder transactionHolder = (TransactionHolder)transaction;
    transactionHolder.getOrientGraph().begin();
    log.debug("beginning transaction, db.hashCode() = {}", transactionHolder.getOrientGraph().hashCode());

}

/* (non-Javadoc)
 * @see org.springframework.transaction.support.AbstractPlatformTransactionManager#doCommit(org.springframework.transaction.support.DefaultTransactionStatus)
 */
@Override
protected void doCommit(DefaultTransactionStatus status) throws TransactionException {

    TransactionHolder transactionHolder = (TransactionHolder)status.getTransaction();
    log.debug("committing transaction, db.hashCode() = {}", transactionHolder.getOrientGraph().hashCode());
    transactionHolder.getOrientGraph().commit();
}

/* (non-Javadoc)
 * @see org.springframework.transaction.support.AbstractPlatformTransactionManager#doRollback(org.springframework.transaction.support.DefaultTransactionStatus)
 */
@Override
protected void doRollback(DefaultTransactionStatus status) throws TransactionException {
    TransactionHolder transactionHolder = (TransactionHolder)status.getTransaction();
    if (transactionHolder.getOrientGraph() != null && !transactionHolder.getOrientGraph().isClosed()) {
        log.debug("rolling back transaction, db.hashCode() = {}", transactionHolder.getOrientGraph().hashCode());
        transactionHolder.getOrientGraph().rollback();
    }
}

@Override
protected void doSetRollbackOnly(DefaultTransactionStatus status){
    if (!isExistingTransaction(status.getTransaction())){
        doRollback(status);
    }
}

@Override
protected void doCleanupAfterCompletion(Object transaction) {
    TransactionHolder transactionHolder = (TransactionHolder)transaction;
    if (transactionHolder.getOrientGraph()!= null && !transactionHolder.getOrientGraph().isClosed()) {
        releaseConnection(transactionHolder);
    }
    TransactionSynchronizationManager.unbindResource(dbf);
}

@Override
protected boolean isExistingTransaction(Object transaction) throws TransactionException {
    TransactionHolder transactionHolder = (TransactionHolder)transaction;
    return transactionHolder != null && transactionHolder.getOrientGraph()!=null && transactionHolder.getOrientGraph().getRawGraph().getTransaction().isActive();
}

protected void releaseConnection(TransactionHolder holder) {
    holder.getOrientGraph().shutdown();
}

/* (non-Javadoc)
 * @see org.springframework.transaction.support.ResourceTransactionManager#getResourceFactory()
 */
@Override
public Object getResourceFactory() {

    return dbf;
}

/**
 * Gets the database factory for the database managed by this transaction manager.
 *
 * @return the database
 */
public OrientDBFactory getDatabaseFactory() {

    return dbf;
}

/**
 * Sets the database factory for the database managed by this transaction manager.
 *
 * @param databaseFactory the database to set
 */
public void setDatabaseFactory(OrientDBFactory databaseFactory) {

    this.dbf = databaseFactory;
}

}`

so that I can use OrientBaseGraph.getActiveGraph() inside the transactional method. Example:

`private Vertex addVertex(String className, Map<String, Object> vertexProperties) {
    String vertexType = "class:" + className;
    return OrientBaseGraph.getActiveGraph().addVertex(vertexType, vertexProperties);
}`

ozgursucu commented 5 years ago

The main reason for this slowdown is that the ORemoteStorage.command method fetches all the vertices with their relations, and when they have many relations the elapsed time increases dramatically. But I need these vertices for edge creation, so how can I get vertices without their relations in order to add edges between them? For example, I need 25 spec vertices in order to add edges to a catalog, but some of them have too many relations. Load time is very slow, as is commit time, because the OCommit37Request write method sends all the records with their relations every time. In the past we ended up with a huge CPU load when concurrent write operations were performed.

luigidellaquila commented 5 years ago

Hi @ozgursucu

Unfortunately there is no way to add edges to vertices without loading all the connected structures. Loading the data in remote takes time, so the typical solution to this problem is to avoid loading the data altogether. You can do this by executing the operations server-side, using SQL.

Thanks

Luigi
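
A minimal sketch of what "executing the operations server-side, using SQL" could look like for the catalogLink case described earlier in this thread. It only builds the statement text in plain Java. The class name EdgeSql, the numeric id property on both Catalog and Spec, and the way the string is eventually sent to the server are assumptions for illustration, not something confirmed here. The point is that CREATE EDGE accepts subqueries for both endpoints, so the server resolves the vertices and the client never has to load the heavily connected Spec records.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: builds an OrientDB SQL statement as a string.
// Class and property names (Catalog, Spec, id) are assumed from the thread.
final class EdgeSql {

    // Returns one CREATE EDGE statement linking a catalog to all given specs.
    // Both endpoints are subqueries, so they are resolved on the server.
    static String createCatalogLinks(long catalogId, List<Long> specIds) {
        if (specIds.isEmpty()) {
            throw new IllegalArgumentException("specIds must not be empty");
        }
        // Only numeric ids are interpolated here; strings should go through parameters.
        String ids = specIds.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(", "));
        return "CREATE EDGE catalogLink"
                + " FROM (SELECT FROM Catalog WHERE id = " + catalogId + ")"
                + " TO (SELECT FROM Spec WHERE id IN [" + ids + "])";
    }

    public static void main(String[] args) {
        System.out.println(createCatalogLinks(42L, Arrays.asList(1L, 2L, 3L)));
    }
}
```

The resulting string would then be passed to whichever command API the application uses; which call that is depends on the client API in use and is left out of the sketch.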

ozgursucu commented 5 years ago

That is sad to hear :( So OrientDB is not suitable for vertices with thousands of relations, and the addEdge operation is very expensive, which is very bad. We will have to completely refactor the write operations in our codebase then.

And what about consistency? Is it possible to roll back all the server-side operations when something goes wrong in the transaction? I suppose OrientDB 3.0 now supports transactional consistency for server-side commands, but when I tried it, the server-side operations were persisted even though the transaction was rolled back.

luigidellaquila commented 5 years ago

Hi @ozgursucu

You can have as many relationships as you want, but you have to take into consideration the trade-offs of each API. In this case, using SQL gives you better performance.

About consistency, you can execute batch SQL scripts in remote, with BEGIN/COMMIT.

Thanks

Luigi
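
As a rough illustration of the batch approach mentioned above, the sketch below assembles a script that creates the catalog vertex and its edges between BEGIN and COMMIT, so that the server applies both steps as one transaction. The class name CatalogBatchScript and the numeric id property are assumptions; the thread does not show the real schema or the real script.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: builds an OrientDB batch SQL script as a string.
final class CatalogBatchScript {

    // Builds a script that creates one Catalog vertex and links it to the given specs.
    static String build(long catalogId, List<Long> specIds) {
        String ids = specIds.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(", "));
        return String.join("\n",
                "BEGIN;",
                // LET stores the new vertex in a script variable, referenced later as $catalog.
                "LET catalog = CREATE VERTEX Catalog SET id = " + catalogId + ";",
                // The specs are selected on the server, not loaded by the client.
                "CREATE EDGE catalogLink FROM $catalog TO (SELECT FROM Spec WHERE id IN [" + ids + "]);",
                "COMMIT;");
    }

    public static void main(String[] args) {
        System.out.println(build(42L, Arrays.asList(1L, 2L, 3L)));
    }
}
```

Because the whole script is sent in one request, a failure before COMMIT should leave nothing persisted, which is the consistency property asked about; that behaviour is worth verifying against the exact server version in use.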

ozgursucu commented 5 years ago

Hi @luigidellaquila, I looked into some sources and found these pages: https://orientdb.com/docs/3.0.x/java/Java-Query-API.html https://orientdb.com/new-sql-command-batch/

How can I pass named parameters to a batch script?
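
One possible shape for this, sketched under assumptions: OrientDB SQL writes named parameters as :name, and the values travel separately in a map keyed by the same names. The class ParameterizedBatch, the parameter names catalogId and specIds, and the id property are hypothetical. Whether a remote batch script accepts named parameters in a particular 3.0.x release is not confirmed in this thread, so treat it as something to test.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a batch script with named placeholders plus its parameter map.
final class ParameterizedBatch {

    // :catalogId and :specIds are placeholders; no values are concatenated into the SQL.
    static final String SCRIPT = String.join("\n",
            "BEGIN;",
            "LET catalog = CREATE VERTEX Catalog SET id = :catalogId;",
            "CREATE EDGE catalogLink FROM $catalog TO (SELECT FROM Spec WHERE id IN :specIds);",
            "COMMIT;");

    // Map keys must match the placeholder names without the leading colon.
    static Map<String, Object> params(long catalogId, List<Long> specIds) {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("catalogId", catalogId);
        params.put("specIds", specIds);
        return params;
    }

    public static void main(String[] args) {
        Map<String, Object> params = params(42L, Arrays.asList(1L, 2L, 3L));
        System.out.println(SCRIPT);
        System.out.println(params);
        // Assumption: with the 3.0 Multi-Model API the pair would be handed to the
        // session's script execution method, e.g. db.execute("sql", SCRIPT, params).
    }
}
```

Keeping the values out of the script text also avoids quoting problems for string properties such as spec names.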

ozgursucu commented 5 years ago

Hi @luigidellaquila, I have changed the code that used the TinkerPop API methods (addVertex, addEdge) to batch SQL script execution. In our test environment, in standalone mode, elapsed time dropped dramatically to 200 milliseconds, whereas in our production environment (3 nodes running) it takes 1-2 seconds on average, compared with nearly 10 seconds with the graph methods. That is roughly 5 times faster than the graph methods. Why is there such a big difference in execution time between standalone and distributed mode? Note that we are using the asynchronous executionMode.

Jotschi commented 4 years ago

@luigidellaquila

"Unfortunately there is no way to add edges to vertices without loading all the connected structures."

Can you explain this? This sounds horrible...