neo4j-contrib / neomodel

An Object Graph Mapper (OGM) for the Neo4j graph database.
https://neomodel.readthedocs.io
MIT License

REST interface performance #54

Closed neoecos closed 11 years ago

neoecos commented 11 years ago

Hi everybody, has anyone benchmarked the REST interface of Neo4j using neomodel and/or py2neo?

I was trying to import some nodes (1.8M) with 26 properties each from a CSV file, and it is frustrating: I tuned Neo4j and it is still unusable.

No relationships.

I am aware of https://github.com/jexp/batch-import, but I just can't imagine a production environment with this configuration using neomodel.

Thank you,

robinedwards commented 11 years ago

Hello, have you tried the neomodel create method? It's built into the StructuredNode class. It won't be quite as fast as the batch-import program, but it does group the REST requests into a single batch operation, so it will be faster than looping over the dataset. It also handles indexing, which batch-import doesn't.

people = Person.create(
    {'name': 'Bob', 'age': 23},
    {'name': 'Tim', 'age': 44}
)

kristiank commented 11 years ago

Also, I would like to add that batches over the REST interface aren't supposed to be very large. You might start with approximately 500 per batch and see whether the performance is good enough.

technige commented 11 years ago

From previous experimentation, I've found that a batch size of about 300 gives optimum performance, although that may hold only for my setup and not be a general rule.
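
A minimal sketch of splitting a large record set into fixed-size batches, along the lines suggested above. The helper name `chunked` and the default size of 300 are illustrative assumptions, not part of neomodel or py2neo; the best size is something to measure on your own setup.

```python
from itertools import islice


def chunked(records, size=300):
    """Yield successive lists of at most `size` items from `records`.

    `chunked` is a hypothetical helper for illustration only; it is not
    part of neomodel or py2neo.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(records)
    while True:
        # Pull the next `size` items; an empty list means we are done.
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Example: 7 records split into batches of 3.
batches = list(chunked(range(7), size=3))
```

Each batch produced this way could then be handed to a single batched create call, so the last (possibly smaller) batch is not forgotten.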


robinedwards commented 11 years ago

Thanks, I've updated the docs :-)

neoecos commented 11 years ago

Hey!!!

You were absolutely right: the import is now at least 10x-15x faster using a batch size of 280-320 properties.

I have a question: when you refer to batch size, is that the number of properties and relationships, or the number of nodes?

Thank you,

technige commented 11 years ago

The number comes from the number of actions taken within a batch. In this context, it would be the number of entities created, both nodes and relationships.

neoecos commented 11 years ago

Thank you. Now I have another question. The create method looks like this:

@classmethod
def create(cls, *props):
   ...

And with the example in this issue, @robinedwards, I don't know how to call the create method with a list of dicts containing the data. Something like:

header = map(lambda f: f.strip('\n').strip('"').lower(), line.split('|'))
batchList = []
batchSize = 300
for line in ins:
    fields = map(lambda f: f.strip('\n').strip('"'),
                 line.replace(',', '.').split('|'))
    record = {}
    map(lambda k, v: record.update({k: v}), header, fields)
    batchList.append(record)
    if len(batchList) == batchSize:
        #: this will fail
        RegistroPredioCatastroTipo2.create(batchList)
        batchList = []
RegistroPredioCatastroTipo2.create(batchList)

robinedwards commented 11 years ago

When you call create, it expects the dicts as separate positional arguments, as opposed to a single reference to a list, so the list needs to be unpacked.

Try this:

RegistroPredioCatastroTipo2.create(*batchList)
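
To illustrate the difference, here is a small sketch using a stand-in class. `FakeNode` is hypothetical: it only mimics the `create(cls, *props)` signature quoted earlier in this thread and does not talk to Neo4j. Raising `TypeError` for a non-dict argument is an assumption made for the demonstration, not a claim about how neomodel itself reacts.

```python
class FakeNode:
    """Hypothetical stand-in mimicking the create(cls, *props) signature."""

    @classmethod
    def create(cls, *props):
        # Each positional argument is expected to be one dict of properties.
        for item in props:
            if not isinstance(item, dict):
                raise TypeError("expected a dict, got %s" % type(item).__name__)
        return list(props)


batch = [{'name': 'Bob', 'age': 23}, {'name': 'Tim', 'age': 44}]

# Unpacked: create receives two dicts as separate positional arguments.
created = FakeNode.create(*batch)

# Not unpacked: create receives a single argument, the list itself.
try:
    FakeNode.create(batch)
    passed_list_ok = True
except TypeError:
    passed_list_ok = False
```

With `*batch`, `props` inside `create` is a tuple of two dicts; without the star, it is a tuple holding one list.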

neoecos commented 11 years ago

Thank you @robinedwards, today I will try the batch import test with the recommendations from this issue.

neoecos commented 11 years ago

Awesome!!!

Thank you for your help. Using this method, the insert performance is acceptable for our requirements: I am inserting about 2000 nodes with 30 properties each per second.

Thank you again!