kbastani / graphify

Graphify is a Neo4j unmanaged extension used for document and text classification using graph-based hierarchical pattern recognition.
http://graphify.github.io/graphify
Apache License 2.0

Training on large corpora is extremely slow; any way to parallelize the pattern detector? #17

Open jhashemi opened 9 years ago

nabilblk commented 9 years ago

+1, training is extremely slow.

kbastani commented 9 years ago

Can you provide your memory configuration? Please copy and paste your properties from neo4j.properties in the Neo4j /conf directory.

Recommended memory settings are below:

neostore.nodestore.db.mapped_memory=512M
neostore.relationshipstore.db.mapped_memory=2048M
neostore.propertystore.db.mapped_memory=1024M
neostore.propertystore.db.strings.mapped_memory=500M
neostore.propertystore.db.arrays.mapped_memory=500M

This configuration assumes you have at least 8GB of available system memory.
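
As a rough sanity check on the 8GB figure, the mapped-memory values recommended above can be added up. The following is only a minimal sketch: the helper names `parse_size_mb` and `total_mapped_mb` are hypothetical and not part of Neo4j or Graphify, and the script assumes the settings are plain `key=value` lines with a `K`, `M`, or `G` suffix.

```python
# Hypothetical helper script (not part of Neo4j or Graphify): sums the
# *.mapped_memory settings listed in the comment above.
SETTINGS = """\
neostore.nodestore.db.mapped_memory=512M
neostore.relationshipstore.db.mapped_memory=2048M
neostore.propertystore.db.mapped_memory=1024M
neostore.propertystore.db.strings.mapped_memory=500M
neostore.propertystore.db.arrays.mapped_memory=500M
"""

# Multipliers that convert a suffixed size to megabytes (assumed suffixes).
UNITS = {"K": 1 / 1024, "M": 1, "G": 1024}


def parse_size_mb(value):
    """Convert a size such as '512M' or '2G' to megabytes."""
    value = value.strip().upper()
    return float(value[:-1]) * UNITS[value[-1]]


def total_mapped_mb(text):
    """Sum every *.mapped_memory entry found in key=value text."""
    total = 0.0
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip().endswith(".mapped_memory"):
            total += parse_size_mb(value)
    return total


total = total_mapped_mb(SETTINGS)
print(f"mapped memory total: {total:.0f} MB")
```

With the values listed above, the settings sum to 4584 MB, a little under 4.5GB, so an 8GB machine would still have memory left for the JVM heap and the operating system.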

jhashemi commented 9 years ago

That definitely helped training, but classification now takes upwards of 3 minutes per entity. This is on an HA cluster.

kbastani commented 9 years ago

Glad to hear it helped training. I'm going to need more information about your dataset in order to get you fixed up. You can reach me on Skype at kenny.bastani or e-mail kb@socialmoon.com.

letronje commented 9 years ago

Using the recommended memory settings above certainly improves training speed (most requests are sub-second). Classification requests, however, take anywhere from 15 to 30 seconds. Is there any way to speed them up? Also, if multiple classify requests are sent in parallel, the server returns a 500.