Graph Loader, aka Graph Analyzer v 1.0 - Alex Babeanu, 2016
The Graph Loader and Analyzer tool creates a Neo4J Graph database from data read from any LDAP directory. In the graph, Users and Groups are related using 2 types of relationships:
MEMBER_OF: user group membership
PART_OF: Group hierarchy, relates a child to its parent Group
Sample resulting Graph:
The idea is to then :
Finally (v 3.0), the Analyzer could apply the suggested optimizations directly to the LDAP directory.
Note that the Loader can also be used on its own, simply as an LDAP-to-Graph loader tool.
Currently at version 1 (December 2015), the Graph Analyzer has the following features:
A sample LDIF file is provided, with 2000 users and 200 groups (file: data/NeoLDAPSample.ldif). The group memberships are random.
The utility is built as a Gradle project using the Groovy language. It therefore requires:
Finally, this utility uses the UnboundID Java LDAP SDK; the corresponding libraries are therefore also required.
develop branch!).
resources subdir). See below for configuration details...
gradle build --refresh-dependencies.
neoLoader.sh
- Or on Windows boxes:
groovy -cp build/libs/GraphAnalyzer-0.0.1-SNAPSHOT.jar:runtime/* -Djava.util.logging.config.file=resources/logging.properties NeoLoader.groovy
The Graph Loader uses the following config files...
Gathers all the connectivity and metadata of the source LDAP Directory. In particular:
host: LDAP Server Host
port: LDAP Server Port
binddn: LDAP Admin User DN
password: LDAP User password
userBasedn: The DN of the container under which the User Accounts are stored. The Loader will NOT create a node for this container (on a flat DIT, it may contain millions of user accounts, which would imply too many relationships in the graph for a single node).
groupBaseDn: The DN of the container under which the Groups are stored. The Loader will NOT create a node for this container.
userobjclass: The LDAP Objectclass of the User Accounts. Used for searching.
groupobjclass: The LDAP Objectclass of the Groups. Used for searching.
memberattribute: The name of the Group multi-valued attribute that stores group members. The Member values are expected to be User Account DNs.
nbconnections: The number of connections to use in the LDAP Connections pool.
pagesize: The size of the LDAP search page to use while searching. For best results, this should be set to the LDAP Server search limit.
The Java Logging properties file. Used here mainly to set the log level for the utility: just set the .level property.
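As a rough illustration of the LDAP settings described above, the sketch below parses a properties-style snippet and derives the object class search filters. The key=value file layout, all sample values (ldap.example.com, the DNs, inetOrgPerson, groupOfNames) and both helper functions are assumptions made for the example; only the key names come from this document, and the Loader's actual parsing and search filters may differ.

```python
# Hypothetical sample values; only the key names come from the description above.
SAMPLE_LDAP_CONFIG = """\
host=ldap.example.com
port=389
binddn=cn=admin,dc=example,dc=com
password=changeit
userBasedn=ou=people,dc=example,dc=com
groupBaseDn=ou=groups,dc=example,dc=com
userobjclass=inetOrgPerson
groupobjclass=groupOfNames
memberattribute=member
nbconnections=4
pagesize=500
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, since DN values contain '=' themselves.
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config


def object_class_filter(objclass):
    """Build a standard LDAP filter matching entries of one object class."""
    return f"(objectClass={objclass})"


config = parse_properties(SAMPLE_LDAP_CONFIG)
user_filter = object_class_filter(config["userobjclass"])
group_filter = object_class_filter(config["groupobjclass"])

# Each search would be rooted at its base DN and use the matching filter.
print(config["userBasedn"], user_filter)
print(config["groupBaseDn"], group_filter)
```

Splitting on the first '=' only matters here because values such as binddn are DNs that contain '=' characters themselves.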
Gathers all configuration information pertaining to the target Neo4J server, namely:
host: The Neo4J Host
port: The Neo4J port
authorization: The Base64-encoded basic HTTP authentication credentials for the Neo4J User, used in REST calls to Neo4J. The string to encode is of the form: <user>:<password>
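The authorization value can be produced with any Base64 tool. A minimal sketch follows, assuming example credentials neo4j / secret (hypothetical, not defaults of this tool); the helper name encode_credentials is also made up for the example. Whether the config value should carry the "Basic " prefix is not stated here, so the sketch shows the bare encoded string and the standard HTTP header form separately.

```python
import base64


def encode_credentials(user, password):
    """Base64-encode '<user>:<password>' as used by HTTP Basic authentication."""
    raw = f"{user}:{password}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


# Hypothetical credentials, for illustration only.
encoded = encode_credentials("neo4j", "secret")

# Standard HTTP Basic form of the header, as sent with a REST call.
headers = {"Authorization": f"Basic {encoded}"}

print(encoded)
print(headers)
```

Base64 is an encoding, not encryption, so the config file holding this value should be protected like a plain-text password.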
A JSON file that provides LDAP Attribute to Neo4J Node Property mappings for each Entity supported by the Loader. An Entity is either a Group or a Person. Mappings are provided for each: LDAP Attributes (left) are mapped to Neo4J property names (right). The Loader will only fetch the LDAP attributes mapped here.
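To make the mapping idea concrete, here is a small sketch under stated assumptions: the JSON layout (one object per Entity, keyed by LDAP attribute name), the attribute and property names, and both helper functions are hypothetical and are not taken from the Loader's actual mapping file.

```python
import json

# Hypothetical mapping file content: LDAP attribute (left) -> Neo4J property (right).
MAPPINGS_JSON = """
{
  "Person": {"uid": "login", "cn": "name", "mail": "email"},
  "Group":  {"cn": "name", "description": "description"}
}
"""


def attributes_to_fetch(mappings, entity):
    """Only the mapped LDAP attributes need to be requested from the directory."""
    return sorted(mappings[entity])


def to_node_properties(mappings, entity, ldap_entry):
    """Rename the mapped LDAP attributes of one entry to Neo4J property names."""
    return {
        prop: ldap_entry[attr]
        for attr, prop in mappings[entity].items()
        if attr in ldap_entry
    }


mappings = json.loads(MAPPINGS_JSON)

# A made-up LDAP entry; telephoneNumber is not mapped, so it is dropped.
entry = {"uid": "jdoe", "cn": "John Doe", "telephoneNumber": "555-0100"}
node = to_node_properties(mappings, "Person", entry)

print(attributes_to_fetch(mappings, "Person"))
print(node)
```

Restricting the search to the mapped attributes keeps both the LDAP result size and the in-memory buffer small.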
The Graph Loader is designed with performance in mind, at the cost of RAM. It performs the minimum number of I/O operations on both the source and target sides. To achieve this, the Loader buffers all LDAP entries it reads in memory.
The amount of memory required depends on the number of attributes mapped (the fewer attributes, the less RAM required), the size of the mapped attribute values and the number of Entries to load. Several GB of RAM may be required for millions of LDAP entries.
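A back-of-the-envelope estimate can help size the JVM heap before a large load. The sketch below simply multiplies the three factors named above; the overhead_factor parameter (object headers, string encoding, collections) and every number in the example are assumptions, not measurements of the Loader.

```python
def estimate_buffer_bytes(entries, mapped_attributes, avg_value_bytes, overhead_factor):
    """Rough size of the in-memory buffer: raw attribute data times an assumed overhead."""
    return entries * mapped_attributes * avg_value_bytes * overhead_factor


def to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / 1024 ** 3


# Illustrative inputs only: 5 million entries, 5 mapped attributes of about
# 40 bytes each, and an assumed 4x in-memory overhead.
estimate = estimate_buffer_bytes(5_000_000, 5, 40, 4)
print(f"{to_gib(estimate):.1f} GiB")
```

With these made-up inputs the estimate lands in the range of a few GB, which matches the order of magnitude given above; the overhead factor should be calibrated against a real run.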
Since the Neo4J server is expected to be remote, the Loader cannot use the Neo4J Java SDK, which only supports (as of Dec. 2015) embedded Neo4J instances. The Loader therefore has to issue as many REST create calls as there are nodes and relationships to create (see also section 'Next Steps' below). The Neo4J I/Os are the bottleneck here.
Benchmarking baseline, using the provided LDIF sample:
2002 Users, 205 Groups and 2774 Relationships created in about 35s on a Mac powerbook with 16GB RAM...
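Since one REST create call is issued per node and per relationship, the baseline figures above translate into an approximate call rate. The sketch below does that arithmetic; the projection helper assumes load time scales linearly with the number of calls, which is a hypothesis and not a measured property of the tool. The 35 s figure is approximate and covers the whole run, LDAP reads included, so the derived rate is a lower bound on raw REST throughput.

```python
# Figures from the benchmark baseline above.
USERS = 2002
GROUPS = 205
RELATIONSHIPS = 2774
ELAPSED_SECONDS = 35  # "about 35s", so all results below are approximate


def rest_calls(users, groups, relationships):
    """One REST create call per node (user or group) and per relationship."""
    return users + groups + relationships


def projected_seconds(nodes, relationships, calls_per_second):
    """Naive linear projection of load time for a larger directory (assumption)."""
    return (nodes + relationships) / calls_per_second


total_calls = rest_calls(USERS, GROUPS, RELATIONSHIPS)
calls_per_second = total_calls / ELAPSED_SECONDS
ms_per_call = 1000 * ELAPSED_SECONDS / total_calls

print(total_calls, round(calls_per_second, 1), round(ms_per_call, 2))
```

At roughly 140 calls per second, a directory with millions of entries would take hours to load under this linear assumption, which is why the Neo4J I/Os dominate the run time.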
Possible future enhancements:
For v 2.0:
For v 3.0: