syedhassaanahmed / neo-to-cosmos

Copy Neo4j data to Azure Cosmos DB
MIT License

neo-to-cosmos



This app takes a Neo4j database snapshot and copies all content to an Azure Cosmos DB Graph database using the BulkExecutor library.

Disclaimer

Get Started

The first thing you'll need is a Neo4j database. Docker is the quickest way to get started. If you're looking for Neo4j Docker images with pre-populated graph datasets, we've got you covered! For example, the following command spins up a container with the Game of Thrones dataset:

docker run --name neo4j-got -p 7474:7474 -p 7687:7687 -d syedhassaanahmed/neo4j-game-of-thrones

Browse the data by pointing your browser to http://localhost:7474. The initial Neo4j username/password is "neo4j/neo4j".

Configuration

Before you run the app, you'll need to supply environment variables that contain the connection settings for your Neo4j and Cosmos DB databases.

COSMOSDB_ENDPOINT=https://<COSMOSDB_ACCOUNT>.documents.azure.com:443/
COSMOSDB_AUTHKEY=<COSMOSDB_AUTHKEY>
COSMOSDB_DATABASE=graphdb
COSMOSDB_CONTAINER=graphcont
COSMOSDB_PARTITIONKEY=someProperty
COSMOSDB_OFFERTHROUGHPUT=1000 #default is 400

NEO4J_ENDPOINT=neo4j://<NEO4J_ENDPOINT>:7687
NEO4J_USERNAME=neo4j #default is 'neo4j'
NEO4J_PASSWORD=<NEO4J_PASSWORD>

CACHE_PATH=<PATH_TO_CACHE_DIRECTORY> #default is 'cache'
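
Since a missing setting is easy to overlook, a quick pre-flight check can help. The following is a minimal sketch, not part of the tool: missing_settings is a hypothetical helper, and treating every variable without a documented default as required is an assumption.

```shell
# Hypothetical helper (not shipped with neo-to-cosmos): prints the name of
# every setting that is unset or empty. The list below assumes that each
# variable without a documented default is required.
missing_settings() {
    for name in COSMOSDB_ENDPOINT COSMOSDB_AUTHKEY COSMOSDB_DATABASE \
                COSMOSDB_CONTAINER COSMOSDB_PARTITIONKEY \
                NEO4J_ENDPOINT NEO4J_PASSWORD; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            printf '%s\n' "$name"
        fi
    done
}

# Usage: stop early if anything is missing.
#   [ -z "$(missing_settings)" ] || { missing_settings >&2; exit 1; }
```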

Run the tool

Run dotnet NeoToCosmos.dll and watch your data being copied. If the transfer doesn't complete for some reason, simply rerun the command. For a fresh start, add the -r switch.
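
Rerunning by hand can be automated with a small wrapper. This is only a sketch: retry and RETRY_DELAY_SECONDS are hypothetical names, and it assumes the tool exits with a non-zero status when the transfer is incomplete, which this README does not confirm.

```shell
# Hypothetical helper: rerun a command until it exits with status 0,
# giving up after a maximum number of attempts.
# Assumption: an incomplete transfer makes the tool exit non-zero.
retry() {
    max_attempts=$1
    shift
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        # Pause between attempts; RETRY_DELAY_SECONDS is a made-up knob.
        sleep "${RETRY_DELAY_SECONDS:-5}"
    done
}

# Usage: retry 5 dotnet NeoToCosmos.dll
```

Do not add -r to the retried command, since a fresh start on each attempt would discard the progress made so far.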

Here is how to run the containerized version of the tool.

docker run -d -e <ENVIRONMENT_VARIABLES> syedhassaanahmed/neo-to-cosmos
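
Passing a dozen -e flags gets unwieldy, so one option is to collect the settings in a file and use Docker's --env-file flag instead. A minimal sketch, assuming the settings are already set in the current shell; write_env_file and the file name neo-to-cosmos.env are hypothetical.

```shell
# Hypothetical helper: write every tool setting that is currently set
# into a file in the NAME=value format expected by docker run --env-file.
write_env_file() {
    target=$1
    : > "$target"
    # The file will contain the Cosmos DB key, so keep it private.
    chmod 600 "$target"
    for name in COSMOSDB_ENDPOINT COSMOSDB_AUTHKEY COSMOSDB_DATABASE \
                COSMOSDB_CONTAINER COSMOSDB_PARTITIONKEY \
                COSMOSDB_OFFERTHROUGHPUT \
                NEO4J_ENDPOINT NEO4J_USERNAME NEO4J_PASSWORD CACHE_PATH; do
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            printf '%s=%s\n' "$name" "$value" >> "$target"
        fi
    done
}

# Usage:
#   write_env_file neo-to-cosmos.env
#   docker run -d --env-file neo-to-cosmos.env syedhassaanahmed/neo-to-cosmos
```

Docker reads values in an env file verbatim, so do not quote them or append inline comments.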

Scale out

Copying a large volume of data from Neo4j to Cosmos DB using a single instance of the app may not be feasible, even with maxed-out RUs and a cache layer. Hence we've provided an ARM template that orchestrates the deployment of Cosmos DB and N Azure Container Instances, each of which performs a portion of the data migration.
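
To illustrate how work might be divided among N instances, here is one common scheme: contiguous ranges expressed as a skip/limit pair. This is purely an illustrative sketch and not necessarily how the app splits the data; instance_range and its arguments are hypothetical.

```shell
# Hypothetical illustration: print "skip limit" for one instance so that
# N instances together cover `total` items exactly once.
# Arguments: total item count, 0-based instance index, instance count.
instance_range() {
    total=$1
    index=$2
    instances=$3
    base=$((total / instances))
    extra=$((total % instances))
    # The first `extra` instances each take one additional item.
    if [ "$index" -lt "$extra" ]; then
        skip=$((index * (base + 1)))
        limit=$((base + 1))
    else
        skip=$((extra * (base + 1) + (index - extra) * base))
        limit=$base
    fi
    echo "$skip $limit"
}
```

For example, with 10 items and 3 instances, instance_range 10 1 3 prints "4 3": the second instance skips the first 4 items and handles the next 3.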

In order to achieve resilience during the migration, we also persist a RocksDB cache on an emptyDir volume. An emptyDir can survive container crashes.

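To deploy the template using the Azure CLI 2.0 (recent CLI releases deprecate az group deployment create in favor of the equivalent az deployment group create):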

az group deployment create -g <RESOURCE_GROUP> \
    --template-file azuredeploy.json \
    --parameters \
        cosmosDbPartitionKey=someProperty \
        neo4jEndpoint=neo4j://<NEO4J_ENDPOINT>:7687 \
        neo4jPassword=<NEO4J_PASSWORD>

Credits

This work builds upon the great effort Brian Sherwin has done in this repo.