micahstubbs / blockbuilder-graph-search-index

backend for the blockbuilder d3 example graph search engine

write a script to combine blocks csvs #17

Closed micahstubbs closed 6 years ago

micahstubbs commented 6 years ago

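combine these two node files into one:

csv-graphs-for-neo4j/blocks.csv
csv-graphs-for-neo4j/readme-links-blocks.csv

then enforce uniqueness on the gistId:ID column, since neo4j-import rejects an id that is defined in more than one node file within the global id space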
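a minimal sketch of what such a combine script could look like, written in Python purely for illustration (the repo's own scripts may use a different language). the function name `combine_blocks`, the output path, and the keep-the-first-row policy are assumptions, not something decided in this issue. it also assumes both files have a header row that includes a `gistId:ID` column.

```python
import csv


def combine_blocks(input_paths, output_path, id_column="gistId:ID"):
    """Merge several node CSVs into one, keeping the first row seen per id.

    Hypothetical helper: the name, signature and "first row wins" policy
    are illustrative assumptions.
    """
    fieldnames = []  # union of all header columns, in first-seen order
    rows = []        # rows that survive de-duplication
    seen = set()     # ids already written

    for path in input_paths:
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            for row in reader:
                key = row[id_column]
                if key in seen:
                    continue  # duplicate id: skip the later occurrence
                seen.add(key)
                rows.append(row)

    with open(output_path, "w", newline="", encoding="utf-8") as f:
        # restval fills columns that a given input file did not have
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)

    return len(rows)


# example call (the output file name is hypothetical):
# combine_blocks(
#     ["csv-graphs-for-neo4j/blocks.csv",
#      "csv-graphs-for-neo4j/readme-links-blocks.csv"],
#     "csv-graphs-for-neo4j/combined-blocks.csv",
# )
```

listing blocks.csv first means its rows win over readme-links-blocks.csv when an id appears in both. whether that is the right priority depends on which file carries the richer properties.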

neo4j-import error message:

org.neo4j.unsafe.impl.batchimport.cache.idmapping.string.DuplicateInputIdException: Id '583734' is defined more than once in global id space, at least at /Users/m/workspace/blockbuilder-graph-search-index/data/csv-graphs-for-neo4j/blocks.csv:10 and /Users/m/workspace/blockbuilder-graph-search-index/data/csv-graphs-for-neo4j/readme-links-blocks.csv:1312
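the importer reports only one clashing id ("at least at ..."), so it can help to list every clash up front. the sketch below is an illustrative Python helper, not part of the repo: `find_duplicate_ids` is a hypothetical name, and it assumes each file has a header row with a `gistId:ID` column. the line numbers it reports count the header as line 1, which may not match the convention neo4j-import uses in its message.

```python
import csv
from collections import defaultdict


def find_duplicate_ids(paths, id_column="gistId:ID"):
    """Return {id: [(path, line_number), ...]} for ids defined more than once.

    Hypothetical diagnostic helper; the name and return shape are assumptions.
    """
    locations = defaultdict(list)
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            for row in reader:
                # reader.line_num is the physical line just read (header = 1)
                locations[row[id_column]].append((path, reader.line_num))
    # keep only ids that occur in more than one place
    return {key: where for key, where in locations.items() if len(where) > 1}


# example call:
# dupes = find_duplicate_ids([
#     "csv-graphs-for-neo4j/blocks.csv",
#     "csv-graphs-for-neo4j/readme-links-blocks.csv",
# ])
# for gist_id, where in dupes.items():
#     print(gist_id, where)
```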

full output

➜  blockbuilder-graph-search-index git:(master) ✗ sh load-csv-graph-into-neo4j.sh
WARNING: neo4j-import is deprecated and support for it will be removed in a future
version of Neo4j; please use neo4j-admin import instead.

Neo4j version: 3.3.0
Importing the contents of these files into data/databases/blockbuilder-graph-search.db:
Nodes:
  /Users/m/workspace/blockbuilder-graph-search-index/data/csv-graphs-for-neo4j/blocks.csv

  /Users/m/workspace/blockbuilder-graph-search-index/data/csv-graphs-for-neo4j/readme-links-blocks.csv

  /Users/m/workspace/blockbuilder-graph-search-index/data/csv-graphs-for-neo4j/users.csv

  /Users/m/workspace/blockbuilder-graph-search-index/data/csv-graphs-for-neo4j/functions.csv

  /Users/m/workspace/blockbuilder-graph-search-index/data/csv-graphs-for-neo4j/colors.csv
Relationships:
  /Users/m/workspace/blockbuilder-graph-search-index/data/csv-graphs-for-neo4j/readme-links-relationships.csv

  /Users/m/workspace/blockbuilder-graph-search-index/data/csv-graphs-for-neo4j/user-built-block-relationships.csv

  /Users/m/workspace/blockbuilder-graph-search-index/data/csv-graphs-for-neo4j/block-calls-function-relationships.csv

  /Users/m/workspace/blockbuilder-graph-search-index/data/csv-graphs-for-neo4j/block-uses-color-relationships.csv

Available resources:
  Total machine memory: 16.00 GB
  Free machine memory: 1.82 GB
  Max heap memory : 3.56 GB
  Processors: 8
  Configured max memory: -1680382771.00 B

Nodes, started 2018-05-08 01:08:30.805+0000
[*>:??------------------------------|NODE:7.63 MB-----------|PROPERTIES(3)=====|v:??(2)=======]54.6K ∆54.6K
Done in 607ms
Prepare node index, started 2018-05-08 01:08:31.500+0000
[*DEDUPLICATE:15.60 MB------------------------------------------------------------------------]    0 ∆    0
Done in 383ms
Exception in thread "Thread-5" org.neo4j.unsafe.impl.batchimport.cache.idmapping.string.DuplicateInputIdException: Id '583734' is defined more than once in global id space, at least at /Users/m/workspace/blockbuilder-graph-search-index/data/csv-graphs-for-neo4j/blocks.csv:10 and /Users/m/workspace/blockbuilder-graph-search-index/data/csv-graphs-for-neo4j/readme-links-blocks.csv:1312
    at org.neo4j.unsafe.impl.batchimport.input.BadCollector$NodesProblemReporter.exception(BadCollector.java:213)
    at org.neo4j.unsafe.impl.batchimport.input.BadCollector.checkTolerance(BadCollector.java:142)
    at org.neo4j.unsafe.impl.batchimport.input.BadCollector.collectDuplicateNode(BadCollector.java:88)
    at org.neo4j.unsafe.impl.batchimport.cache.idmapping.string.EncodingIdMapper.detectDuplicateInputIds(EncodingIdMapper.java:648)
    at org.neo4j.unsafe.impl.batchimport.cache.idmapping.string.EncodingIdMapper.buildCollisionInfo(EncodingIdMapper.java:552)
    at org.neo4j.unsafe.impl.batchimport.cache.idmapping.string.EncodingIdMapper.prepare(EncodingIdMapper.java:279)
    at org.neo4j.unsafe.impl.batchimport.IdMapperPreparationStep.process(IdMapperPreparationStep.java:54)
    at org.neo4j.unsafe.impl.batchimport.staging.LonelyProcessingStep$1.run(LonelyProcessingStep.java:57)
Duplicate input ids that would otherwise clash can be put into separate id space, read more about how to use id spaces in the manual: https://neo4j.com/docs/operations-manual/3.3/tools/import/file-header-format/#import-tool-id-spaces
Caused by:Id '583734' is defined more than once in global id space, at least at /Users/m/workspace/blockbuilder-graph-search-index/data/csv-graphs-for-neo4j/blocks.csv:10 and /Users/m/workspace/blockbuilder-graph-search-index/data/csv-graphs-for-neo4j/readme-links-blocks.csv:1312

WARNING Import failed. The store files in /Users/m/Downloads/neo4j/neo4j-enterprise-3.3.0/data/databases/blockbuilder-graph-search.db are left as they are, although they are likely in an unusable state. Starting a database on these store files will likely fail or observe inconsistent records so start at your own risk or delete the store manually

micahstubbs commented 6 years ago

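with a single blocks file that has unique gistId:ID values, neo4j-import should no longer hit the DuplicateInputIdException, which will make it possible to cleanly import the data into neo4j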