neo4j / docker-neo4j

Docker Images for the Neo4j Graph Database
Apache License 2.0

Problem transferring large amount of data to python application #124

Open jasonnett80 opened 6 years ago

jasonnett80 commented 6 years ago

I have been testing a Docker image for Neo4j 3.2.9 enterprise edition with some code I use to build a Neo4j database. At first glance, it seems to be working well. However, when I try to return a larger number of nodes (on the order of a few thousand) with some arbitrary label, I get the following error:

MATCH (hcp:HCP_Derm) RETURN hcp;  
Traceback (most recent call last):
  File "switchboard.py", line 82, in <module>
    main(sys.argv[1:])
  File "switchboard.py", line 79, in main
    neo4j_main(build_options)
  File "/Users/jason/Documents/rMarkBio/development/AbbvieRepositories/abbv-influencemap/azure_to_neo4j_pipeline.py", line 484, in neo4j_main
    neo4j_session,
  File "/Users/jason/Documents/rMarkBio/development/AbbvieRepositories/abbv-influencemap/abbvie_neo4j_structure.py", line 410, in addMarketingIdentifierValues
    Neo4j_HCP = area_attributes['Neo4j_HCP'],
  File "/Users/jason/Documents/rMarkBio/development/AbbvieRepositories/abbv-influencemap/graph_env/lib/python2.7/site-packages/neo4j/v1/api.py", line 491, in write_transaction
    return self._run_transaction(WRITE_ACCESS, unit_of_work, *args, **kwargs)
  File "/Users/jason/Documents/rMarkBio/development/AbbvieRepositories/abbv-influencemap/graph_env/lib/python2.7/site-packages/neo4j/v1/api.py", line 473, in _run_transaction
    return unit_of_work(tx, *args, **kwargs)
  File "/Users/jason/Documents/rMarkBio/development/AbbvieRepositories/abbv-influencemap/graph_env/lib/python2.7/site-packages/neo4j/v1/api.py", line 546, in __exit__
    self.close()
  File "/Users/jason/Documents/rMarkBio/development/AbbvieRepositories/abbv-influencemap/graph_env/lib/python2.7/site-packages/neo4j/v1/api.py", line 623, in close
    self.session.commit_transaction()
  File "/Users/jason/Documents/rMarkBio/development/AbbvieRepositories/abbv-influencemap/graph_env/lib/python2.7/site-packages/neo4j/v1/api.py", line 444, in commit_transaction
    result.consume()
  File "/Users/jason/Documents/rMarkBio/development/AbbvieRepositories/abbv-influencemap/graph_env/lib/python2.7/site-packages/neo4j/v1/api.py", line 738, in consume
    list(self)
  File "/Users/jason/Documents/rMarkBio/development/AbbvieRepositories/abbv-influencemap/graph_env/lib/python2.7/site-packages/neo4j/v1/api.py", line 706, in records
    keys = self.keys()
  File "/Users/jason/Documents/rMarkBio/development/AbbvieRepositories/abbv-influencemap/graph_env/lib/python2.7/site-packages/neo4j/v1/api.py", line 696, in keys
    self._session.fetch()
  File "/Users/jason/Documents/rMarkBio/development/AbbvieRepositories/abbv-influencemap/graph_env/lib/python2.7/site-packages/neo4j/v1/api.py", line 354, in fetch
    detail_count, _ = self._connection.fetch()
  File "/Users/jason/Documents/rMarkBio/development/AbbvieRepositories/abbv-influencemap/graph_env/lib/python2.7/site-packages/neo4j/bolt/connection.py", line 283, in fetch
    return self._fetch()
  File "/Users/jason/Documents/rMarkBio/development/AbbvieRepositories/abbv-influencemap/graph_env/lib/python2.7/site-packages/neo4j/bolt/connection.py", line 302, in _fetch
    details, summary_signature, summary_metadata = self._unpack()
  File "/Users/jason/Documents/rMarkBio/development/AbbvieRepositories/abbv-influencemap/graph_env/lib/python2.7/site-packages/neo4j/bolt/connection.py", line 354, in _unpack
    data = unpacker.unpack_list()
  File "neo4j/packstream/_unpacker.pyx", line 142, in neo4j.packstream._unpacker.Unpacker.unpack_list (neo4j/packstream/_unpacker.c:4024)
  File "neo4j/packstream/_unpacker.pyx", line 146, in neo4j.packstream._unpacker.Unpacker.unpack_list (neo4j/packstream/_unpacker.c:3977)
  File "neo4j/packstream/_unpacker.pyx", line 159, in neo4j.packstream._unpacker.Unpacker._unpack_list (neo4j/packstream/_unpacker.c:4149)
  File "neo4j/packstream/_unpacker.pyx", line 133, in neo4j.packstream._unpacker.Unpacker._unpack (neo4j/packstream/_unpacker.c:3784)
  File "neo4j/packstream/_unpacker.pyx", line 126, in neo4j.packstream._unpacker.Unpacker._unpack (neo4j/packstream/_unpacker.c:3550)
  File "neo4j/packstream/_unpacker.pyx", line 197, in neo4j.packstream._unpacker.Unpacker._unpack_map (neo4j/packstream/_unpacker.c:5033)
  File "neo4j/packstream/_unpacker.pyx", line 97, in neo4j.packstream._unpacker.Unpacker._unpack (neo4j/packstream/_unpacker.c:2440)
AttributeError: 'bytearray' object has no attribute 'tobytes'

This occurs regardless of which Python version I use (I tried several across Python 2 and 3) and regardless of whether I run my Python code inside or outside a Docker container. The only variable that seems to determine whether I encounter this error is the number of nodes (or perhaps the size of the data) being returned.

I'm going to have to abandon use of Neo4j Docker images until this is sorted out, unfortunately.

jasonnett80 commented 6 years ago

I'm coming to realize that this may be an issue of how much memory is allocated to the Neo4j container. If that turns out to be the case, perhaps more graceful handling and clearer error messages would be in order.

On my end, I'll update my code to retrieve nodes in batches if the match count is larger than some specified value.
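
A minimal sketch of what that batched retrieval could look like, paging with `SKIP`/`LIMIT`. The function name `fetch_in_batches`, the `run_query` callable, and the default batch size are illustrative assumptions, not code from my pipeline. It also assumes the server accepts the `$param` parameter syntax and that ordering by `id(n)` is stable enough between pages while nothing else is writing to the database.

```python
def fetch_in_batches(run_query, label, batch_size=500):
    """Yield records for all nodes with `label`, fetched one page at a time.

    `run_query` is a hypothetical callable taking (query, skip=..., limit=...)
    and returning an iterable of records. With the 1.x Python driver,
    `session.run` is assumed to fit that shape, since it accepts query
    parameters as keyword arguments.
    """
    # Labels cannot be passed as Cypher parameters, so the label is
    # interpolated into the query text. Only use trusted label names here.
    query = (
        "MATCH (n:`{label}`) RETURN n ORDER BY id(n) "
        "SKIP $skip LIMIT $limit"
    ).format(label=label)

    skip = 0
    while True:
        # Materialize one page so its size can be checked.
        batch = list(run_query(query, skip=skip, limit=batch_size))
        if not batch:
            break
        for record in batch:
            yield record
        # A short page means there is nothing left to fetch.
        if len(batch) < batch_size:
            break
        skip += batch_size
```

With an open session this would be called roughly as `for record in fetch_in_batches(session.run, "HCP_Derm"): ...`, which keeps each response down to `batch_size` nodes instead of returning everything in a single result.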