Closed Btibert3 closed 4 years ago
I think I see what is happening.
What version of Neo4j are you ingesting into?
Also, can you tell which version of the Python driver pip pulled in?
So for an added wrinkle, I am using Sandbox. I want to show a quick POC of using free, cloud tools to explore. That said, I hope this helps:
Enterprise 3.5.11 (blank sandbox template)
After the pip install, I see:
Installing collected packages: ijson, neo4j
Successfully installed ijson-3.0.4 neo4j-4.0.0
Ah, that makes sense. The API changed somewhat in the 4.0 version of the driver. (I had thought that was still pre-release, but it seems not.) Probably the quickest way for me to help you get going would be to change the pip requirements to force the 1.7 version of the driver. (The versions jumped from 1.7 to 4.0.) Then I can do some testing with the 4.0 version.
I updated the repo; can you update and try running the pip install again?
Thanks for your patience! Matt
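For anyone hitting the same mismatch, here is a minimal sketch of telling the two driver generations apart from a version string. The helper names are hypothetical and not part of pyingest, and "1.7.0" is only an example of a 1.7-series version. A requirement specifier along the lines of `neo4j>=1.7,<4.0` is one way a pin could be written, but that is an assumption about the fix, not a quote from the repo.

```python
def driver_major_version(version: str) -> int:
    """Return the leading (major) component of a version string such as '4.0.0'."""
    return int(version.split(".")[0])


def is_pre_4_driver(version: str) -> bool:
    """True when the version predates the 4.0 API change (for example 1.7.x)."""
    return driver_major_version(version) < 4


if __name__ == "__main__":
    # "4.0.0" is the version pip reported in this thread; "1.7.0" is illustrative.
    for v in ("4.0.0", "1.7.0"):
        label = "pre-4.0 API" if is_pre_4_driver(v) else "4.0+ API"
        print(v, "->", label)
```

The installed version itself can be read from the output of `pip show neo4j` and passed to these helpers.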
I appreciate the quick responses, sorry for the delay.
Here is the error I get now:
{} : Reading file 2020-06-18 01:37:55.604271
File {} https://docs.google.com/spreadsheets/d/e/2PACX-1vT9-l5I0BhrBnx0zvG6HMUB_pmU-FudHU_Do-cmYv7TUGbAQOpImf8Xu1gxeTMgPEsHL5beW1691R1K/pub?gid=0&single=true&output=csv
Traceback (most recent call last):
  File "pyingest/src/main/ingest.py", line 205, in <module>
    main()
  File "pyingest/src/main/ingest.py", line 199, in main
    server.load_file(file)
  File "pyingest/src/main/ingest.py", line 55, in load_file
    self.load_csv(file)
  File "pyingest/src/main/ingest.py", line 117, in load_csv
    openfile = file_handle(params['url'], params['compression'])
  File "pyingest/src/main/ingest.py", line 179, in file_handle
    return open(path)
  File "/usr/local/lib/python3.6/dist-packages/smart_open/smart_open_lib.py", line 189, in open
    errors=errors,
  File "/usr/local/lib/python3.6/dist-packages/smart_open/smart_open_lib.py", line 362, in _shortcut_open
    return _builtin_open(local_path, mode, buffering=buffering, **open_kwargs)
FileNotFoundError: [Errno 2] No such file or directory: '/spreadsheets/d/e/2PACX-1vT9-l5I0BhrBnx0zvG6HMUB_pmU-FudHU_Do-cmYv7TUGbAQOpImf8Xu1gxeTMgPEsHL5beW1691R1K/pub'
Using Config
server_uri:
admin_user: neo4j
admin_pass:
files:
  - url: https://docs.google.com/spreadsheets/d/e/2PACX-1vT9-l5I0BhrBnx0zvG6HMUB_pmU-FudHU_Do-cmYv7TUGbAQOpImf8Xu1gxeTMgPEsHL5beW1691R1K/pub?gid=0&single=true&output=csv
    cql: |
      WITH $dict.rows as rows UNWIND rows as row
      MERGE (u:User {id:row.user})
      MERGE (s:Session {id:row.session})
      MERGE (u)-[:HAD]->(s)
      WITH u, row, s
      CALL apoc.create.node([row.type], {seq: row.seq, value: row.value}) YIELD node
      CREATE (s)-[:ACTION]->(node)
pre_ingest:
  - "CREATE CONSTRAINT on (n:User) assert n.id is unique"
  - "CREATE CONSTRAINT on (n:Session) assert n.id is unique"
post_ingest:
  - "match (s)-[:ACTION]->(x)
    WITH s, x
    ORDER BY x.seq
    WITH s, collect(x) as actions
    CALL apoc.nodes.link(actions, 'NEXT')
    RETURN count(*)"
In case it helps, here is the notebook: https://colab.research.google.com/drive/1Zc4mBVGNyRIIEVueDD_LjnsTH27x89so?usp=sharing
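The `cql` statement in the config reads its input from a `$dict` parameter with a `rows` list, where each row carries `user`, `session`, `type`, `seq`, and `value`. As a rough sketch of what such a parameter could look like, the snippet below groups CSV rows into `{"rows": [...]}` batches. The function name, the batch size, and the sample data are all made up for illustration, and this is not necessarily how pyingest builds the parameter internally.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterable, Iterator, List

# Hypothetical sample data using the column names referenced by the cql statement.
SAMPLE_CSV = """user,session,type,seq,value
u1,s1,Click,1,home
u1,s1,Click,2,search
u2,s2,View,1,landing
"""


def batched_params(lines: Iterable[str], batch_size: int = 2) -> Iterator[Dict[str, List[dict]]]:
    """Yield {"rows": [...]} dicts holding at most batch_size CSV rows each."""
    reader = csv.DictReader(lines)
    while True:
        rows = list(islice(reader, batch_size))
        if not rows:
            return
        yield {"rows": rows}


if __name__ == "__main__":
    for params in batched_params(io.StringIO(SAMPLE_CSV)):
        # Each dict is shaped like the value the cql statement reads as $dict.
        print(len(params["rows"]), "rows, first user:", params["rows"][0]["user"])
```

Note that `csv.DictReader` yields every field as a string, so `seq` would arrive as text unless it is converted somewhere along the way.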
We use the smart_open library to open URLs, but it seems to be unhappy about this one. Let me take a closer look. As a workaround, would saving the CSV file locally be an option? I'm not sure how that would fit into your workflow.
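One observation that may help with debugging: the path in the FileNotFoundError is exactly the path component of the URL, which suggests the scheme, host, and query string were dropped before the built-in open was called. The sketch below uses only the standard library to show how the URL splits. `looks_remote` is a hypothetical helper, not something from smart_open or pyingest.

```python
from urllib.parse import urlsplit

# The URL from the config in this thread.
URL = (
    "https://docs.google.com/spreadsheets/d/e/"
    "2PACX-1vT9-l5I0BhrBnx0zvG6HMUB_pmU-FudHU_Do-cmYv7TUGbAQOpImf8Xu1gxeTMgPEsHL5beW1691R1K"
    "/pub?gid=0&single=true&output=csv"
)


def looks_remote(location: str) -> bool:
    """Hypothetical check: True when the location carries an http(s) scheme."""
    return urlsplit(location).scheme in ("http", "https")


parts = urlsplit(URL)

if __name__ == "__main__":
    print("scheme:", parts.scheme)
    print("host:  ", parts.netloc)
    print("path:  ", parts.path)   # the same string the traceback tried to open locally
    print("query: ", parts.query)
    print("remote?", looks_remote(URL), looks_remote("data/events.csv"))
```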
Absolutely, using local CSV files works for most of my needs. I was stretching this a bit for a POC to show that we can do things in the cloud for free, but when I modify the config file to point to a local data file, it works brilliantly.
Should I close this ticket and create a separate one for the smart_open library for web-based resources?
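For reference, the local-file workaround can be scripted so the notebook still starts from the published sheet. This is a minimal sketch using only the standard library. `save_locally` and the destination filename are hypothetical, and the function does no retry or error handling.

```python
import shutil
import urllib.request
from pathlib import Path


def save_locally(url: str, destination: str) -> Path:
    """Copy the resource at url into destination and return the local path."""
    target = Path(destination)
    # Stream the response to disk instead of loading it all into memory.
    with urllib.request.urlopen(url) as response, target.open("wb") as out:
        shutil.copyfileobj(response, out)
    return target


# Example call (not executed here, since it needs network access):
# local_csv = save_locally("<published CSV url>", "events.csv")
```

The `url` entry in the config would then point at the saved file instead of the https address.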
Actually, I think I got it working from the https url. I was able to run it locally. Can you refresh and try the cloud address?
Awesome, worked perfectly!
I am getting the following error:
using this config.yml file (with the uri and password removed). Perhaps it is something on my end, but my Cypher commands work stand-alone prior to building the config file.