neo4j-field / pyingest

Apache License 2.0

Error: query argument missing #3

Status: Closed (Btibert3 closed this issue 4 years ago)

Btibert3 commented 4 years ago

I am getting the following error:

Traceback (most recent call last):
  File "pyingest/src/main/ingest.py", line 205, in <module>
    main()
  File "pyingest/src/main/ingest.py", line 196, in main
    server.pre_ingest()
  File "pyingest/src/main/ingest.py", line 150, in pre_ingest
    session.run(statement=statement)
TypeError: run() missing 1 required positional argument: 'query'

I am using this config.yml file (with the URI and password removed):

server_uri: 
admin_user: neo4j
admin_pass: 

files:
  - url: https://docs.google.com/spreadsheets/d/e/2PACX-1vT9-l5I0BhrBnx0zvG6HMUB_pmU-FudHU_Do-cmYv7TUGbAQOpImf8Xu1gxeTMgPEsHL5beW1691R1K/pub?gid=0&single=true&output=csv
    cql: |
      WITH $dict.rows as rows UNWIND rows as row
          MERGE (u:User {id:row.user})
          MERGE (s:Session {id:row.session})
          MERGE (u)-[:HAD]->(s)
          WITH u, row, s
          CALL apoc.create.node([row.type], {seq: row.seq, value: row.value}) YIELD node 
          CREATE (s)-[:ACTION]->(node)

pre_ingest:
  - "CREATE CONSTRAINT on (n:User) assert n.id is unique"
  - "CREATE CONSTRAINT on (n:Session) assert n.id is unique"

post_ingest:
  - "match (s)-[:ACTION]->(x)
      WITH s, x
      ORDER BY x.seq
      WITH s, collect(x) as actions
      CALL apoc.nodes.link(actions, 'NEXT')
      RETURN count(*)"

Perhaps it is something on my end, but my cypher commands work stand-alone prior to building the config file.

mholford-neo commented 4 years ago

I think I see what is happening.
What version of Neo4j are you ingesting into? Also, can you tell which version of the Python driver pip pulled in?

Btibert3 commented 4 years ago

So for an added wrinkle, I am using Sandbox. I want to show a quick POC of using free, cloud tools to explore. That said, I hope this helps:

Enterprise 3.5.11 (blank sandbox template)

After the pip install, I see:

Installing collected packages: ijson, neo4j
Successfully installed ijson-3.0.4 neo4j-4.0.0

mholford-neo commented 4 years ago

Ah, that makes sense. The API changed somewhat in the 4.0 version of the driver. (I had thought that was still pre-release, but it seems not.) Probably the quickest way for me to help you get going is to change the pip requirements to force the 1.7 version of the driver. (The versions jumped from 1.7 to 4.0.) Then I can do some testing with the 4.0 version.
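
For anyone hitting the same TypeError: the traceback shows that the 4.0 driver names the first parameter of run() `query`, while the failing call passed `statement=` as a keyword (the name the 1.7 driver used). The sketch below is not the fix applied in the repo (that was pinning the driver to 1.7); it only illustrates that passing the Cypher text positionally avoids depending on the parameter name. LegacySession, ModernSession and run_statements are hypothetical stand-ins written for this example, not driver classes.

```python
class LegacySession:
    """Hypothetical stand-in mimicking the 1.7 driver's run() signature."""

    def run(self, statement, parameters=None, **kwparameters):
        return ("1.7", statement, parameters)


class ModernSession:
    """Hypothetical stand-in mimicking the 4.0 driver's run() signature."""

    def run(self, query, parameters=None, **kwparameters):
        return ("4.0", query, parameters)


def run_statements(session, statements):
    """Run each Cypher statement, passing the text positionally.

    A positional argument binds to the first parameter regardless of
    whether the driver calls it `statement` or `query`.
    """
    return [session.run(cypher) for cypher in statements]


# The pre_ingest statements from the config in this issue.
PRE_INGEST = [
    "CREATE CONSTRAINT on (n:User) assert n.id is unique",
    "CREATE CONSTRAINT on (n:Session) assert n.id is unique",
]
```

With a real driver, `session` would come from `driver.session()`. Calling `ModernSession().run(statement=...)` reproduces the "missing 1 required positional argument: 'query'" error from the traceback.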

I updated the repo; can you update and try running the pip install again?

Thanks for your patience! Matt

Btibert3 commented 4 years ago

I appreciate the quick responses, and sorry for the delay.

Here is the error I get now:

{} : Reading file 2020-06-18 01:37:55.604271
File {} https://docs.google.com/spreadsheets/d/e/2PACX-1vT9-l5I0BhrBnx0zvG6HMUB_pmU-FudHU_Do-cmYv7TUGbAQOpImf8Xu1gxeTMgPEsHL5beW1691R1K/pub?gid=0&single=true&output=csv
Traceback (most recent call last):
  File "pyingest/src/main/ingest.py", line 205, in <module>
    main()
  File "pyingest/src/main/ingest.py", line 199, in main
    server.load_file(file)
  File "pyingest/src/main/ingest.py", line 55, in load_file
    self.load_csv(file)
  File "pyingest/src/main/ingest.py", line 117, in load_csv
    openfile = file_handle(params['url'], params['compression'])
  File "pyingest/src/main/ingest.py", line 179, in file_handle
    return open(path)
  File "/usr/local/lib/python3.6/dist-packages/smart_open/smart_open_lib.py", line 189, in open
    errors=errors,
  File "/usr/local/lib/python3.6/dist-packages/smart_open/smart_open_lib.py", line 362, in _shortcut_open
    return _builtin_open(local_path, mode, buffering=buffering, **open_kwargs)
FileNotFoundError: [Errno 2] No such file or directory: '/spreadsheets/d/e/2PACX-1vT9-l5I0BhrBnx0zvG6HMUB_pmU-FudHU_Do-cmYv7TUGbAQOpImf8Xu1gxeTMgPEsHL5beW1691R1K/pub'

Using the same config.yml as in my first comment (URI and password removed).

In case it helps, here is the notebook: https://colab.research.google.com/drive/1Zc4mBVGNyRIIEVueDD_LjnsTH27x89so?usp=sharing
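
A note on the FileNotFoundError above: the path it reports is exactly the path component of the spreadsheet URL, with the scheme, host and query string stripped. The sketch below checks that with the standard library. It only shows the correspondence; whether smart_open splits the URL this way internally is an assumption, not something this thread confirms.

```python
from urllib.parse import urlsplit

# The URL from the config in this issue.
URL = (
    "https://docs.google.com/spreadsheets/d/e/"
    "2PACX-1vT9-l5I0BhrBnx0zvG6HMUB_pmU-FudHU_Do-cmYv7TUGbAQOpImf8Xu1gxeTMgPEsHL5beW1691R1K"
    "/pub?gid=0&single=true&output=csv"
)

# The path reported in the FileNotFoundError message.
ERROR_PATH = (
    "/spreadsheets/d/e/"
    "2PACX-1vT9-l5I0BhrBnx0zvG6HMUB_pmU-FudHU_Do-cmYv7TUGbAQOpImf8Xu1gxeTMgPEsHL5beW1691R1K"
    "/pub"
)

parts = urlsplit(URL)

# Scheme, host and query are separate components; only the path
# matches what the traceback tried to open as a local file.
print(parts.scheme, parts.netloc, parts.query)
print(parts.path == ERROR_PATH)
```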

mholford-neo commented 4 years ago

We use the smart_open library to open URLs, but it seems to be unhappy about this one. Let me take a closer look. As a workaround, would saving the CSV file locally be an option? I'm not sure how that fits into your workflow.
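
A minimal sketch of that workaround, using only the standard library: fetch the published CSV once, save it to disk, and point the `url` entry in config.yml at the saved file. The helper name download_to_file and the destination filename are hypothetical, chosen for this example.

```python
import shutil
import urllib.request


def download_to_file(url, dest_path):
    """Stream the resource at `url` into `dest_path` and return the path."""
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        # Copy in chunks so a large CSV is not held in memory at once.
        shutil.copyfileobj(response, out)
    return dest_path


# Example usage (performs a network request, so it is left commented out):
# download_to_file("https://docs.google.com/.../pub?gid=0&single=true&output=csv",
#                  "events.csv")
```

The URL in the usage comment is abbreviated; substitute the full link from the config.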

Btibert3 commented 4 years ago

Absolutely, using local CSV files works for most of my needs. I was stretching this a bit for a POC to show that we can do things in the cloud for free, but when I modify the config file to point to a local data file, it works brilliantly.

Should I close this ticket and create a separate one for the smart_open library for web-based resources?

mholford-neo commented 4 years ago

Actually, I think I got it working from the https url. I was able to run it locally. Can you refresh and try the cloud address?

Btibert3 commented 4 years ago

Awesome, worked perfectly!