neo4j-contrib / neo4j-apoc-procedures

Awesome Procedures On Cypher for Neo4j - codenamed "apoc"
https://neo4j.com/labs/apoc
Apache License 2.0

APOC NLP with apoc.periodic.iterate bug #2998

Closed tomasonjo closed 1 year ago

tomasonjo commented 2 years ago

I have run into a bug that was previously reported on the community site: https://community.neo4j.com/t5/neo4j-graph-platform/use-of-apoc-periodic-iterate-with-apoc-nlp-gcp-classify-graph/m-p/56776

Without apoc.periodic.iterate, the same apoc.nlp.gcp.classify.graph call works fine.

To reproduce the issue:

Create data:

UNWIND ["Actress Anushka Sharma in a recent interview said that she and her husband, Team India captain Virat Kohli, hardly spend time together as they both have very hectic lives. Virat and I have been working around the clock...we are living in a house and we've spent barely any time in it. For us, home is...wow! It's a vacation, she added.", "Reliance Industries Chairman Mukesh Ambani's daughter Isha Ambani and Priyanka Chopra's cousin Parineeti Chopra were among the celebrities who were part of Priyanka's bachelorette. Sharing a picture of herself with the girls who attended her bachelorette, Priyanka wrote, Red, white and Bride!!! The picture also shows Priyanka's future sister-in-law Sophie Turner and Sonali Bendre's sister-in-law Srishti Behl Arya among others.", "Singer-composer AR Rahman, who converted to Islam from Hinduism, has revealed people often ask him if Islam has made him successful. It's not about converting or not. It's whether you find the spot, whether it presses that button in you, he added. Rahman further said his faith has kept him on the right track and saved him from many falls.", "A consortium of lenders led by Bank of Maharashtra has taken 'symbolic possession' of Pune's Maharashtra Cricket Association Stadium over non-payment of loan by stadium authorities. The Maharashtra Cricket Association has failed to repay dues of ₹69.53 crore to Bank of Maharashtra, Karnataka Bank, Bank of Baroda and Andhra Bank despite being served a 60-day notice on August 18.", "Team India opener Rohit Sharma on Tuesday broke his own world record for hitting most sixes across all formats in international cricket in a calendar year. The 31-year-old smashed seven sixes in the second T20I against Windies, taking his tally of sixes in 2018 to 69. 
He had set the previous world record by slamming 65 sixes in 2017.", "After renaming Faizabad district as Ayodhya, Uttar Pradesh Chief Minister Yogi Adityanath has announced that his government will construct an airport in Ayodhya, which will be named after Lord Rama. He added that a medical college will also be established in Ayodhya and it will be named after King Dasharatha, who is Lord Rama's father in epic 'Ramayana'.", "The Finance Ministry wants the RBI to transfer a surplus of ₹3.6 lakh crore, over a third of the total ₹9.59 lakh crore reserves of the central bank, to the government, reports said. The Ministry suggested that the surplus can be managed jointly by both the RBI and the government. The RBI has not accepted the proposed changes, reports added.", "A 94-year-old wheelchair-bound man accused of helping to murder hundreds of people at a Nazi concentration camp during World War Two appeared in a German court on Tuesday. He is being tried in a youth court because he was under 21 at the time of the suspected crimes at the Stutthof camp where about 65,000 people died.", "A fisherman rescued an 18-month-old baby from the ocean after he wandered into the sea while his parents were asleep during a camping trip in New Zealand. I...thought it was just a doll. His face looked just like porcelain with his short hair wetted down, the fisherman said. He was bloody lucky...he just wasn't meant to go. he added.", "After Pakistan's state-run news channel PTV wrote Beijing as begging on screen during the live broadcast of PM Imran Khan's speech in China, the government has removed the channel's MD. The channel had earlier apologised for the error. The Pakistan leader was in China to secure an economic package amid the debt crisis in the country."] AS text
CREATE (n:Node {text:text})

And then run the classification procedure:

CALL apoc.periodic.iterate("
   MATCH (node:Node) RETURN node",
   "
   CALL apoc.nlp.gcp.classify.graph(node, {
       // we retrieve gcp api key from static value storage
       key: 'key',
       // node property that contains the text
       nodeProperty: 'text',
       write:true
    }) YIELD graph RETURN distinct 'done'", {batchSize:5})

This returns the following error:

{ "Node[10728] is deleted and cannot be used to create a relationship": 1, "Node[10729] is deleted and cannot be used to create a relationship": 1 }
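
The map above has the shape of the errorMessages column that apoc.periodic.iterate yields (error message mapped to the number of times it occurred). A minimal sketch, assuming the same Node data and the placeholder key from the query above, that returns the batch and failure counters next to those messages to show how many batches were affected:

```cypher
// Same call as above, with the diagnostic columns yielded explicitly.
// 'key' is a placeholder for a real GCP API key.
CALL apoc.periodic.iterate(
  "MATCH (node:Node) RETURN node",
  "CALL apoc.nlp.gcp.classify.graph(node, {
      key: 'key',
      nodeProperty: 'text',
      write: true
   }) YIELD graph RETURN distinct 'done'",
  {batchSize: 5})
YIELD batches, total, failedBatches, failedOperations, errorMessages
RETURN batches, total, failedBatches, failedOperations, errorMessages
```

With the 10 nodes created above and batchSize 5, this should report 2 batches; failedBatches and failedOperations indicate whether the error is confined to particular batches.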

The graph is static: I am not importing or deleting any data, so it is unclear why some nodes are reported as deleted. If I try to match the node ids from the error

MATCH (n) WHERE id(n) IN [10728, 10729]
RETURN n

I get no results. However, as soon as I remove apoc.periodic.iterate, the call works:

   MATCH (node:Node)
   CALL apoc.nlp.gcp.classify.graph(node, {
       // we retrieve gcp api key from static value storage
       key: 'key',
       // node property that contains the text
       nodeProperty: 'text',
       write:true
    }) YIELD graph RETURN distinct 'done'

vga91 commented 1 year ago

Related to https://github.com/neo4j-contrib/neo4j-apoc-procedures/pull/3098

vga91 commented 1 year ago

Fixed here: https://github.com/neo4j-contrib/neo4j-apoc-procedures/pull/3098/files