Closed tomasonjo closed 1 year ago
Hey @tomasonjo , thanks for reporting the issue! checking in our code, it is indeed not supported at the moment to convert arrays. You can see at https://github.com/neo4j/graph-data-science/blob/2af46276d44c6b6daf88101ba78e559dbf123fcf/core/src/main/java/org/neo4j/gds/utils/Neo4jValueConversion.java#L56 what auto conversions we support. We even have a test case where we expect this.
I like the idea of supporting this conversion though, if every float is safe to cast to a long.
It seems it's got something to do with Neo4j internals? Most of the vectors should be LongArray to begin with. If I rerun the vector generating statement I get:
Invalid input for function 'toInteger()': Expected a String, Number or Boolean, got: LongArray[4, 2, 5, 4, 1, 1, 4, 1, 1, 3, 4, 5, 1, 1, 1, 2, 2, 3, 5, 2, 5, 2, 5, 1, 2, 2, 3, 3, 1, 3, 3, 1, 5, 4, 3, 5, 3, 1, 5, 2, 5, 3, 3, 1, 1, 5, 3, 5, 5, 3, 1, 2, 4, 1, 4, 1, 4, 1, 1, 1, 4, 1, 3, 5, 3, 4, 3, 1, 1, 1, 3, 1, 3, 1, 3, 1, 3, 4, 3, 3, 4, 1, 4, 1, 5, 2, 5, 5, 1, 3, 3, 3, 1, 4, 1, 3, 5, 3, 4, 4, 4, 1, 4, 3, 1, 5, 3, 1, 3, 3, 1, 1, 1, 1, 2, 2, 2, 5, 3, 1, 1, 4, 2, 5]
It seems there are some rogue DoubleArray properties due to:
CALL apoc.create.setProperty(p1, key, round(averageValue)) YIELD node
explicitly casting the output to integer solves the issue:
CALL apoc.create.setProperty(p1, key, toInteger(round(averageValue))) YIELD node
Do you think it is worthwhile to report this to Neo4j repo as you would imagine that the round() would output a consistent data format.
MATCH (p:Person)
UNWIND keys(p) AS key
WITH p, key
WHERE toInteger(p[key]) IS NOT NULL
RETURN apoc.meta.cypher.type(p[key]), count(*)
type | count |
---|---|
"INTEGER" | 139819 |
"FLOAT" | 571 |
I was not aware that round
can produce double values in Neo4j.
Would be interesting to know what the underlying logic is.
Let me get someone to give us some background first :)
According to https://neo4j.com/docs/cypher-manual/current/functions/mathematical-numeric/#functions-round, round
does return a float. But this does not explain why its not consistent
Ok I get it... the round()
always produces doubles as there are only 571 missing values. So that's it, we have some rogue double arrays in between, but mostly long arrays, and therefore there is a problem. The joys of schema-less database :)
You can close this issue if you wish
Thanks for finding out
Will do once we merged our PR around adding more automatic conversion support :)
Just merged an improvement for the automatic conversion of array properties, which will be part of 2.4
Hi. As discussed, please my use case which generates the same error. Sorry I didn't made it simpler but it's quite quick to setup. Just load the 3 csv files using "medical_cypher.txt" and then run "gds_issue.txt" to have the last command showing the issue. Seems it's coming from the nodes labeled Drug as it works once removed. I checked the type of the property and it say LIST OF FLOAT. Let me know if I did something wrong... EE V5.5, GDS v2.3.1 medical.zip
(as discussed) The gds.fastRP.write will write back float[] but the Drug and Diag nodes will have only double[] values as the Cypher toFloatList creates doubles (as Cypher only uses doubles).
I would suggest to only load the fastRP_Embed for the Patient nodes.
This issue is already solved, but for any potential future reader, I want to offer this reply:
I've reproduced the error. I notice that you do not write any FastRP embeddings for the :Drug
nodes. That's explicitly the effect of the nodeLabels
filter that you configure in the fastrp.write
query:
CALL gds.fastRP.write($graphname,{
embeddingDimension: 6,
nodeLabels: ['Patient'], // <--- here we only use Patient nodes
randomSeed: 42,
propertyRatio: 1.0,
nodeSelfInfluence: 1.0,
featureProperties: ['Diab', 'Dyslipidaemia', 'Osteo', 'Hypertension'],
writeProperty: 'fastRP__Embed'
});
Then, in your projection query, you use the syntactic sugar short-form to specify node properties to project. Properties specified in this form apply to all projected node labels. Since you want to project the fastRP__Embed
property that you wrote to Patient
nodes you specify that property. But it does not exist on any of the Drug
nodes.
As a result, the node projection is ill-formed, I would say. You will get a fastRP__Embed
property for Drug
nodes, but it will be all zero-vectors. Not very useful.
The quickest fix, in my opinion, is to separate the node property configuration and make these label-specific. Here is an example:
WITH {
Patient: { properties: ['targetProperty', 'isTrain', 'fastRP__Embed'] },
Drug: { properties: { targetProperty: {}, isTrain: {} } }
} AS nodeProjection
CALL gds.graph.project($graphname2, nodeProjection, 'PRESCRIBED')
YIELD graphName, nodeProjection, nodeCount AS nodes, relationshipCount AS rels
RETURN graphName, nodeProjection AS patientProjection, nodes, rels;
GDS version: 2.3.0 Neo4j version: 5.4.0 Operating system: Sandbox
Tried to reproduce on a smaller dataset, but it worked there, so I am attaching the flow that reproduces the error:
The error is:
The issue can safely be avoided by casting elements of the vector as a float: