neo4j / graph-data-science

Source code for the Neo4j Graph Data Science library of graph algorithms.
https://neo4j.com/docs/graph-data-science/current/

gds.beta.pipeline.nodeClassification.predict.stream yields java.lang.ArrayIndexOutOfBoundsException for MultilayerPerceptron model #265

Closed devineyfajr closed 1 year ago

devineyfajr commented 1 year ago

gds.beta.pipeline.nodeClassification.predict.stream yields results for RandomForest and LogisticRegression models, but fails with the error below for a MultilayerPerceptron model applied to the same graph:

Console message: Failed to invoke procedure gds.beta.pipeline.nodeClassification.predict.stream: Caused by: java.lang.ArrayIndexOutOfBoundsException

Log message:

2023-04-03 12:38:30.398+0000 INFO [o.n.k.a.p.GlobalProcedures] [neo4j.BoltWorker-2 [bolt-95309 - /127.0.0.1:51554]] Node Classification Predict Pipeline :: Start
2023-04-03 12:38:30.398+0000 INFO [o.n.k.a.p.GlobalProcedures] [neo4j.BoltWorker-2 [bolt-95309 - /127.0.0.1:51554]] Node Classification Predict Pipeline :: Execute node property steps :: Start
2023-04-03 12:38:30.398+0000 INFO [o.n.k.a.p.GlobalProcedures] [neo4j.BoltWorker-2 [bolt-95309 - /127.0.0.1:51554]] Node Classification Predict Pipeline :: Execute node property steps :: Finished
2023-04-03 12:38:30.398+0000 INFO [o.n.k.a.p.GlobalProcedures] [neo4j.BoltWorker-2 [bolt-95309 - /127.0.0.1:51554]] Node Classification Predict Pipeline :: Node classification predict :: Start
2023-04-03 12:38:32.116+0000 INFO [o.n.k.a.p.GlobalProcedures] [gds-4] Node Classification Predict Pipeline :: Node classification predict 25%
2023-04-03 12:38:32.126+0000 INFO [o.n.k.a.p.GlobalProcedures] [gds-2] Node Classification Predict Pipeline :: Node classification predict 49%
2023-04-03 12:38:32.127+0000 INFO [o.n.k.a.p.GlobalProcedures] [gds-1] Node Classification Predict Pipeline :: Node classification predict 74%
2023-04-03 12:38:32.130+0000 INFO [o.n.k.a.p.GlobalProcedures] [gds-3] Node Classification Predict Pipeline :: Node classification predict 100%
2023-04-03 12:38:32.130+0000 INFO [o.n.k.a.p.GlobalProcedures] [neo4j.BoltWorker-2 [bolt-95309 - /127.0.0.1:51554]] Node Classification Predict Pipeline :: Node classification predict :: Finished
2023-04-03 12:38:32.130+0000 INFO [o.n.k.a.p.GlobalProcedures] [neo4j.BoltWorker-2 [bolt-95309 - /127.0.0.1:51554]] Node Classification Predict Pipeline :: Failed
2023-04-03 12:38:32.130+0000 WARN [o.n.k.a.p.GlobalProcedures] Computation failed java.lang.ArrayIndexOutOfBoundsException: null

brs96 commented 1 year ago

Hi @devineyfajr, thanks for reporting this. Do you have an example graph that reproduces the issue?

devineyfajr commented 1 year ago

Unfortunately I don't.

breakanalysis commented 1 year ago

Hi @devineyfajr, was there more to the error in the console or in debug.log? Also, can you please provide the pipeline configuration, including the configuration of the MultilayerPerceptron, if any?
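
For readers following along, one way to gather the requested configuration is to query the GDS pipeline and model catalogs. This is a minimal sketch, assuming a GDS 2.x installation where the beta-tier catalog procedures gds.beta.pipeline.list and gds.beta.model.list are available; the names 'vt-MLP' and 'nc-MLP-model-frp' are the ones the reporter uses later in this thread.

```cypher
// Show the stored pipeline definition: feature properties, split
// configuration, and the model candidates in the parameter space.
CALL gds.beta.pipeline.list('vt-MLP')
YIELD pipelineName, pipelineType, pipelineInfo
RETURN pipelineName, pipelineType, pipelineInfo;

// Show the trained model, including the configuration it was trained with.
// Assumes training has already completed and the model is in the catalog.
CALL gds.beta.model.list('nc-MLP-model-frp')
YIELD modelInfo, trainConfig
RETURN modelInfo, trainConfig;
```

The procedure tiers (alpha, beta) differ between GDS releases, so the exact procedure names may need adjusting for the installed version.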

devineyfajr commented 1 year ago

There is no more error info than what is shown above. This is the pipeline configuration:

CALL gds.beta.pipeline.nodeClassification.create("vt-MLP");

CALL gds.beta.pipeline.nodeClassification.selectFeatures("vt-MLP", ["degreeCentrality", "embedding"]);

CALL gds.beta.pipeline.nodeClassification.configureSplit("vt-MLP", {validationFolds: 10, testFraction: 0.25});

CALL gds.alpha.pipeline.nodeClassification.addMLP("vt-MLP", {
  hiddenLayerSizes: [256, 256],
  focusWeight: {range: [0.0, 1.0]},
  batchSize: {range: [50, 200]},
  minEpochs: {range: [1, 5]}
})
YIELD parameterSpace;

CALL gds.alpha.pipeline.nodeClassification.configureAutoTuning('vt-MLP', {maxTrials: 8});

// create graph here

CALL gds.beta.pipeline.nodeClassification.train('myGraph', {
  pipeline: 'vt-MLP',
  targetNodeLabels: ["__ALL__"],
  modelName: 'nc-MLP-model-frp',
  targetProperty: 'classId',
  metrics: ['F1_MACRO']
})
YIELD modelInfo, modelSelectionStats
RETURN modelInfo, modelSelectionStats;

brs96 commented 1 year ago

Hi @devineyfajr,

Thanks for the details above. It's hard to pin down the exact problem without an example that reproduces it, but I do have one suggestion to try out.

You've specified targetNodeLabels: ["__ALL__"] in train. I think this means that your graph projection used "*" for nodeLabels, so all nodes have the same label (__ALL__), with some properties on them. For node classification, it is common for the graph to contain nodes with at least two different labels. For example, in https://neo4j.com/docs/graph-data-science/current/machine-learning/node-property-prediction/nodeclassification-pipelines/training/#nodeclassification-pipelines-examples-train-filtering, there are House nodes, which have a few nodeProperties plus a class property, and UnknownHouse nodes, which have the same nodeProperties but no class property.

I think you might want to try the following:

1. Project your graph with different nodeLabels.
2. In training, set targetNodeLabels to the labels whose nodes have classId as a property (e.g. House).
3. In predict, set targetNodeLabels to the labels whose nodes don't have classId (e.g. UnknownHouse).
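
The suggestion could look roughly like the sketch below. The node labels Labeled and Unlabeled are hypothetical stand-ins for whatever labels separate nodes with and without classId in the actual database. The sketch also assumes that degreeCentrality and embedding are stored node properties that can be projected directly; if they are instead computed on the in-memory graph, those steps would have to be rerun after projection. Whether this avoids the ArrayIndexOutOfBoundsException was not confirmed in this thread.

```cypher
// 1. Project the graph with two distinct node labels (hypothetical names).
//    Only Labeled nodes carry the target property classId.
CALL gds.graph.project(
  'myGraph',
  {
    Labeled:   { properties: ['degreeCentrality', 'embedding', 'classId'] },
    Unlabeled: { properties: ['degreeCentrality', 'embedding'] }
  },
  '*'
);

// 2. Train only on nodes that have classId.
//    Assumes no model named 'nc-MLP-model-frp' exists in the catalog yet.
CALL gds.beta.pipeline.nodeClassification.train('myGraph', {
  pipeline: 'vt-MLP',
  targetNodeLabels: ['Labeled'],
  modelName: 'nc-MLP-model-frp',
  targetProperty: 'classId',
  metrics: ['F1_MACRO']
})
YIELD modelInfo, modelSelectionStats
RETURN modelInfo, modelSelectionStats;

// 3. Predict only for nodes that lack classId.
CALL gds.beta.pipeline.nodeClassification.predict.stream('myGraph', {
  modelName: 'nc-MLP-model-frp',
  targetNodeLabels: ['Unlabeled'],
  includePredictedProbabilities: true
})
YIELD nodeId, predictedClass, predictedProbabilities
RETURN nodeId, predictedClass, predictedProbabilities;
```

Keeping the training and prediction node sets disjoint mirrors the House / UnknownHouse example in the linked documentation.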

brs96 commented 1 year ago

Hi @devineyfajr,

We'll close the issue for now as it could not be reproduced. Do let us know if the suggestion above fixes your problem. If not, feel free to reopen or raise a new issue.

Thanks!