opencypher / morpheus

Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.
Apache License 2.0
336 stars 62 forks

Question about development on this project #961

Open hidders opened 3 months ago

hidders commented 3 months ago

Hello everybody,

I am a researcher working on GQL and graph querying, covering both theoretical and practical issues. We currently have some funding to start a mini-project to see whether we can make progress on implementing GQL in Morpheus. However, we would probably first try to get Morpheus up and running again. For that, it would be good to know whether the current code base still works, and whether there are any working forks on which development is still ongoing. Could any of you perhaps shed some light on this?

Kind regards,

-- Jan Hidders, Birkbeck College, University of London

Mats-SX commented 3 months ago

Hello @hidders

This project is not currently maintained. I am not aware of any forks where the work has continued.

The Spark version used here is somewhere around 3.0, which is pretty old by now. Most of the functionality was working well as implemented when development stopped. It covers a good portion of openCypher 9 (reading clauses only), plus experimental extensions for managing graphs. The README is a useful resource.

We (Neo4j) do not have any current plans to extend this project to GQL.

Regards, Mats

hidders commented 3 months ago

Thank you, Mats. That's good to know and helpful. We will let you know how far we get.

drew-moore commented 3 months ago

"The Spark version used here is somewhere around 3.0, which is pretty old by now."

Spark 3.0 hadn't quite been released when development here stopped -- head of master is actually on 2.4.3.

@hidders I went a little way down this road myself, once upon a time: I had a fork running with 3.0.x, and I know at least one other community member did too (see convo here), but unfortunately neither of our forks was ever pushed. IIRC, the update from 2.4.x -> 3.0.x was a bit hairy but not terribly so, and I think you can reasonably hope that incremental updates from there to 3.5.x would be doable as well.

Feel free to ping me if I can help (me at drewmoo dot re). The way this project evaporated so suddenly has always perplexed me, and I never quite let go of the dream of GQL in Spark :)

drew

alastai commented 3 months ago

The project evaporated for three reasons:

1) It had played its role in motivating and winning support for the GQL standard initiative, which in 2018-2019 became a higher priority for some of us

2) Neo4j prioritized the Graph Data Science library in its product plans (a perfectly rational cost/benefit decision)

3) Databricks backed away from incorporating Cypher in Spark 3.0. (The SPIP to do that projected that Cypher support would ultimately morph into GQL support.)

Five years later ... it's great to see this work showing signs of emerging from its Sleepy Hollow.

Its hierarchical catalog concept is part of GQL; its Graph DDL concept (slightly reduced in scope) is part of GQL; its experiments with graph-composable queries are as relevant as ever; and the OKAPI layering makes it highly capable of operating in the Spark and other possible worlds.

The ability to map SQL and other tabular data sources into a graph-schema defined view is very interesting from a data integration/data lake perspective.

That was "the other design" for SQL-graph integration, the one that didn't end up in SQL/PGQ ... but with GQL's graph types now in existence, that thread is waiting to be picked up. For those with imagination, the fact that a table is a graph with one node type and no edge types produces some interesting "exercises for the reader".

An ideal project for future research!

Alastair Green

hidders commented 3 months ago

@drew-moore That is very encouraging to hear! Let me discuss this with our developer and we (probably him) will get back to you on this.