neo4j / graph-data-science-client

A Python client for the Neo4j Graph Data Science (GDS) library
https://neo4j.com/product/graph-data-science/
Apache License 2.0

Optional Logging for Graph Data Science Project. #754

Closed angelosantos4 closed 2 days ago

angelosantos4 commented 1 month ago

Is your feature request related to a problem? Please describe.

I am configuring logging for my automated graph data science project, and I noticed that there is no way to disable the tqdm output that comes from the graph database runner. This has caused conflicts when implementing my own JSON logger. I have located the logic that would allow this to be configurable by the end user here.

Describe the solution you would like

I would like a logging parameter to be added to GraphProjectRunner.__call__ and propagated down to the query runner.

Describe alternatives you have considered

Since this logging variable is hard-coded to True when passed into the query runner, I cannot work around this without manually modifying the library. I could directly create a Neo4jQueryRunner instance and call call_procedure with all the necessary parameters, including logging=False, but I believe this change is simple enough to expose at a higher level.

Additional context

Suggested Change:

graphdatascience/graph/graph_project_runner.py

class GraphProjectRunner(IllegalAttrChecker):
    def __call__(self, graph_name: str, node_spec: Any, relationship_spec: Any, **config: Any) -> GraphCreateResult:
        params = CallParameters(
            graph_name=graph_name,
            node_spec=node_spec,
            relationship_spec=relationship_spec,
            config=config,
        )
        result = self._query_runner.call_procedure(
            endpoint=self._namespace,
            params=params,
            logging=True,
        ).squeeze()

        return GraphCreateResult(Graph(graph_name, self._query_runner), result)

Change the above into:

class GraphProjectRunner(IllegalAttrChecker):
    def __call__(self, graph_name: str, node_spec: Any, relationship_spec: Any, logging: bool = True, **config: Any) -> GraphCreateResult:
        params = CallParameters(
            graph_name=graph_name,
            node_spec=node_spec,
            relationship_spec=relationship_spec,
            config=config,
        )
        result = self._query_runner.call_procedure(
            endpoint=self._namespace,
            params=params,
            logging=logging,
        ).squeeze()

        return GraphCreateResult(Graph(graph_name, self._query_runner), result)
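
The effect of the suggested change can be checked without a database. The sketch below is only an illustration: StubQueryRunner and ProjectRunnerSketch are hypothetical stand-ins written for this example, not classes from the library, and the stub records each call instead of querying Neo4j.

```python
from typing import Any, Dict, List


class StubQueryRunner:
    """Hypothetical stand-in for a query runner; it records calls instead of running them."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def call_procedure(self, endpoint: str, params: Dict[str, Any], logging: bool = False) -> Dict[str, Any]:
        # Remember what the caller asked for so it can be inspected afterwards.
        self.calls.append({"endpoint": endpoint, "params": params, "logging": logging})
        return {"graphName": params["graph_name"]}


class ProjectRunnerSketch:
    """Hypothetical simplification of the runner from the suggested change."""

    def __init__(self, query_runner: StubQueryRunner, namespace: str) -> None:
        self._query_runner = query_runner
        self._namespace = namespace

    def __call__(
        self, graph_name: str, node_spec: Any, relationship_spec: Any, logging: bool = True, **config: Any
    ) -> Dict[str, Any]:
        params = {
            "graph_name": graph_name,
            "node_spec": node_spec,
            "relationship_spec": relationship_spec,
            "config": config,
        }
        # The caller's choice is forwarded instead of a hard-coded True.
        return self._query_runner.call_procedure(endpoint=self._namespace, params=params, logging=logging)


runner = StubQueryRunner()
project = ProjectRunnerSketch(runner, "gds.graph.project")
project("g", "Person", "KNOWS")                 # default keeps the current behaviour
project("g", "Person", "KNOWS", logging=False)  # caller opts out of progress output
print([call["logging"] for call in runner.calls])
```

One consequence of this signature: because logging is a named parameter placed before **config, a configuration key that is also called logging could no longer be passed through to the procedure.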

soerenreichardt commented 1 month ago

Hello @angelosantos4, thanks for contributing. I think the request makes sense, but we'd like a more general solution that also works for all the other runners. I can try to make the change I have in mind and link it here, so you can tell me whether it meets your requirements. Cheers

angelosantos4 commented 1 month ago

That sounds perfect, thank you for looking into this.

soerenreichardt commented 2 weeks ago

Sorry for not replying here again, but did you see the attached pull request? There is now a flag on the GraphDataScience object that can disable all progress logging: https://github.com/neo4j/graph-data-science-client/blob/main/graphdatascience/graph_data_science.py#L39. I hope this is sufficient for your use case.
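
For readers comparing the two approaches, here is a minimal sketch of what an object-level switch looks like in general. It is not the library's implementation: ClientSketch and QueryRunnerSketch are hypothetical stand-ins, and the flag name show_progress is an assumption, since the thread does not state the name (the linked source line is the place to check).

```python
from typing import Any, Dict, List


class QueryRunnerSketch:
    """Hypothetical runner that holds one global progress switch."""

    def __init__(self, show_progress: bool = True) -> None:
        self._show_progress = show_progress  # assumed flag name, not confirmed by the thread
        self.progress_events: List[str] = []

    def call_procedure(self, endpoint: str, params: Dict[str, Any], logging: bool = False) -> Dict[str, Any]:
        # Progress is reported only if the call site requests it AND the global switch is on.
        if logging and self._show_progress:
            self.progress_events.append(endpoint)
        return {"endpoint": endpoint, "params": params}


class ClientSketch:
    """Hypothetical client object that owns the runner and passes the switch down once."""

    def __init__(self, show_progress: bool = True) -> None:
        self._query_runner = QueryRunnerSketch(show_progress)

    def project(self, graph_name: str) -> Dict[str, Any]:
        # Call sites keep logging=True; the global switch decides whether it has any effect.
        return self._query_runner.call_procedure("gds.graph.project", {"graph_name": graph_name}, logging=True)


noisy = ClientSketch()
quiet = ClientSketch(show_progress=False)
noisy.project("g")
quiet.project("g")
print(len(noisy._query_runner.progress_events), len(quiet._query_runner.progress_events))
```

With this design, no individual runner needs a new parameter: the switch is set once on the client object and applies to every call that would otherwise report progress.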