snowflakedb / snowpark-java-scala

Snowflake Snowpark Java & Scala API
Apache License 2.0

SNOW-828775: Get Scala session and Dataframe from Java #34

Open sonalgoyal opened 1 year ago

sonalgoyal commented 1 year ago

Our code relies on Java (our own code) and Scala libraries (ML, graph), and it would be very helpful to be able to convert the Java Dataframe and Session to their Scala counterparts so that we can use both interoperably. I see that com.snowflake.snowpark_java.Dataframe already has getScalaDataframe(), but it is package scoped. So is com.snowflake.snowpark_java.getScalaSession.

Is it possible to expose these methods publicly?

sfc-gh-jfreeberg commented 1 year ago

Hi @sonalgoyal, could you include a code snippet or pseudo-code to help describe your scenario? I'm not sure I follow. Since Java is the common denominator in your project, you should be able to use the Java Snowpark library throughout, no?

sonalgoyal commented 1 year ago

@sfc-gh-jfreeberg We use Java predominantly in our stack to transform the data, so we have a Java Dataframe. We also have a graph library, provided to us by the Snowflake team, which is written in Scala and takes Scala Dataframes as input. We do not have a way to invoke the Scala library from Java, as the Java DFs cannot be passed to it directly. Hence we are writing the Java DF to a temp table and then reading it back in Scala to create Scala Dataframes.

If we could get a handle to the underlying Scala Dataframe from Java, we could pass that to the graph library, convert the resulting Scala DF back to a Java DF, and use it in our flow.

sfc-gh-mrui commented 1 year ago

@sonalgoyal Thanks for explaining the use case. If these APIs are package scoped, I assume you can work around it easily by introducing a utility class in a package of the same name and creating a public function that returns the Scala DataFrame/Session for a Java DataFrame/Session.
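
The workaround suggested above can be sketched roughly as follows. This is a minimal, self-contained illustration of the pattern, not Snowpark code: `ScalaFrame`, `JavaFrame`, `getScalaFrame()`, and `ScalaBridge` are hypothetical stand-ins, and all classes share one file (and so one package) only to keep the example runnable. In a real project the bridge would sit in its own source file that declares the library's package name, and it would call whatever package-scoped accessor the library actually defines.

```java
// Hypothetical stand-in for the Scala-side DataFrame; not a Snowpark class.
final class ScalaFrame {
    private final String tableName;

    ScalaFrame(String tableName) {
        this.tableName = tableName;
    }

    String tableName() {
        return tableName;
    }
}

// Hypothetical stand-in for the Java wrapper around the Scala object.
final class JavaFrame {
    private final ScalaFrame underlying;

    JavaFrame(ScalaFrame underlying) {
        this.underlying = underlying;
    }

    // No access modifier: package-private, so only code in the same
    // package can call this accessor.
    ScalaFrame getScalaFrame() {
        return underlying;
    }
}

// The bridge: it lives in the same package as JavaFrame, so it may call the
// package-private accessor, and it re-exposes the result through a public
// static method. In a real project this class would be declared public in
// its own file, under the library's package name (an assumption here).
final class ScalaBridge {
    private ScalaBridge() {}

    public static ScalaFrame toScala(JavaFrame df) {
        return df.getScalaFrame();
    }
}

class BridgeDemo {
    public static void main(String[] args) {
        JavaFrame javaDf = new JavaFrame(new ScalaFrame("EDGES"));
        ScalaFrame scalaDf = ScalaBridge.toScala(javaDf);
        System.out.println(scalaDf.tableName()); // prints EDGES
    }
}
```

A bridge like this compiles only while the accessor keeps its name and visibility, and placing classes in a third-party package can fail at run time if the library's JAR seals that package.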

sonalgoyal commented 1 year ago

@sfc-gh-mrui Thanks for your suggestion. This workaround is not optimal, as we do not want to write code in a namespace we do not own, and if the Snowpark code changes, our codebase is impacted. Hence the request for a public API.

sfc-gh-jfreeberg commented 1 year ago

@sonalgoyal Which graph library is this? Did you say it was provided by Snowflake?

sonalgoyal commented 1 year ago

We got it from Stuart Ozer and Robert Fehrmann from the Snowflake team. @sfc-gh-jfreeberg