tweag / sparkle

Haskell on Apache Spark.
BSD 3-Clause "New" or "Revised" License
447 stars 30 forks

How to make a Haskell type (record) available in Spark? #148

Closed LucianU closed 3 years ago

LucianU commented 5 years ago

I have a type defined as

data Session = Session { _appName :: Text, _bank :: Text }

and many other fields.

When using sparkle, I'm trying to decode the contents of a file to the type above. The code I use goes like this:

rdd' <- textFile sc s3FileUri
decodedSessions <- RDD.map (closure (static (decodeStrict . encodeUtf8))) rdd'

For that to work, it seems I need to make my Session type available in Java land. How can I do that?
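(Aside: decodeStrict here is presumably aeson's, which means Session also needs a FromJSON instance before the closure above can even typecheck. A minimal sketch, assuming Generic deriving and JSON keys that literally match the underscored field names:

```haskell
{-# LANGUAGE DeriveGeneric #-}

import Data.Aeson (FromJSON, decodeStrict)
import Data.Text (Text)
import Data.Text.Encoding (encodeUtf8)
import GHC.Generics (Generic)

data Session = Session { _appName :: Text, _bank :: Text }
  deriving (Show, Generic)

-- Generic deriving maps record fields to JSON keys verbatim, so this
-- expects keys "_appName" and "_bank"; differently named keys would
-- need aeson's fieldLabelModifier.
instance FromJSON Session

parseSession :: Text -> Maybe Session
parseSession = decodeStrict . encodeUtf8
```

)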

facundominguez commented 5 years ago

Hello, RDD.map has the following type:

map
  :: (Static (Reify a), Static (Reflect b), Typeable a, Typeable b)
  => Closure (a -> b)
  -> RDD a
  -> IO (RDD b)

Here b is instantiated to Session (strictly, to Maybe Session, since decodeStrict returns a Maybe). RDD.map therefore requires an instance of Static (Reflect Session), which in turn requires an instance of Reflect Session.
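A sketch of what such instances could look like, marshalling Session across the boundary through its JSON encoding as Text, so the existing Text instances do the actual JVM work. This is a hypothetical outline, not sparkle's documented recipe: it assumes the jvm package's Interpretation/Reify/Reflect classes and distributed-closure's Static, whose exact names and shapes may differ between versions, and it assumes Session has FromJSON and ToJSON instances:

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE StaticPointers #-}
{-# LANGUAGE TypeFamilies #-}

import Control.Distributed.Closure (Dict (..), Static (..), closure, static)
import Data.Aeson (decodeStrict, encode)
import qualified Data.ByteString.Lazy as LBS
import Data.Text (Text)
import Data.Text.Encoding (decodeUtf8, encodeUtf8)
import Language.Java (Interpretation (..), Reflect (..), Reify (..))

-- Represent a Session on the Java side the same way a Text is
-- represented, i.e. as its JSON rendering.
instance Interpretation Session where
  type Interp Session = Interp Text

instance Reify Session where
  reify jobj = do
    txt <- reify jobj :: IO Text
    maybe (fail "reify Session: bad JSON") return
          (decodeStrict (encodeUtf8 txt))

instance Reflect Session where
  reflect = reflect . decodeUtf8 . LBS.toStrict . encode

-- The Static dictionaries that RDD.map's constraints ask for.
instance Static (Reify Session) where
  closureDict = closure (static Dict)

instance Static (Reflect Session) where
  closureDict = closure (static Dict)
```

Round-tripping through JSON is not cheap, but it avoids hand-writing JNI marshalling for every field of a wide record.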

The utility of all this, though, depends on the input file format and on what you are going to compute with the resulting RDD.
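For instance, depending on the computation, you may be able to sidestep Reflect Session entirely by fusing the decoding into the next step, so the RDD's element type stays Text and only types sparkle already supports cross the language boundary. A hedged sketch, where sessionBank is a hypothetical projection and the module names are assumptions:

```haskell
{-# LANGUAGE StaticPointers #-}

import Control.Distributed.Closure (closure, static)
import qualified Control.Distributed.Spark.RDD as RDD
import Data.Aeson (decodeStrict)
import Data.Text (Text)
import Data.Text.Encoding (encodeUtf8)

-- Decode and project in a single closure; the RDD stays an RDD Text,
-- which already has Reify/Reflect instances.
sessionBank :: Text -> Text
sessionBank line =
  maybe "" _bank (decodeStrict (encodeUtf8 line) :: Maybe Session)

banks :: RDD.RDD Text -> IO (RDD.RDD Text)
banks = RDD.map (closure (static sessionBank))
```
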

LucianU commented 3 years ago

Unfortunately, I don't remember how I solved this, but since there is no general answer to the question, I think we can close the issue.