ocramz / heidi

heidi : tidy data in Haskell

tidy data operations with lists as keys #13

Open ocramz opened 3 years ago

ocramz commented 3 years ago

Two internal details of the library need to be reconciled, and the UX needs to be worked out: how can list-valued indexing keys be manipulated with little boilerplate?

encode :: (Foldable t, Heidi a) => t a -> Frame (Row [TC] VP)

https://hackage.haskell.org/package/heidi-0.0.0/docs/Heidi-Data-Frame-Algorithms-GenericTrie.html

innerJoin :: (Foldable t, Ord v, TrieKey k, Eq v, Eq k) => k -> k -> t (Row k v) -> t (Row k v) -> Frame (Row k v)

adamConnerSax commented 3 years ago

I think it would be useful to have some combinators to build the bits to use gather, spread, join, etc. from simpler things, especially in the Row [TC] VP case. It should be easy, given a known column, to construct the [TC] key required for the various operations as well as the Set [TC] required for gather (and a more general join? I find multi-key-column join to be pretty useful...).

For example:

gatherSet :: (Functor f, Foldable f) => [Heidi.TC] -> f Text -> Set [Heidi.TC]
gatherSet prefixTC = Set.fromList . Foldable.toList . fmap (\t -> reverse $ Heidi.mkTyN (toString t) : prefixTC)

but this assumes every TC represents only a "type" (it always goes through mkTyN). Maybe there should be a TC -> Text function for the various places where this comes up? Also--sorry, OT--why does TC use String rather than Text?
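
A minimal sketch of what such a TC -> Text function could look like. The TC record below is a local stand-in, not the library's definition (assumed: a type name and a constructor name, both String, mirroring the tcTyN and tcTyCon accessors used in this thread). tcToText and keyToText are hypothetical names that do not exist in heidi:

```haskell
import Data.Text (Text)
import qualified Data.Text as T

-- Stand-in for Heidi.TC (assumption: type name + constructor name, both String).
data TC = TC { tcTyN :: String, tcTyCon :: String }
  deriving (Eq, Ord, Show)

-- Hypothetical helper: use the type name when present,
-- otherwise fall back to the constructor name.
tcToText :: TC -> Text
tcToText tc
  | null (tcTyN tc) = T.pack (tcTyCon tc)
  | otherwise       = T.pack (tcTyN tc)

-- Hypothetical helper: render a whole [TC] key, joining segments with a separator.
keyToText :: Text -> [TC] -> Text
keyToText sep = T.intercalate sep . map tcToText
```

With this stand-in, keyToText (T.pack "_") [TC "Person" "", TC "" "Age"] evaluates to "Person_Age". Whether the type name or the constructor name should win when both are set is a design choice the sketch does not settle.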

gatherWith requires k -> v. I think there ought to be some reasonable default implementation to cover various cases. I wrote:

tcKeyToTextValue :: [Heidi.TC] -> Heidi.VP
tcKeyToTextValue tcs = Heidi.VPText $ Text.intercalate "_" $ fmap tcAsText tcs where
  tcAsText tc = let n = Heidi.tcTyN tc in toText $ if null n then Heidi.tcTyCon tc else n

But that raises all the same questions about TC.

I confess I haven't tried these things yet, so some may be more obvious/simpler than they seem. But that's part of the point, maybe. This would all be more approachable for beginners if the powerful stuff were easy to use in the typical cases. Using [TC] as the usual key is confusing (maybe hide it behind a newtype?), as is bridging the highly polymorphic functions with the usual case.
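
To make the newtype idea concrete, here is a rough sketch. Everything in it is hypothetical: TC is again a local stand-in for Heidi.TC, and ColKey, col and the instances are illustrative names, not part of heidi. It also inherits the assumption from gatherSet above that a column name maps to a type-name TC:

```haskell
import Data.String (IsString (..))

-- Stand-in for Heidi.TC (assumption: type name + constructor name, both String).
data TC = TC { tcTyN :: String, tcTyCon :: String }
  deriving (Eq, Ord, Show)

-- Hypothetical wrapper that hides the [TC] representation of a column key.
newtype ColKey = ColKey { unColKey :: [TC] }
  deriving (Eq, Ord, Show)

-- Appending keys builds a nested path, outermost segment first.
instance Semigroup ColKey where
  ColKey a <> ColKey b = ColKey (a ++ b)

instance Monoid ColKey where
  mempty = ColKey []

-- Single-segment key from a column name (treated as a type name here).
col :: String -> ColKey
col name = ColKey [TC name ""]

-- With OverloadedStrings enabled, a string literal can stand for a key.
instance IsString ColKey where
  fromString = col
```

A nested key would then read col "person" <> col "age", and unColKey recovers the underlying [TC] wherever the polymorphic functions need it.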

I do like that they are so polymorphic! I can imagine, once comfortable with the library, using a different set of keys and values for ease of interoperation with, e.g., Frames or hvega or whatever, but still wanting the set and tidy operations available.
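
For the multi-key-column join mentioned above, a sketch over plain Data.Map rows shows the shape I have in mind. This is not heidi's innerJoin (which takes one key per table); Row here is a local type alias, and joinKey and innerJoinOn are hypothetical names. It stays polymorphic in the key and value types:

```haskell
import qualified Data.Map.Strict as M

-- Local stand-in for a row: a map from column key to value.
type Row k v = M.Map k v

-- Project a row onto the join columns; Nothing if any column is missing.
joinKey :: Ord k => [k] -> Row k v -> Maybe [v]
joinKey ks r = traverse (\k -> M.lookup k r) ks

-- Inner join on several key columns at once: index the right-hand rows
-- by their projected key, then probe that index with each left-hand row.
-- On clashing columns the left row's value is kept (M.union is left-biased).
innerJoinOn :: (Ord k, Ord v) => [k] -> [Row k v] -> [Row k v] -> [Row k v]
innerJoinOn ks ls rs =
  [ M.union l r
  | l <- ls
  , Just key <- [joinKey ks l]
  , r <- M.findWithDefault [] key index
  ]
  where
    index =
      M.fromListWith (flip (++))
        [ (key, [r]) | r <- rs, Just key <- [joinKey ks r] ]
```

Rows that lack any of the join columns are dropped on both sides, which is one possible convention; a real implementation might prefer to report them.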