Closed gelisam closed 11 years ago
An alternative choice for `ByteString -> r` would be to accept the entire input, without splitting it into lines. But I think modifying each line is going to be more common, so I would prefer to optimize for that use case.
Whichever choice we make, users can easily add `unlines` or `fmap` to get the other behaviour if needed.
The same choice exists for JSON, but I think the opposite use case is going to be more common: a JSON input spanning multiple lines. If users want to interpret the input as one JSON per line, they can write a function of type `[Aeson.Object] -> r`.
Yep! I think this is an amazing idea. I actually considered it for detecting functions of type `ByteString -> r`, but it didn't occur to me to take it any further. My favourite part:

> `[(String, Int, Int)] -> r`? Parse the input as tab-separated fields.
This * 100000. I considered trying to wrap ByteString's `readInt` to be a bit more useful, but it seems clear now that a type-declarative solution is the way to go.
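For reference, a minimal sketch of what such a wrapper could look like (the name `parseIntField` and the error behaviour are my own assumptions, not part of the project):

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical wrapper around readInt: demand that the whole field is an
-- integer, instead of silently ignoring trailing junk the way readInt does.
parseIntField :: B.ByteString -> Int
parseIntField s = case B.readInt s of
  Just (n, rest) | B.null rest -> n
  _ -> error ("not an integer: " ++ B.unpack s)

main :: IO ()
main = print (map parseIntField (B.split '\t' (B.pack "1\t2\t3")))
-- prints [1,2,3]
```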
I'm actually not a fan of trying to be too clever with the input, though; currently the function input is a list of one item per line, and I'd like to keep it that way. The reason is that I believe the most common use cases are always going to be `[a] -> [b]` and `[a] -> b`, i.e. map / filter / reduce. If I write a function that expects a single input, it's more likely that I forgot to use `map` than that I actually wanted the whole file unsplit.
I also think that the input interface should be consistent and, ideally, composable; commonly a stream of data from Hadoop would be `key\tjson`, so it would be nice to be able to turn that into `(key, a, b)`. Perhaps lenses will help out here?
We could have a function with a similar API to the `json` function, i.e. using a witness which splits on tabs and produces a series of tuples (i.e. `a -> [b] -> [a]`). Given that, if there's some nice way of making it work by introspecting the input type signature, then that could be provided as an additional luxury on top. What do you think?
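As a rough sketch of the tab-splitting half of this (the name `keyed` is illustrative; interpreting the remaining fields as JSON would be a separate step):

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical helper: peel the key off a `key\tjson` line, leaving
-- the remaining tab-separated fields untouched for later parsing.
keyed :: B.ByteString -> (B.ByteString, [B.ByteString])
keyed line = case B.split '\t' line of
  []         -> (B.empty, [])
  (k : rest) -> (k, rest)

main :: IO ()
main = print (keyed (B.pack "user42\t{\"clicks\":7}"))
```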
I agree that map, filter and reduce are going to be the most common use cases. I'm not suggesting to unsplit the file and give the raw input stream to the function; on the contrary, I'm suggesting to silently add this `map` the user forgot.
The only situation in which I suggest to unsplit the input is when the input type is `Aeson.Object`, indicating that the entire file is a single JSON object. I'm actually quite surprised that there are formats which stuff their entire JSON on a single line; that must be so horribly unreadable! Anyway, the input `(String, Aeson.Object)` would still be parsed as one pair per line, so both of our use cases would be covered. I'm not familiar with lenses, and I don't think I need them to implement the behaviour I have described.
Speaking of things I am not familiar with: I don't understand your `json` trick well enough to generalize it, so I plan to stick with typeclass magic for now.
> Speaking of things I am not familiar with: I don't understand your `json` trick well enough to generalize it, so I plan to stick with typeclass magic for now.
It's not actually my trick... Basically, the witness can have any type that corresponds to a `FromJSON` instance. Then, Aeson does all the work when you call `fromJSON`.
I would have made the whole thing totally implicit; however, the numeric functions (`sum`, `mod`, `log`, etc.) tend to support numeric typeclasses, so you have to specify a concrete type anyway.
> I agree that map, filter and reduce are going to be the most common use cases. I'm not suggesting to unsplit the file and give the raw input stream to the function; on the contrary, I'm suggesting to silently add this `map` the user forgot.
Ah, +1 then!
> I would have made the whole thing totally implicit; however, the numeric functions (`sum`, `mod`, `log`, etc.) tend to support numeric typeclasses, so you have to specify a concrete type anyway.
Have you considered default instances? Not that I've ever used them.
Interesting. So this may work:

```
Prelude> default (Float)
Prelude> let a = sum [1.1,2,2]
Prelude> :t a
a :: Float
Prelude> default (Double)
Prelude> let a = sum [1.1,2,2]
Prelude> :t a
a :: Double
```
I have pushed the above commit to my own repo rather than yours, because it breaks almost all the examples in the README. Since the input and output types can now be anything, GHC has a really hard time inferring the type of the user's code. I don't think this can be avoided...
```
> printf '1\t2' | hsl 'map swap :: [(Int, Int)] -> [(Int, Int)]'
2 1
```
Unless you can save us using some type witness magic?
The witnesses are really simple, they just concretise the types in the eyes of the compiler. They have no logic or purpose at runtime at all. An easy way to use them:
```haskell
witness :: a -> a -> a
witness _ v = v
```
Then in your expression:
```haskell
map swap . witness [(i,i)]
```
Then the compiler knows exactly what it's supposed to be dealing with. Could remove them from `json` as such:

```haskell
witness [(i,i)] . json ""
```
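To make the trick concrete, here is a self-contained sketch of a witness in action (the example data is my own):

```haskell
import Data.Tuple (swap)

-- The witness pins down types at compile time and does nothing at runtime:
-- its first argument is ignored, so it is never even evaluated.
witness :: a -> a -> a
witness _ v = v

main :: IO ()
main = print (map swap (witness [(i, i)] [(1, 2), (3, 4)]))
  -- prints [(2,1),(4,3)]
  where
    i = undefined :: Int
```

Because `witness` discards its first argument, `[(i, i)]` with `i = undefined` is safe: it exists only to tell the compiler that the second argument has type `[(Int, Int)]`.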
I'm running this and it doesn't work even for simple examples:

```
echo '1 a' | hsl length
/var/folders/9h/w42hpwwj5j3bqx8cymz3jx7w0000gn/T/tmpcjBHCe/Main.hs:33:8:
    No instance for (ParsableFromLine a0) arising from a use of `run'
    The type variable `a0' is ambiguous
```
Could you explain a bit about the design goal of the new classes in Types.hs?
Also, I don't really want to use `Read`. Reason being, sadly, it's a major performance killer... I don't know why this is, but it is. ByteString's `readInt` is nice and fast, though.
Edit: I should mention that before changing any of the typeclasses in Types.hs, I'd like for this to be implemented as a simple function in a separate module.
You can give `ParsableFromItem Int` a concrete implementation if you want, like I did for `String` and `ByteString`. I didn't know `Read` was so slow; I still can't reason properly about Haskell's time or space performance.
Yes, this is exactly the problem I was referring to: my change is no good, because even the simplest expression fails type inference now. The reason is that `length` has the polymorphic type `[a] -> Int`, but the compiler cannot infer what the type of `a` should be.
When you only had the `Renderable` constraint, the compiler was unifying `[a] -> Int` with the context `Renderable r => [ByteString] -> r`, from which GHC could easily conclude that `a = ByteString` and `r = Int`.
The new design goal is to allow the user's function to have type `[(String, Int, Int)] -> Int`, or `[(Int, Int)] -> String`, or whatever tuple type the user wants to manipulate. To do this, we need to add a second constraint, `Parsable a`, with instances `instance Parsable (String, Int, Int)`, `instance Parsable (Int, Int)`, and so on. But now, when the user writes a polymorphic function of type `[a] -> Int`, GHC needs to unify that type with `(Parsable a, Renderable r) => [a] -> r`, and GHC has no way to know whether `a = (String, Int, Int)` or `a = (Int, Int)`. Of course, for the function `length` the result will be the same regardless of our choice, but Haskell still needs a concrete type in order to proceed.
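A stripped-down sketch of this two-constraint design and the annotation it forces (the class and instance names here are illustrative, not the project's actual code):

```haskell
{-# LANGUAGE FlexibleInstances #-}
import qualified Data.ByteString.Char8 as B

-- Hypothetical input-side class: how to parse one line into an `a`.
class Parsable a where
  parseLine :: B.ByteString -> a

instance Parsable B.ByteString where
  parseLine = id

instance Parsable (Int, Int) where
  parseLine s = case B.split '\t' s of
      (a : b : _) -> (toInt a, toInt b)
      _           -> (0, 0)
    where
      toInt = maybe 0 fst . B.readInt

-- Hypothetical output-side class, standing in for Renderable.
class Renderable r where
  render :: r -> String

instance Renderable Int where
  render = show

run :: (Parsable a, Renderable r) => ([a] -> r) -> [B.ByteString] -> String
run f = render . f . map parseLine

main :: IO ()
main = do
  -- Without these annotations, `a` is ambiguous, exactly as described above.
  putStrLn (run (length :: [B.ByteString] -> Int) [B.pack "1 a"])
  putStrLn (run (sum . map fst :: [(Int, Int)] -> Int)
                [B.pack "1\t2", B.pack "3\t4"])
```

With two `Parsable` instances in scope, even `length` needs the annotation to compile, which is the problem in a nutshell.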
To alleviate the problem, I overrode GHC's type inference (this is simpler than it sounds...) to assume `ByteString` whenever it can, and that helped a lot. I plan to improve the inference further, by forcing `[a]` for `Functor a`, forcing `Int` for `Num a`, and by removing `-fdefer-type-errors` and `run` during type inference.
But not today! I have already spent way too much time on this project this weekend. It was fun, though, thanks for starting this interesting project!
> Edit: I should mention that before changing any of the typeclasses in Types.hs, I'd like for this to be implemented as a simple function in a separate module.
I don't see how that would be possible. The whole point is for the behaviour to be different depending on the type, and for this, you need typeclasses!
See my branch `feature/tabular`. You're right, it requires changes in the types, but no additions. Although it's still using the same API, i.e. explicit.
Since type-inference magic will need to be reimplemented anyway, I think we should close this issue and replace it with a family of smaller tasks.
Thanks for giving me commit access!
Since I won't have to send you pull requests, I plan to describe the features I want to implement as issues, so we can discuss them beforehand if needed. If my suggestions deviate too much from your vision, feel free to object!
Anyway, this particular issue is about revamping the smart render system so that in addition to allowing different types for output, we can also accept different types of input. Examples:
- `ByteString -> r`? Apply the function to each line.
- `[ByteString] -> r`? Apply the function to the entire input.
- `[(String, Int, Int)] -> r`? Parse the input as tab-separated fields.
- `Aeson.Object -> r`? Parse the entire input as JSON, then pass it to the function.