neovimhaskell / nvim-hs

Neovim API for Haskell plugins as well as the plugin provider
Other
267 stars 18 forks source link

function type efficiency #76

Closed goolord closed 5 years ago

goolord commented 5 years ago

consider the following generated function

nvim_buf_get_lines' :: Buffer -> Int64 -> Int64 -> Bool -> forall env. Neovim env [String] 

If I'm processing a large amount of text (say, a file with ~3,000 lines), [String] is just not going to be sufficient performance wise.

I would much prefer if there were at least an alternate version of the function that returns something like Neovim env (Vector ByteString)

goolord commented 5 years ago

also: I realize that the machinery in TH.hs is available to do this, I'm advocating for the default types to be ByteString and Vector respectively, and also for ObjectArray to be backed by Vector instead of []

saep commented 5 years ago

I think that ByteString as the default is problematic because you are usually editing text with an encoding and not raw bytes, so Text would be a more sensible default choice. However, if I'm adding a different text and list type (or accept a pull request for it ;-) ), I would rather remove a default altogether and anyone using this library just has to import the wanted implementation explicitly.

import Neovim
import Neovim.API.Text

It's been a while since I have implemented the code generation and I had planned to play around with backpack to see if that was the way to go rather than providing 3 (or more) modules. But as you can see, I didn't do it. :-)

I haven't measured it and I can definitely be wrong, but 3k lines is so little data that you may not even notice the difference between [String] and Vector ByteString. The used messagepack library uses [ByteString], so converting that to a Vector ByteString may actually be slower than [ByteString].

I think that generating the api code for [String], Vector Text and Vector ByteString may be the best interface in the long run. I would definitely merge a pull request for those, but unless someone shows me a benchmark with a significant performance difference on realistically large text files, I rather do something else in my spare time.

I did actually plan to finish the documentation for the 2.0 release this weekend and it would be a good time to not export the String-based API by default anymore. And since I'm breaking things and almost all neovim function calls may throw an exception, I'm thinking about removing the functions with a prime and use that implementation by default and hence only generate half the function I'm generating now.

goolord commented 5 years ago

The difference in performance is noticeable enough between String and Text for me to want this, especially since the workload is lexing haskell source code.

The only reason why I would prefer ByteString is that (I believe) it's unclear what the underling encoding is, but I don't think it matters much.

I find it strange that the Array type is not backed by an Array, and the current version of the msgpack library has ObjectArray !(Vector Object) https://hackage.haskell.org/package/msgpack-1.0.0/docs/Data-MessagePack-Object.html#v:ObjectArray

goolord commented 5 years ago

Oh, no, according to the most recent version of the msgpack library it will always be UTF8 encoded https://hackage.haskell.org/package/msgpack-1.0.0/docs/Data-MessagePack-Object.html#v:ObjectStr so I think Text would indeed be a better fit

saep commented 5 years ago

nvim-hs uses a different messagepack library https://hackage.haskell.org/package/messagepack-0.5.4/docs/Data-MessagePack.html. I can`t remember exactly why I had chosen that library then. It might be that conduit supported it back then and I wanted to learn that.

saep commented 5 years ago

I made a PR with those mentioned APIs. If you're eager to try it, you can checkout the branch. I'll probably create a release within the next week with these breaking changes.

saep commented 5 years ago

The current 2.0.0.0 release contains these API variants.