unisonweb / base

Unison base libraries
https://share.unison-lang.org/@unison/base

Streaming UTF-8 decoding and misc helpers #173

Closed by ceedubs 1 year ago

ceedubs commented 1 year ago

Summary

This adds support for streaming UTF-8 decoding, where the incoming byte chunks might be split in the middle of a character. It also adds a bunch of miscellaneous helpers that I created to support this functionality and its tests.
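To make the "split in the middle of a character" problem concrete, here is a minimal sketch in Python (not Unison, and not the code from this PR). It assumes the approach suggested by the signatures below: a partial decoder that returns the decoded text plus any undecoded trailing bytes, and a stream decoder that carries those bytes over to the next chunk. The names `leading_ones`, `decode_partial`, and `decode_stream` are hypothetical and only loosely mirror `Nat.leadingOnes`, `Text.fromUtf8.partial`, and `Text.fromUtf8.stream`.

```python
from typing import Iterable, Iterator, Tuple


def leading_ones(byte: int) -> int:
    """Count the consecutive 1 bits at the top of an 8-bit value."""
    count = 0
    mask = 0x80
    while mask and byte & mask:
        count += 1
        mask >>= 1
    return count


def decode_partial(data: bytes) -> Tuple[str, bytes]:
    """Decode the longest complete UTF-8 prefix; return (text, leftover bytes)."""
    # A character is at most 4 bytes, so an incomplete one can only start
    # within the last 3 bytes of the input.
    for back in range(1, min(3, len(data)) + 1):
        ones = leading_ones(data[-back])
        if ones == 1:
            continue  # continuation byte: keep looking for its lead byte
        if ones >= 2 and ones > back:
            # Lead byte that announces more bytes than have arrived so far.
            cut = len(data) - back
            return data[:cut].decode("utf-8"), data[cut:]
        break  # ASCII byte or a complete multi-byte character
    return data.decode("utf-8"), b""  # raises UnicodeDecodeError on invalid input


def decode_stream(chunks: Iterable[bytes]) -> Iterator[str]:
    """Decode a stream of byte chunks, carrying incomplete characters forward."""
    leftover = b""
    for chunk in chunks:
        text, leftover = decode_partial(leftover + chunk)
        if text:
            yield text
    if leftover:
        raise UnicodeDecodeError(
            "utf-8", leftover, 0, len(leftover),
            "incomplete character at end of stream",
        )
```

For example, `"héllo 🌈".encode("utf-8")` split after byte 2 and byte 9 cuts both `é` and `🌈` in half, yet joining the output of `decode_stream` over the three chunks gives back the original text.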

Highlights

Stream index functions

Splitting input into {Random} chunks

{Random} nats adding to a specific sum

Bitwise operators

Decoding UTF-8 streams

All added definitions

Here's what would change in base/main after the merge:

Added definitions:

  1.  abilities.Random.splits.bytes          : Nat -> Bytes -> '{Random, Stream Bytes} ()
  2.  abilities.Random.splits.bytes.doc      : Doc
  3.  data.Stream.indexed.doc                : Doc
  4.  data.Stream.indexed!.doc               : Doc
  5.  Nat.leadingOnes.doc                    : Doc
  6.  abilities.Random.splits.list.doc       : Doc
  7.  abilities.Random.nat.natsWithSum.doc   : Doc
  8.  Text.fromUtf8.partial.doc              : Doc
  9.  abilities.Random.splits.doc            : Doc
  10. Text.fromUtf8.stream.doc               : Doc
  11. Text.fromUtf8.stream!.doc              : Doc
  12. abilities.Random.splits.text.doc       : Doc
  13. data.Stream.indexed                    : '{g, Stream a} r -> '{g, Stream (a, Nat)} r
  14. data.Stream.indexed!                   : '{g, Stream a} r ->{g, Stream (a, Nat)} r
  15. Text.fromUtf8.stream.tests.invalidUtf8 : [test.Result] (+1 metadata)
  16. Nat.leadingOnes                        : Nat -> Nat
  17. abilities.Random.splits.list           : Nat -> [a] -> '{Random, Stream [a]} ()
  18. abilities.Random.nat.natsWithSum       : Nat -> Nat ->{Random} [Nat]
  19. Text.fromUtf8.partial                  : Bytes ->{Exception} (Text, Bytes)
  20. abilities.Random.splits                : (a ->{g1} Nat)
                                             -> (Nat -> a ->{g2} (a, a))
                                             -> Nat
                                             -> a
                                             -> '{g1, g2, Random, Stream a} ()
  21. Text.fromUtf8.stream                   : '{g, Stream Bytes} r
                                             -> '{g, Exception, Stream Text} r
  22. Text.fromUtf8.stream!                  : '{g, Stream Bytes} r
                                             ->{g, Exception, Stream Text} r
  23. Text.fromUtf8.stream.tests.success     : [test.Result] (+1 metadata)
  24. abilities.Random.splits.bytes.tests    : [test.Result] (+1 metadata)
  25. Nat.leadingOnes.tests                  : [test.Result] (+1 metadata)
  26. abilities.Random.splits.list.tests     : [test.Result] (+1 metadata)
  27. abilities.Random.nat.natsWithSum.tests : [test.Result] (+1 metadata)
  28. Text.fromUtf8.partial.tests            : [test.Result] (+1 metadata)
  29. abilities.Random.splits.text           : Nat -> Text -> '{Random, Stream Text} ()
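
The random splitting helpers listed above are what make it possible to test chunk boundaries that fall at arbitrary byte positions. As a rough illustration only, here is a Python sketch of the idea (not the Unison code from this PR). It assumes that the first argument of `natsWithSum` is the number of values and the second is the target sum, and that `splits.bytes` takes the number of pieces first; the signatures alone do not confirm either assumption. The names `nats_with_sum` and `random_splits` are hypothetical.

```python
import random
from typing import Iterator, List


def nats_with_sum(count: int, total: int, rng: random.Random) -> List[int]:
    """Return `count` non-negative integers that add up to `total`."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # Pick count - 1 random cut points in [0, total]; the gaps between
    # neighbouring cut points are the values, so they always sum to total.
    cuts = sorted(rng.randint(0, total) for _ in range(count - 1))
    bounds = [0] + cuts + [total]
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]


def random_splits(count: int, data: bytes, rng: random.Random) -> Iterator[bytes]:
    """Yield `count` consecutive pieces of `data` with random lengths."""
    pos = 0
    for size in nats_with_sum(count, len(data), rng):
        yield data[pos:pos + size]
        pos += size
```

Concatenating the pieces always reproduces the input, so a property test can feed the pieces to a streaming decoder and compare the result with decoding the whole input at once. Some pieces may be empty, which is a useful edge case for a stream consumer.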

pchiusano commented 1 year ago

Nice, and love the docs!

https://share.unison-lang.org/@unison/base/code/@ceedubs/utf8-streams-and-friends/latest/terms/abilities.Random.splits.bytes

and

https://share.unison-lang.org/@unison/base/code/@ceedubs/utf8-streams-and-friends/latest/terms/abilities.Random.splits.text

examples look like they have an extra level of thunking, other than that lgtm.

ceedubs commented 1 year ago

Ah good catch. I'll fix the examples next time I'm on my computer.

ceedubs commented 1 year ago

Okay examples should be fixed now.

runarorama commented 1 year ago

This is now in /main. Thanks! 🌈⭐