thi-ng / umbrella

⛱ Broadly scoped ecosystem & mono-repository of 199 TypeScript projects (and ~180 examples) for general purpose, functional, data driven development
https://thi.ng
Apache License 2.0
3.38k stars 150 forks source link

[transducers] Transducers crash when source is an empty string #186

Closed earthlyreason closed 4 years ago

earthlyreason commented 4 years ago

Most transducers take a source sequence as a final argument. Since strings are iterable, so you can do this:

[...tx.filter(x => true, "abc")]
"abc"

However, if you provide an empty string, they will crash:

[...tx.filter(x => true, "")]
TypeError: tx.filter(...) is not iterable

This is obviously because the transducer functions do a truthiness test on their final argument to determine whether a source was provided:

export function map<A, B>(fn: Fn<A, B>): Transducer<A, B>;
export function map<A, B>(fn: Fn<A, B>, src: Iterable<A>): IterableIterator<B>;
export function map<A, B>(fn: Fn<A, B>, src?: Iterable<A>): any {
    return src
        ? iterator1(map(fn), src)
        : (rfn: Reducer<any, B>) => {
              const r = rfn[2];
              return compR(rfn, (acc, x: A) => r(acc, fn(x)));
          };
}

This could be corrected by replacing src with isIterable(src) (using @thi.ng/checks). This would incur the additional work of looking up Symbol.iterator, but it would seem to be more correct. I can't think of a case where this would be a breaking change (i.e. values for src that work now but are not iterable by that test).

@postspectacular I'm willing to PR if you think this is actionable. I understand there's a lot of branch work going on, so whatever you think best.

A workaround for the time being is to wrap the string argument in [...s].

earthlyreason commented 4 years ago

I can't think of a case where this would be a breaking change (i.e. values for src that work now but are not iterable by that test).

Technically that assumes ES2015. In ES5 an array is truthy but doesn't have a function at Symbol.iterator.

Indeed, a truly un-shimmed ES5 environment would crash just by calling isIterable, since it assumes the existence of Symbol itself.

I don't know whether people are using @thi.ng in such environments.

postspectacular commented 4 years ago

Hi @gavinpc-mindgrub - thank you for this! Strings (incl. empty ones) are indeed supposed to be fully supported as iterables and so the more proper check seems unavoidable. I'm not worried about ES5 compatibility since this is explicitly stated in the main readme that all packages are distributed as ES6 w/o any polyfills...

Btw. The alternatives I was originally considering to completely avoid the function overloads and checks for a final iterable arg was to provide additional functions like filterIter or rewrite the @thi.ng/iterators to just wrap/delegate to the transducer versions. I think that ship has sailed by now though, but keen to hear your thoughts...

postspectacular commented 4 years ago

@gavinpc-mindgrub do you have bandwidth for a PR? No worries if not, just let me know pls...

earthlyreason commented 4 years ago

@postspectacular, sure I'll send something this evening (I'm EST). Should this branch off of master?

On the design question, I have found the overloads convenient for sketching and "normal" usage. I am mostly working on metaprogramming, though, and in that context I have made my own wrappers anyway (to support lazy, serializable bindings representing a transducer description).

postspectacular commented 4 years ago

@gavinpc-mindgrub great & thank you. There's no rush & please branch off develop...

I'd also say to stick with the overloads to avoid breakage of existing code and yet another major version change.

(Btw. Looking v.forward to see what you've been up to with this all... :) )

earthlyreason commented 4 years ago

@postspectacular, roger that. I currently get the following test failures when running off of develop. Before I dig into these, are they currently expected?

The last one may be a legit timeout. I am working right now on a PC that is stupid slow.

  1) PubSub
       unsubTopic:

      Uncaught AssertionError [ERR_ASSERTION]: Expected values to be loosely deep-equal:

{
  a: [
    'a'
  ],
  b: []
}

should loosely deep-equal

{
  a: [
    'a'
  ],
  b: [
    'b'
  ]
}
      + expected - actual

       {
         "a": [
           "a"
         ]
      -  "b": []
      +  "b": [
      +    "b"
      +  ]
       }

      at Timeout._onTimeout (test\pubsub.ts:103:20)
      at listOnTimeout (internal/timers.js:531:17)
      at processTimers (internal/timers.js:475:7)

  2) StreamSync
       never closes:

      Uncaught AssertionError [ERR_ASSERTION]: Expected values to be loosely deep-equal:

[
  {
    a: 1,
    b: 1
  },
  {
    a: 2,
    b: 2
  }
]

should loosely deep-equal

[
  {
    a: 1,
    b: 1
  },
  {
    a: 2,
    b: 2
  },
  {
    a: 3,
    b: 3
  }
]
      + expected - actual

         {
           "a": 2
           "b": 2
         }
      +  {
      +    "a": 3
      +    "b": 3
      +  }
       ]

      at Timeout._onTimeout (test\stream-sync.ts:200:20)
      at listOnTimeout (internal/timers.js:531:17)
      at processTimers (internal/timers.js:475:7)

  3) Subscription
       unsub does not trigger Subscription.done():
     Error: Timeout of 100ms exceeded. For async tests and hooks, ensure "done()" is called; if returning a Promise, ensure it resolves. (D:\home\gavin\lib\my-umbrella\packages\rstream\test\subscription.ts)
      at Timeout._onTimeout (test\subscription.ts:54:13)
      at listOnTimeout (internal/timers.js:531:17)
      at processTimers (internal/timers.js:475:7)
postspectacular commented 4 years ago

I've been getting them sporadically on travis (never locally) and assuming these are just issues with the timing related test setup and for whatever reason slow or super imprecise native timers in node. am 99.999% sure you can ignore these and/or rerun... also see travis build history

earthlyreason commented 4 years ago

@postspectacular I have this more or less done.

Question about flatten, though. flatten already checks for iterable, but it has a special case for string.

export function flatten<T>(src?: Iterable<T | Iterable<T>>): any {
    return flattenWith(
        (x: any) =>
            x != null && x[Symbol.iterator] && typeof x !== "string"
                ? <any>x
                : undefined,
        src!
    );
}

So right now, the following tests pass:

        assert.deepEqual([...flatten(["", "a"])], ["", "a"]);
        assert.deepEqual([...flatten([[], ["a"], ""])], ["a", ""]);
         // (I added these)
        assert.deepEqual([...flatten([["abc"]])], ["abc"]);
        assert.deepEqual([...flatten(["abc"])], ["abc"]);
        assert.deepEqual([...flatten("abc")], ["a", "b", "c"]);
        assert.deepEqual([...flatten([""])], [""]);

flatten("") still crashes, but I'm not sure to handle that without changing existing behavior. The current behavior is somewhat inconsistent in that strings are treated as iterables at the top level but not otherwise:

> [...tx.flatten("abc")]
[ 'a', 'b', 'c' ]
> [...tx.flatten(123)]
Thrown:
TypeError: xs is not iterable
    at iterator (D:\home\gavin\research\the-better-thing\node_modules\@thi.ng\transducers\lib\index
.js:86:19)
    at iterator.next (<anonymous>)
> [...tx.flatten([123])]
[ 123 ]
> [...tx.flatten([[123]])]
[ 123 ]
> [...tx.flatten([[[123]]])]
[ 123 ]
> [...tx.flatten(["abc"])]
[ 'abc' ]
> [...tx.flatten([["abc"]])]
[ 'abc' ]
> [...tx.flatten([[["abc"]]])]
[ 'abc' ]

If this is correct:

> [...tx.flatten("abc")]
[ 'a', 'b', 'c' ]

Then flatten("") should produce an empty sequence. But currently flatten([""]) produces [""].

I don't want to add a special case just for "" if avoidable. What do you recommend?

postspectacular commented 4 years ago

Thank you for the in-depth analysis, @gavinpc-mindgrub! In this case I'd opt (like I guess you do too) for consistency, even if it means breaking the current behavior for top-level strings (incl. empty ones).

So I'd say we treat top-level strings as atoms (unflattenable, just like nested strings) and update the body of flattenWith to something like:

return isIterable(src)
  ? iterator(flattenWith(fn), isString(src) ? <any>[src] : src)
  : (rfn: Reducer<any, T>) => { ... }

With that change we get:

> [...tx.flatten("")]
[ '' ]
> [...tx.flatten([""])]
[ '' ]

> [...tx.flatten("abc")]
[ 'abc' ]
> [...tx.flatten(["abc"])]
[ 'abc' ]
earthlyreason commented 4 years ago

@postspectacular I haven't forgotten about this, just a bit caught up in holiday/family at the moment. Holdup was mostly learning my way around to include proper tests. I did come across a question, though.

I wasn't quite sure exactly how flattenWith was intended to work, so I was going to document it in the process. However, I noticed that there is documentation for a (different) flattenWith in the iterators package, which had been added recently, but not as JSDoc.

Thanks, & best wishes for 2020!

postspectacular commented 4 years ago

Hi @gavinpc-mindgrub - happy new year to you!! 🎉

I thought we cleared up how the behavior of flattenWith was supposed to work in our last 2 comments here and am not quite sure which newer docs in the iterators package you're referring to (git blame shows I edited it last 2 years ago). With the new top-level string behavior discussed above, there will be some discrepancy between both packages, but that's fine since I was going to deprecate the iterators package at some time in the near future anyway.

Also I'd say don't bother too much w/ docs for this PR, they can always be added/edited later on.

Thank you for doing this all! 👍

postspectacular commented 4 years ago

Hi @gavinpc-mindgrub - I was going to prepare a new release in the next few days and was wondering if you have an ETA of your transducer updates and/or if you still have the time to work on this at all... no shame in saying no or needing more time, I just would like to synchronize, if poss. Else we can also just do another one once you're ready... Thanks!

earthlyreason commented 4 years ago

@postspectacular gotcha. I'll shoot for this evening, but if it doesn't come through then we can kick to the next round.

postspectacular commented 4 years ago

@gavinpc-mindgrub no pressure pls! take your time!