janet-lang / janet

A dynamic language and bytecode vm
https://janet-lang.org
MIT License

Update `interleave`, `interpose` #1281

Closed: primo-ppcg closed this 11 months ago

primo-ppcg commented 11 months ago

```janet
(def a (range 1000))
(timeit-loop [:timeout 1] "interpose " (interpose 0 a))
(timeit-loop [:timeout 1] "interleave" (interleave "abcd" "1234" "wxyz"))
(print ;(interpose " " (generate [i :range [10]] i)))
```

master:

```
interpose  1.000s, 24.43µs/body
interleave 1.000s, 0.8582µs/body
error: expected string, symbol, keyword, array, tuple, table, struct or buffer, got <fiber 0x56056E4D9220>
  in interpose [boot.janet] on line 1723, column 12
```

branch:

```
interpose  1.000s, 16.86µs/body
interleave 1.000s, 0.5421µs/body
0 1 2 3 4 5 6 7 8 9
```

Comments

interleave has become synonymous with mapcat tuple, which is also significantly faster than the previous implementation. The function probably isn't necessary to keep, long term.
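For reference, "synonymous with mapcat tuple" can be read literally. A minimal sketch (interleave* is a hypothetical name for illustration, not the PR's actual definition):

```janet
# Interleaving n sequences is mapping `tuple` across them
# (taking one element from each in lockstep) and concatenating
# the resulting tuples.
(defn interleave*
  [& cols]
  (mapcat tuple ;cols))

(interleave* [:a :b] [1 2])
# -> @[:a 1 :b 2]
```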

When map was reworked, I accidentally unrolled one more level than was previously done. As this template is expanded into 6 different functions, this is unnecessary bloat that rarely results in any benefit.

sogaiu commented 11 months ago

Here are my numbers (+ error output):

master (51c0cf97):

```
$ JANET_PATH=~/src/spork/jpm_tree/lib ~/src/janet/build/janet ~/scratch/interleave-interpose.janet
interpose  1.000s, 37.97µs/body
interleave 1.000s, 1.679µs/body
error: expected string, symbol, keyword, array, tuple, table, struct or buffer, got <fiber 0x558212D1AB00>
  in interpose [boot.janet] on line 1723, column 12
  in _thunk [/home/user/scratch/interleave-interpose.janet] (tailcall) on line 6, column 9
```

interleave-interpose branch (9cf674cd):

```
$ JANET_PATH=~/src/spork/jpm_tree/lib ~/src/janet.primo-ppcg/build/janet ~/scratch/interleave-interpose.janet
interpose  1.000s, 33.96µs/body
interleave 1.000s, 1.085µs/body
0 1 2 3 4 5 6 7 8 9
```

Usual tests didn't turn up anything (not picky enough?) :+1:


Re: interleave - I don't really have mapcat "assimilated", so at least for me, interleave is useful...and maybe it's nice conceptually? I imagine that removal might have breakage issues to consider too.

primo-ppcg commented 11 months ago

Re: interleave - I don't really have mapcat "assimilated", so at least for me, interleave is useful...and maybe it's nice conceptually?

I agree that it's a useful operation. Column vectorization (commonly referred to as colvec) is a fundamental operation in linear algebra. I'm questioning only whether it needs to be specially defined, given its trivial implementation. rowvec we already have as array/concat, not specially defined.
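To make the analogy concrete (a small illustration, not from the PR, using the usual row-of-rows matrix representation): reading a matrix column-major (colvec) amounts to interleaving its rows, while reading it row-major (rowvec) amounts to concatenating them.

```janet
(def mat [[1 2 3] [4 5 6]])

(interleave ;mat)
# -> @[1 4 2 5 3 6]  (column-major read)

(array/concat @[] ;mat)
# -> @[1 2 3 4 5 6]  (row-major read)
```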

Perhaps an even more fundamental operation is transpose, which in Janet can be implemented equally trivially as map tuple.

```janet
(defn transpose
  [mat]
  (map tuple ;mat))

(def mat @[[1 2 3] [4 5 6] [7 8 9]])
(pp (transpose mat))
# ->
# @[(1 4 7) (2 5 8) (3 6 9)]
```

Because the implementation is so trivial, I don't believe it's necessary to add, and I think the same is true for interleave.

sogaiu commented 11 months ago

I don't find the "implementation is trivial" angle compelling.

IIUC, part of what goes into that here is that the trivial implementation happens to be efficient. This is not guaranteed over time. As a language changes, it's possible that a different, not-so-trivial implementation may become desirable -- in which case one would want that instead.

For the case of interleave, I happened to have had uses for it -- transpose not so much. This may depend on the sorts of code one writes and the areas one tends to write for.

Perhaps in the realm of:

Janet makes a good system scripting language,

interleave is much more likely to come up than transpose? Of course it's also possible that opportunities are just not being recognized...I've had that sort of thing happen to me repeatedly over my programming life :)

What do you think?

primo-ppcg commented 11 months ago

IIUC, part of what goes into that here is that it happens that the trivial implementation is efficient. This is not guaranteed over time.

I think that's a fair argument. Many of my recent PRs have made the implementations slightly more complex, and significantly faster. The great thing about having core library functions that are approximately as efficient as possible is that you can just use them, without having to think too much about it. mapcat tuple, while trivial, is perhaps not intuitive. By the same argument, perhaps transpose should also be added? :sweat_smile:

sogaiu commented 11 months ago

By the same argument, perhaps transpose should also be added?

Maybe so.

Editing your example from above:

```janet
(defn transpose
  [mat]
  (map tuple ;mat))

(transpose @[[1 2 3] [4 5 6] [7 8 9]])
# =>
# @[(1 4 7) (2 5 8) (3 6 9)]
```

I see this example and I try to think when I have wanted this sort of functionality (outside of where there were higher level libraries that made use of existing transposition functions) and my imagination / memory is not helping (^^; [1]

Maybe others have wanted it though (or, as I mentioned before, I'm just missing some good opportunities). Possibly folks who use jaylib or TIC-80 might be interested, since both can be used for graphical games.

Perhaps a discussion / PR / whatever to query for interest is in order?


[1] LOL...but grepping did help...

primo-ppcg commented 11 months ago

(using map array so that the output is identical)

```janet
(use spork/test)

(defn transpose1
  [a-grid]
  (def new-grid @[])
  (loop [j :range [0 (length (get a-grid 0))]
         :before (put new-grid j @[])
         i :range [0 (length a-grid)]]
    (put-in new-grid [j i]
            (get-in a-grid [i j])))
  new-grid)

(defn transpose2
  [a-grid]
  (map array ;a-grid))

(def grid @[[1 2 3]
            [4 5 6]
            [7 8 9]])

(timeit-loop [:timeout 1] "transpose1" (transpose1 grid))
(timeit-loop [:timeout 1] "transpose2" (transpose2 grid))
```

```
transpose1 1.000s, 2.865µs/body
transpose2 1.000s, 0.4264µs/body
```

map has always been variadic, so clearly not intuitive to the author of this function.

sogaiu commented 11 months ago

Nice!


As the author I can tell you that the priority was on translating code from another language and getting the whole demo working (^^;

primo-ppcg commented 11 months ago

translating code from another language

Janet is more expressive than most :wink:

I'll leave it to your discretion if you think a PR is warranted. I personally think it is more fundamental than interleave.

sogaiu commented 11 months ago

Janet is more expressive than most

Surely.

FWIW, for the demo, I think there was also a motive to leave the code similar to the original while getting it to function, but also to make comparison easier as that might help in being an entry point into assimilating more potential users. Motives, motives...


Regarding interleave, my guess is that it is (or perhaps was) most obviously used for n = 2 collections in the context of typical scripting. When I search for uses I come across it being used like:


From the above conversation I would say that there's at least a clear use for transpose in the context of graphics / games, which Janet seems to be in use for. It seems like there is at least one person (@nstgc) who tried to apply Janet to do some DL-ish matrix stuff (I assume that's what he meant regarding "not converging") so I imagine that could be another case. I gather you'd have uses, which might be different from the previous ones, so...perhaps it's a bit more spelled out now :)

primo-ppcg commented 11 months ago

Given n = 2:

  • (table ;(interleave ...

This is precisely zipcoll.
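That is, for two sequences the two spellings should produce equal tables (a small check, not from the thread):

```janet
(def ks [:a :b :c])
(def vs [1 2 3])

# build the table by splicing the interleaved pairs,
# then compare against zipcoll structurally
(deep= (table ;(interleave ks vs))
       (zipcoll ks vs))
# -> true
```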

  • (partition 2 (interleave ...

And this is precisely map tuple.
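Again as a small check (not from the thread):

```janet
(partition 2 (interleave [:a :b] [1 2]))
# -> @[(:a 1) (:b 2)]

(map tuple [:a :b] [1 2])
# -> @[(:a 1) (:b 2)]
```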

The most notable use for matrix column vectorization that comes to mind is in the computation of Kronecker products, which I suppose is not exactly related to scripting.

sogaiu commented 11 months ago

This is precisely zipcoll.

Yes, I realized when I wrote about the use of (table ;(interleave ... that there was a shorter way, but I had forgotten precisely what it was -- the "or perhaps was" above was partly motivated by that :)

I didn't have the map tuple thing in mind (though I think you and others have mentioned it elsewhere). Thanks for bringing that up.

I think zipcoll is a better choice than (table ;(interleave .... Perhaps not being aware of its existence ("500+ functions and macros in the core library" can have a downside) resulted in the use of (table ;(interleave ... in some cases.

(partition 2 (interleave ... is longer (in multiple ways) and has a 2 (more moving parts), so maybe there's some motivation to try to see things from the map tuple angle. As you hinted at above, the fact that map is variadic might not be readily available in some folks' minds (I would be among those people). Perhaps if I focused more on the ultimate products rather than a particular process of getting there, it could help.
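For anyone else to whom variadic map is news, a minimal reminder:

```janet
# map walks all of its sequence arguments in lockstep,
# stopping at the shortest one
(map + [1 2 3] [10 20 30])
# -> @[11 22 33]
```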

The most notable use for matrix column vectorization that comes to mind is in the computation of Kronecker products

This one I was completely unaware of -- at least I don't remember it from my linear algebra days [1] (^^;

which I suppose is not exactly related to scripting.

There is also:

or a language to embed in other programs. Janet also can be used for rapid prototyping, dynamic systems, and other domains where dynamic languages shine

though it's not clear to me if that helps in this case.


[1] The only Kronecker thing I find vaguely familiar is this.

sogaiu commented 11 months ago

This conversation has got me thinking about how one might move toward removal of some functions / macros (callables).

IIUC, there is a deprecation mechanism, but I also wonder whether there couldn't be a "compat" library (spork or part of it?) one could be putting the "to-be-removed" things into.

No idea of what kind of work would be involved...possibly not worth it.

primo-ppcg commented 11 months ago

The reason I mention it at all, is because of this comment about removing rarely used and/or obscure functions (personally, I would object to the removal of juxt, although I'd be fine seeing juxt* get axed).

sogaiu commented 11 months ago

I thought it might have been related to that :)

Maybe it makes sense for there to be a discussion item with candidates?

It's a thorny problem...that of having users that one may never have communication with. I wonder if anyone has tried the idea of "registering" one's projects and having them automatically analyzed for usages. Such data might be used as a way to assess which things are being used and possibly how much...

(What I've been doing is scanning what I've collected but this doesn't reflect the intent of authors who want things that can be maintainable over long periods nor is it particularly easy to search for certain patterns.)

nstgc commented 11 months ago

From the above conversation I would say that there's at least a clear use for transpose in the context of graphics / games, which Janet seems to be in use for. It seems like there is at least one person (@nstgc) who tried to apply Janet to do some DL-ish matrix stuff (I assume that's what he meant regarding "not converging") so I imagine that could be another case. I gather you'd have uses, which might be different from the previous ones, so...perhaps it's a bit more spelled out now :)

For what it's worth, as much as I find Janet easy to work with, if I were to ever actually try this as more than a pet project, I'd use Clojure and Neanderthal. For this particular project I made a point of not using external libraries, so I rolled all the matrix stuff myself in the language. (And by "myself", I mean "I copied it off the Haskell Wiki" for Haskell, used Julia's built-in matrix capabilities, and wrote some pretty sub-optimal, albeit pretty, functions for Clojure and Janet.) I couldn't possibly wrap Intel's MKL, nor any other similar lib. Even then, my small project, which equated to little more than fancy regression, converged several times slower on Janet than on Julia or Clojure; if I were to try something larger, the time difference would be measured not in minutes, but in hours or days. (Interestingly, the Clojure and Julia versions converged at about the same rate despite Julia's optimization. The Haskell version... let us not speak of that version!)

My point is that, much as I wish it were otherwise, Janet is ill-suited for number-crunching tasks. More suited than Python, perhaps, but still much slower than other languages. Perhaps I don't understand the language enough, or computer science in general, but I don't think even wrapping the MKL would bring it up to par with Clojure.

And that is fine; it wasn't a design goal. I just want to point out that trying to strap a jet engine to a Prius is probably not going to make people as happy as they might hope. I hope that doesn't insult anyone; I'm sure most of you know that better than I do (unless I'm absolutely wrong about that, in which case WOW am I going to feel embarrassed!), but as I was brought up as an example of using Janet for such a purpose, I feel compelled to make a case against it.

sogaiu commented 11 months ago

@nstgc Thanks for chiming in and sharing your thoughts.

Yes, I think there used to be some text about Janet not being oriented to numerically heavy / intensive tasks (or something like that - but I'm not finding it ATM), so your summary sounds pretty consistent with that.

Proofs of concept for a variety of things seem achievable though -- perhaps not performance-wise in pure, non-FFI Janet.

I think in some cases the "rewrite the slow parts in C" idea could also apply to Janet proof-of-concept programs once obtained.

nstgc commented 11 months ago

Yes, I think there used to be some text about Janet not being oriented to numerically heavy / intensive tasks (or something like that - but I'm not finding it ATM), so your summary sounds pretty consistent with that.

That's interesting, and good to hear!

I think in some cases the "rewrite the slow parts in C" idea could also apply to Janet proof-of-concept programs once obtained.

And something like a library wrapping MKL would certainly do that, and also possibly fix whatever numerical issues I was seeing. Come to think of it, I should probably try cooking up a minimal working example. From my past experience, I should be able to demonstrate that converging to a numerical solution takes more steps than in other languages when using the same algorithm, and that it's more sensitive to converging to the wrong answer or not at all. I might be mistaken, but I would expect the same algorithm using the same floating-point arithmetic to produce the same results. Something for the weekend. Or a weekend. Unless someone here thinks there isn't reason to do that.

sogaiu commented 11 months ago

I did some more searching about the "numerically heavy / intensive tasks" thing.

The closest thing I've found so far is this quote from bakpakin:

When possible, I like to use Janet as a spring board for any programming needs of mine where CPU performance is not an issue.

via: https://github.com/janet-lang/janet/discussions/554#discussioncomment-267822

It's possible this is what I was thinking of (^^;


Regarding performance / benchmarks, I think it has been mentioned before, but @MikeBeller put this together some time back. It might be interesting to see how things are these days, particularly with primo-ppcg's recent work. Maybe primo-ppcg is waiting until he attends to the things he mentioned here :)

Perhaps also the array/blit thing could be an interesting thing to explore.

primo-ppcg commented 11 months ago

I should be able to demonstrate that converging to a numerical solution takes more steps as compared to other languages

Identical implementations with identical data types should display identical convergence behavior. If they don't, it's almost certainly a bug, which would be good to know about.