hiiamboris opened this issue 3 years ago.
First and most importantly, I appreciate all the thought that went into this, the explanations, and the working mezzanine code. A++
The tube and wire metaphor is very helpful when thinking about images, but confusing to me when thinking about tables of records.
An attempt to fuse both the 2D and 1D models into a single interface leads to ambiguities.
Agreed. But I'm not sure pseudo-2D is the right answer either. Images are a special case, and are force-fit into what we have now, somewhat inelegantly.
>> t2
== [
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
]
>> skip' t2 2
== [
3 4 5 6
7 8 9 10
11 12 13 14
15 16
]
I expected:
>> skip' t2 2
== [
9 10 11 12
13 14 15 16
]
>> foreach' x t2 [print x] ;-- by row iteration
1
5
9
13
== unset
I expected:
>> foreach' x t2 [print x] ;-- by row iteration
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
I imagine you've also considered how path access will work, so a tube works like a block with sub-blocks. And that model is the one you have to dethrone in my mind. For all the things I like about this idea, it leaves me feeling uneasy. It's like an uncanny valley of data. 9^/
I like the idea of column extraction, enforcement of record size consistency, the efficiencies, and data carrying meta info. But I also appreciate the simplicity of nested blocks and key-value structures. My gut feeling is that this really cool idea may make things harder for users rather than easier. Some of that is surely due to it being new to me, and to needing more time to think about it and then play with it. E.g. skip' necessarily deals in columns on the linear side, but if I think about fixed-size rows, i.e. records, that's more like a DB, and DBs skip in terms of records.
But a big part of it comes from having seen Redbol data in flat blocks, which had invisible structure to it, and that's not a good thing. Hence my unease with pseudo-2D: it means reasoning about behavior that may be subtle and easy to take for granted, but that goes wrong when generic funcs make assumptions about incoming data.
I'm not saying it's a bad thing to want (you know I want a table myself), but trying to do it all with flat series is, itself, a low-level approach (windowing over data) that has its own impedance mismatches and, perhaps, leans a bit too hard on the current series/action model for my taste. You've shown we can do this at the mezz level, so I still believe we can build a great table system and maybe use some of these ideas.
Please make sure the community knows about this, so others can comment.
Thanks for your thoughts ;)
To clarify, I totally agree with all you're saying. This model is not omnipotent. It is just a replacement for the ugly /skip we have. Nothing more, and certainly not a spreadsheet. Still, blocks of records are an efficient structure and I can't imagine removing them from Red; that's why I'm trying to make them better for the programmer (; A typical task this model tries to solve is e.g. the reactions table in reactivity.red. A block of blocks has its merits, but you can't find/select on it, and that's it. And think of vectors: vectors can't be nested.
>> skip' t2 2
== [
3 4 5 6
7 8 9 10
11 12 13 14
15 16
]
I expected:
>> skip' t2 2
== [
9 10 11 12
13 14 15 16
]
My first idea was also that: skip rows. But then we lose the level of control we have right now. What if I want to prefix my 4x4 table with 3 configuration values? Looking closer, you'll see I'm not really changing how most series functions operate; I just make them apply /skip implicitly.
To achieve what you expect, you can skip 2 rows: skip' t2 2x0.
I will also post the part design later, which beautifully complements this /skip design, especially for vectors. I like how well they fit together, as the mockup code shows.
By the way, I'm not saying that we should remove /skip refinements. In fact, most of the code won't change. It will just apply /skip from the cell's header, then from the refinement (if provided). This way we can get both a human friendly interface and backward compatibility with Rebol.
Great. I'll be anxious to hear Nenad's thoughts as well. As you know I tend to think of higher level abstractions when I want a different model, and this provides a lot to think about.
Just a quick note: we have a 2D image type already, so why not create a tensor! type for numeric or other fixed values? It might already be in OpenCV; I have it on my todo list.
Here's a magnificent display of why I insist /skip must be embedded (real code from reactivity):
unlink-reaction: function [reactor [object!] reaction [function! block!]] [
    at-reactor: find/same/skip reactors reactor REACTORS-PERIOD
    pos: next relations: at-reactor/2
    while [pos: find/same/only/skip pos :reaction RELATIONS-PERIOD][
        --debug-print-- ["-- react: removed --" :reaction "FOR" pos/-1 "IN" reactor]
        remove-part back found?: pos RELATIONS-PERIOD
    ]
    if empty? relations [remove-part at-reactor REACTORS-PERIOD]
    at-reaction: find/same/only/skip reactions :reaction REACTIONS-PERIOD
    remove find/same objs: at-reaction/2 reactor
    if empty? objs [remove-part at-reaction REACTIONS-PERIOD]
    found?
]
Now isn't this a mess? ☻
Just read through this again, and vectors are certainly a key value area here. I saw an interesting video some years back about security analysis of executables: it looked at the binaries as data and windowed over them using different patterns, creating a point-cloud-like visualization that made it instantly clear when things lined up into chunks and such.
Background
Use of /skip in series functions leads to micromanagement and clutter undeserving of a high-level language. Imagine if Red values didn't carry their type inside and we always had to type:
x: [integer! 0] if [integer! x] == [integer! y] [...]
and so on. Yes, /skip looks exactly that stupid. And when one suddenly decides to extend one's 'table' with new columns, it becomes a nightmare.
The proposed design explores the idea of embedding /skip into the series itself. Two possibilities:
1. We include /skip into the cell, together with the index (provided we find space). Then we become able to shape the same array of data at any time depending on how we want to access it, e.g. shape data 3 would change the skip to 3, the same way skip data 3 changes the index. shape data new-width in this design simply creates a new cell pointing to the same data, similar to the as native.
2. We include /skip into the series buffer, together with its tail. And it makes sense, since /skip is really a property of the data itself. Drawback: when we want to ignore /skip and work on a per-item basis, we have to temporarily change the /skip value in place and restore it afterwards. shape data new-width [..code..] would be a wrapper mezz. But if e.g. code calls some other code that relies on data having old-width, this gets messy.
The proposed design follows possibility (1), as it is more flexible.
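Possibility (1) can be illustrated with a small Python model rather than Red, since the proposed cell layout does not exist yet. Everything here is hypothetical: Cell, shape and skip are illustrative names, and width 0 is assumed to mean a plain, unshaped series (the later "width <> 0" remark suggests such a case exists). The point is only that shape builds a new view over the same buffer while skip moves the index, so both views share their data.

```python
from dataclasses import dataclass, replace
from typing import Any, List


@dataclass(frozen=True)
class Cell:
    """Hypothetical series cell that carries its own width (the embedded /skip)."""
    buffer: List[Any]  # shared data; shape and skip never copy it
    index: int = 0     # 0-based position in the buffer, like a series index
    width: int = 0     # assumption: 0 means a plain, unshaped series


def shape(cell: Cell, new_width: int) -> Cell:
    """Return a new cell over the same buffer with a different width."""
    if new_width < 0:
        raise ValueError("width must not be negative")
    return replace(cell, width=new_width)


def skip(cell: Cell, n: int) -> Cell:
    """Move the index by n items, clamped to head/tail; the width is kept."""
    new_index = max(0, min(len(cell.buffer), cell.index + n))
    return replace(cell, index=new_index)


data = Cell(list(range(1, 13)))
by3 = shape(data, 3)           # like `shape data 3`
by4 = skip(shape(data, 4), 2)  # reshape, then move the index
print(by3.width, by4.width, by4.index)  # 3 4 2
print(by3.buffer is data.buffer)        # True: no copy was made
```

Because each view is just another cell, code holding the old view keeps its old width, which is exactly the problem possibility (2) runs into.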
Implementation note. This design benefits 3 types: block!, hash! and vector!. And only block! has a free 32-bit value in the cell. On the other hand, we have 3 reserved bits in the header, and 11 more bits can be freed, totaling 14 bits, i.e. a 0..16383 range. That's not enough for a spreadsheet emulation using a mere block, and may not be enough for some big matrix computations, but for the tasks we usually tend to solve with /skip, even a 0..32 range is usually more than needed.
P.S. Actually, looking at the red-routine! structure, I believe more than 11 bits can be freed.
Overview of image! model
In Red we already have a 2D series type: image!. Most of the linear thinking does not apply to it: the index (ROI starting point) of an image and its part (ROI ending point) are 2D terms, so when we copy a part of it, we copy content between the chosen columns (i.e. the copy is sparse), and it's two-dimensionally clipped by the image/size:
skip i 3x0 wraps column=4 around and resets it to column=1:
Of course, an image is 2D by nature and it wouldn't make sense to treat it as a linear succession of pixels; what we see above is just the best we could do with actions designed for linearity. But for other series this model doesn't work, as by supporting the planar model we lose support for the linear one.
An attempt to fuse both the 2D and 1D models into a single interface leads to ambiguities.
E.g. consider the copy/part series part function, where both series/size and part can be 1D (integer) or 2D (pair):
And what is the meaning of skip series amount with e.g. series/size = 5x5 and amount = 6? 6 items, arriving at point 2x2? 6x1? As a result we can't apply proper 2D thinking here and should instead think of pseudo-2D (;
Wire model
For the purpose of this design I'm going to call its data type a tube, and the model the wire model, for it resembles a wire (or spaghetti) rolled like a spiral inside a tube of predefined width. This is the model our /skip refinement currently follows:
E.g. for a tube of width=3 we have a representation:
Tube width can be:
In this model copy is always continuous, and addressing is linear: whereas pair AxB in an image means the pixel at (X=A, Y=B), in the wire model it means something totally different: skip A whole rows, then skip B single items. That is, linear-skip = (A * tube-width) + B, which unambiguously maps between pair and integer representations. Note that A then denotes what would be Y in the image model.
For illustration let's define 2 tubes, one bound, one unbound:
The first important aspect is mold: now it outputs data according to its format:
This example is simplified. For mold, load and save we'll have to come up with a construction syntax (which for vectors is already convoluted enough).
Then addressing. Since the width is known, we can omit it and tell the code to "skip N rows":
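The wire-model addressing can be checked with a short Python model of linear-skip = (A * tube-width) + B. It is a sketch, not the mezzanine from the linked snippet: a Python tuple (A, B) stands in for a Red pair AxB, and the clamping only mimics Red's skip stopping at head and tail. It reproduces the skip' t2 2 and skip' t2 2x0 results discussed in the thread.

```python
from typing import List, Tuple, Union

# An amount is either an item count or an (A, B) pair: A rows, then B items.
Amount = Union[int, Tuple[int, int]]


def linear_skip(amount: Amount, width: int) -> int:
    """Map a wire-model amount to a linear item offset: A * width + B."""
    if isinstance(amount, tuple):
        rows, items = amount
        return rows * width + items
    return amount


def skip_tube(data: List[int], width: int, amount: Amount) -> List[int]:
    """Return what remains of `data` after skipping `amount`, clamped to head/tail."""
    offset = max(0, min(len(data), linear_skip(amount, width)))
    return data[offset:]


t2 = list(range(1, 17))  # the 4-column table t2 from the thread

print(skip_tube(t2, 4, 2))       # like skip' t2 2: starts at 3
print(skip_tube(t2, 4, (2, 0)))  # like skip' t2 2x0: starts at 9
```

With an integer amount the width is ignored, which is why skip' t2 2 still moves by items; only the pair form counts in rows.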
There are 2 possible index? and length? functions:
I haven't decided yet on the best way to add this pair index and length (as a refinement or a transform function), but it looks useful.
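One possible pair form of index? and length? is simply the inverse of the linear-skip mapping, computed with divmod. This is a hypothetical sketch: pair_index and pair_length are illustrative names, offsets are 0-based (Red's index? is 1-based), and the width must be positive. Applied to the thread's example, the 14 items left after skip' t2 2 come out as 3 full rows plus 2 items, which matches the molded output shown there.

```python
from typing import Tuple


def pair_index(offset: int, width: int) -> Tuple[int, int]:
    """Hypothetical pair index: 0-based item offset -> (rows, items) as in AxB."""
    if width <= 0:
        raise ValueError("pair form needs a positive width")
    return divmod(offset, width)


def pair_length(count: int, width: int) -> Tuple[int, int]:
    """Hypothetical pair length: item count -> (full rows, leftover items)."""
    if width <= 0:
        raise ValueError("pair form needs a positive width")
    return divmod(count, width)


print(pair_length(16 - 2, 4))  # after skip' t2 2: (3, 2)
print(pair_index(8, 4))        # offset reached by skip' t2 2x0: (2, 0)

# Round trip with linear-skip = A * width + B for every offset of t2
for offset in range(16):
    rows, items = pair_index(offset, 4)
    assert rows * 4 + items == offset
```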
We can change tube width at will (but will have to provide a separate native/mezz for that):
extract is one of the interesting parts. We want to be able to extract both columns and rows. Column extraction we have already:
I propose we add a /width refinement to extract to generalize it:
Reordering (sort, reverse, random) will work on a row-by-row basis for width <> 0:
Set operations (union, exclude, unique, difference, intersect) support /skip but currently compare against the 1st column only (they need the /compare and /all refinements from sort):
foreach is also one of the most interesting aspects. If our data has rows, why would we ever want iteration not aligned to rows? The most useful case seems to be when foreach skips a whole row:
On the other hand, listing a proper number of words in the spec (foreach [x y _ _] t2 [...]) is not a problem either, so I'm undecided.
find & select now start looking for rows:
The rest should be trivial.
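The row-wise behaviors listed above can be modeled in Python over a flat list plus a width. This is a sketch under assumptions, not the proposed Red semantics: rows are compared by their first item (as sort/skip does today), width 0 means a plain series, foreach follows the "skip the whole row" option the author is still undecided about, and find only matches at row starts. All function names are illustrative. With one word per step over the 4-column t2, it yields 1, 5, 9 and 13, as in the foreach' output quoted in the thread.

```python
from typing import Any, Iterator, List, Optional, Sequence, Tuple


def rows(data: Sequence[Any], width: int) -> List[List[Any]]:
    """Split a flat series into rows of `width` items (the last row may be short)."""
    return [list(data[i:i + width]) for i in range(0, len(data), width)]


def flatten(table: List[List[Any]]) -> List[Any]:
    return [item for row in table for item in row]


def sort_rows(data: Sequence[Any], width: int) -> List[Any]:
    """Rows move as units, ordered by their first item; width 0 sorts items."""
    if width == 0:
        return sorted(data)
    return flatten(sorted(rows(data, width), key=lambda row: row[0]))


def reverse_rows(data: Sequence[Any], width: int) -> List[Any]:
    """Reverse the order of rows, keeping each row intact; width 0 reverses items."""
    if width == 0:
        return list(reversed(data))
    return flatten(list(reversed(rows(data, width))))


def foreach_rows(data: Sequence[Any], width: int, n_words: int) -> Iterator[Tuple[Any, ...]]:
    """Take n_words values per step, then jump to the next row start.
    With width 0 it steps by n_words, like a plain foreach."""
    for row in rows(data, width or n_words):
        yield tuple(row[:n_words])


def find_row(data: Sequence[Any], width: int, value: Any) -> Optional[int]:
    """0-based offset of the first row whose first item equals value, else None."""
    for offset in range(0, len(data), width or 1):
        if data[offset] == value:
            return offset
    return None


t2 = list(range(1, 17))
for (x,) in foreach_rows(t2, 4, 1):
    print(x)                                       # 1, 5, 9, 13 on separate lines
print(sort_rows([3, "c", 1, "a", 2, "b"], 2))      # [1, 'a', 2, 'b', 3, 'c']
print(find_row(t2, 4, 9), find_row(t2, 4, 10))     # 8 None: 10 is not a row start
```

A short trailing row is kept as-is here; how the real design should treat an incomplete last row (like 15 16 in the skip' t2 2 output) is left open.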
Wanna play?
Visit https://gitlab.com/-/snippets/2066140