metaeducation / rebol-issues


The SAVE (and MOLD) function(s) considered harmful #1632

Open rebolbot opened 14 years ago

rebolbot commented 14 years ago

Submitted by: Ladislav

In a software project, I encountered a problem: the SAVE function unexpectedly modified the "original script" so that it no longer worked as written. The examples below illustrate the problem using MOLD.

Since all code versions (the original code, the MOLD result and the MOLD/ALL result) are recognized by the LOAD function as valid REBOL syntax, I cannot help but claim that the MOLD and SAVE functions "distort" the original source code, causing it to behave opposite to the way it was written. This does not look like a problem when the MOLD function is assessed on its own, but it certainly is a gotcha when the SAVE function is used.

I do not want to name the "victims" of this gotcha, but I hope it suffices to say that they are (in my opinion) very experienced REBOL users.

; example #1
code: {block: [true #[true]] equal? type? first block type? second block}
do mold load code ; == true ; this is a gotcha
do mold/all load code ; == false

; example #2
code: {block: [0.10000000000000001 0.10000000000000002] same? first block second block}
do mold load code ; == true ; this is a gotcha
do mold/all load code ; == false
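A minimal sketch of what happens to the logic value in example #1, assuming R2/R3 MOLD and LOAD behave as the examples above describe: plain MOLD writes the LOGIC! value as the bare word `true`, which LOAD reads back as a WORD!, while MOLD/ALL writes `#[true]`, which LOAD reads back as a LOGIC!. The variable name `value` is only illustrative.

```rebol
value: true                      ; evaluates to the LOGIC! value, not the word
print mold value                 ; true     (plain syntax, looks like a word)
print mold/all value             ; #[true]  (serialized syntax)
probe type? load mold value      ; word!    (the type is lost on reload)
probe type? load mold/all value  ; logic!   (the type survives)
```

The same asymmetry applies to decimals in example #2: plain MOLD rounds them to fewer digits, so two distinct values can reload as the same one.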

CC - Data [ Version: alpha 99 Type: Issue Platform: All Category: Documentation Reproduce: Always Fixed-in:none ]

rebolbot commented 14 years ago

Submitted by: BrianH

"that they are (in my opinion) *very experienced* REBOL users"

They would be, because inexperienced REBOL users don't tend to use the "serialized" syntax in their regular code. I've also been caught by this gotcha recently in R2/Forward's typeset code, when combined with R2's build process.

Your summary is a little off: it is not the MOLD and SAVE functions that are potentially harmful, it is the practice of writing out the "serialized" (MOLD/all) syntax in "regular" (MOLD) code. The ability to do that makes it easy to write code that can't easily be expressed the same way without the syntax, and power users sometimes take advantage of that fact. However, that syntax only really works when you are writing original code, and not always then. It almost always causes binding issues if you write functions or objects that way. If you have a code-building process that requires regenerating source strings, you need to remember to generate "serialized" syntax with MOLD/all or SAVE/all. And if your code includes module, function or object values, you have to remember not to use "serialized" syntax, because you can't necessarily make working values of those types in that syntax.

The fact is that REBOL has two serialized syntaxes, MOLD and MOLD/all, which are deserialized by DO and LOAD, respectively, and both of these syntaxes have limitations. You can only mix them under certain circumstances, and it can be tricky to limit yourself to those circumstances. If you don't, it will trip you up. These limitations are good to note, but to be fair, this practice is only followed by power users anyway.

rebolbot commented 14 years ago

Submitted by: Ladislav

I have to explain more thoroughly how it happened: a very experienced REBOL user wrote a "script internationalization preprocessor" (translating REBOL scripts to different languages). He used SAVE as the last call to write the translated script back to a file. Notice that he did not use any "serialized" syntax at all; he just called SAVE.

The problem is that a preprocessor using SAVE (i.e. not SAVE/ALL) modified a specific script in an unacceptable way, causing it to work differently. That is why I consider it "harmful" to use SAVE instead of the proper SAVE/ALL when preprocessing REBOL scripts.
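The safer pattern described here can be sketched as follows. TRANSLATE is a hypothetical placeholder for whatever transformation the preprocessor actually performs; the only point of the sketch is that the final write uses SAVE/ALL instead of SAVE. Script-header handling is deliberately ignored and may differ between R2 and R3.

```rebol
; Hypothetical transformation step: a real preprocessor would rewrite CODE here.
translate: func [code] [code]

; Load a script, transform it, and write it back with SAVE/ALL so that
; serialized values such as #[true] and full-precision decimals survive.
preprocess: func [src [file!] dst [file!] /local code] [
    code: translate load src
    save/all dst code
]
```

Usage would look like `preprocess %original.r %translated.r`, where both file names are only illustrative.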

rebolbot commented 14 years ago

Submitted by: Ladislav

"...it is the practice of writing out the "serialized" (MOLD/all) syntax in "regular" (MOLD) code." This is where I simply have to disagree: there is no (MOLD) code in my description above; I mentioned only a REBOL script.

Moreover, I did not mention any objects or functions; I mentioned only decimal numbers and logic values. BTW, the "serialized" syntax of decimals is just a "precise syntax" for decimals, so it really is harmful to use any "distorting function" such as MOLD or SAVE to preprocess REBOL scripts.

rebolbot commented 14 years ago

Submitted by: Ladislav

My summary is:

1) "REBOL syntax" is the syntax recognizable by the LOAD function
2) SAVE and MOLD "destroy" scripts using REBOL syntax
3) SAVE and MOLD are "harmful" when used to (pre)process REBOL scripts

rebolbot commented 14 years ago

Submitted by: maxim

IMHO LOAD isn't the counterpart to MOLD; DO is. Even then, MOLD/ALL isn't a real serialization, because shared objects or series are no longer shared after reloading, and binding can't be re-established without the environment the data will be used in anyway.

Although they go a long way, and I have been using them extensively in many tools, MOLD and MOLD/ALL are meant for advanced users who properly understand the deeper notions of REBOL.
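The point about sharing can be checked directly. In this sketch (variable names are illustrative), one string is referenced twice inside a block; after a MOLD/ALL and LOAD round trip the two positions hold equal but distinct strings, so a change made through one reference would no longer show up in the other.

```rebol
shared: "A shared string"
blk: reduce [shared shared]          ; the same string referenced twice
probe same? blk/1 blk/2              ; true: one series, two references

copy-blk: load mold/all blk          ; serialize, then reload
probe equal? copy-blk/1 copy-blk/2   ; true: same contents
probe same? copy-blk/1 copy-blk/2    ; false: two separate series now
```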

rebolbot commented 14 years ago

Submitted by: maxim

The only step forward I can see would be to add reference and binding indicators to MOLD/ALL, probably as an additional refinement, since this would add more data to the serialization that most cases don't need.

Example:
>> MOLD/ALL/REFERENCES reduce [a: "A shared string" a]
== {[  #[string! "A shared string" #0001]  #[string! #0001] ]}

rebolbot commented 14 years ago

Submitted by: maxim

Also, we could add a small mezzanine that simply wraps MOLD/ALL/REFERENCES in something more readable:
serialize: func [data][MOLD/ALL/REFERENCES data]

rebolbot commented 14 years ago

Submitted by: maxim

Also, there are nasty binding issues with serialization, for example:

    a: "data"
    f: does [print a]
    o: context [
        a: "other data"
        f: does [print a]
    ]
    set in o 'f :f

How is this treated by MOLD? These are hard cases where we can only choose one of several possible solutions.
If we serialize only the O object!, current REBOL versions will break the above binding on MOLD. More advanced versions might still break other advanced patterns.

The serialization will never be "perfect", unless all data items are signed with a unique key which is retrievable at run-time and reusable at each interpretation (pretty much impossible).

IMHO all we can do is properly document what can and cannot be serialized without risk of data corruption, however powerful serialization is or will be in the future.

AFAIK, currently, official & explicit documentation on this is vague at best.

rebolbot commented 14 years ago

Submitted by: Ladislav

That "DO is a counterpart of MOLD" mantra is starting to look like one of Goebbels' truths. Example:

    >> do mold quote (1 + 2) ; == 3

How DO is a "counterpart" of MOLD in the above example is beyond my understanding.

rebolbot commented 14 years ago

Submitted by: BrianH

Ladislav, you used the QUOTE function there to make an active value (as far as DO is concerned) into an inactive value. And then you didn't include a QUOTE in your output.

This is similar to something like this:

>> do mold next "abc"
== "bc"  ; not the same thing
>> mold next "abc"
== {"bc"}  ; this is why
>> mold/all next "abc"
== {#[string! "abc" 2]}  ; in "serialized syntax" there's no data loss
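A small check of the last line above, assuming LOAD reconstructs the serialized form as intended: reloading the MOLD/ALL output restores both the full series and its index, whereas reloading the plain MOLD output only yields a fresh string positioned at its head. The names `s`, `plain` and `exact` are illustrative.

```rebol
s: next "abc"
plain: load mold s          ; a new string "bc" at its head
exact: load mold/all s      ; "abc" positioned at index 2
probe index? plain          ; 1
probe index? exact          ; 2
probe head exact            ; "abc"
```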

If you are doing post-processing on a value to get it to be treated differently than DO does normally, or constructing values that don't have literal representations without using #[...] "serialized syntax", then MOLD won't help you on its own all of the time. MOLD only helps if the construction is done with MAKE (and not even then for modules). In other cases you really need to include the post-processing steps yourself in the resulting code. Or use MOLD/all, carefully.

Neither MOLD nor MOLD/all can represent everything about all values in REBOL. According to their design criteria, MOLD is supposed to be more "friendly", and MOLD/all more "exact", but neither one can handle some situations. This doesn't make them harmful, it makes them of limited use, but still of use.

Whether it is more appropriate to use MOLD (SAVE) or MOLD/all (SAVE/all) when you are preprocessing scripts depends on how you are preprocessing the scripts. If you are just LOADing them and manipulating them from the outside, MOLD/all will do nicely. If you are constructing the values in memory then you may have to use MOLD to output them, and be careful about the semantics of the source you are outputting so that the construction steps are included if necessary. Prerebol does the former, the mezzanine building process does the latter - it's a trade-off.

rebolbot commented 14 years ago

Submitted by: Ladislav

"...you didn't include a QUOTE in your output." Where could I include a QUOTE in my output to obtain anything sensible? To avoid QUOTE entirely, I can just process the values in a block one by one using MOLD, with exactly the same result:
data: [(1 + 1) 'none]
m-data: copy []
foreach value data [append m-data mold :value]
d-data: copy []
foreach value m-data [append d-data do :value]
d-data ; == [2 none]

This example demonstrates that MOLD and DO are not actually "counterparts". The attempt with next "abc" bears no resemblance to this.