Since it was being considered to remove the preprocessor from Red, this REP aims to document its pros and cons.

Cons

The only con of having the preprocessor that I see is that it slows down script load time: do spends roughly an extra ~15ms/MB on expansion (though this is highly data-sensitive). So, roughly, on my 32k test.red that comes to about half a millisecond.
We should take into account here that the preprocessor is not currently optimized: it grows progressively slower as the number of defined macros grows. I think it should be possible to speed it up, maybe with some syntax restrictions (e.g. all macros I've ever used start with a certain token, which could be put into a map for O(1) lookups).
Slow macro functions can also affect expansion time, but we should not count that time, since the work is expected either to win back more speed at run time or to be necessary to produce the right code, and so would have to be done anyway.
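The O(1) lookup idea can be sketched generically. This is an illustrative Python sketch, not Red's actual implementation: macros are registered under their mandatory leading token, so expansion does one map lookup per position instead of trying every registered macro.

```python
# Sketch of token-keyed macro dispatch (illustrative, not Red's implementation).
# Each macro is registered under its mandatory leading token, so at every
# position we do a single dict lookup instead of matching all macro patterns.

def expand(tokens, macros):
    """macros: leading-token -> handler(tokens, i) -> (replacement, next_i)"""
    out = []
    i = 0
    while i < len(tokens):
        handler = macros.get(tokens[i])          # O(1) lookup by leading token
        if handler:
            replacement, i = handler(tokens, i)  # macro consumes its arguments
            out.extend(replacement)
        else:
            out.append(tokens[i])
            i += 1
    return out

# Example: a hypothetical '#double' macro that duplicates the following token.
macros = {"#double": lambda ts, i: ([ts[i + 1], ts[i + 1]], i + 2)}
print(expand(["a", "#double", "b", "c"], macros))  # ['a', 'b', 'b', 'c']
```

The cost per position is independent of how many macros are defined, which is exactly what the syntax restriction (a known leading token) buys.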
Pros
The pros of having the preprocessor are the use cases it enables out of the box.
In my opinion, removing it from the runtime will result in it being reinvented by every advanced reducer, and we'll have a zoo of conflicting preprocessors, only slowing everything down further and making code harder to reason about. A more flexible tradeoff, I think, would be to keep it but make it a module or a flag in the header, letting users enable it for the files they want preprocessed.
Use cases I have had so far are:
1. Compile- or boot-time code configuration:
1.1. Enable/disable fragments of code depending on the environment (hardware, OS, environment variables, etc.): #if, #either, #do [setting: value] to affect #if/#either. E.g. I use it for View backend detection and when I want to disable caching during debugging.
1.2. Enable/disable fragments of code depending on the debugging intent and/or level: #debug, #debug marker, #assert.
1.3. Embedding of external resources into the binary: #embed-image, though this should be covered by a compiler directive instead.
1.4. Embedding of compile-time data: e.g. #do keep [now] to insert the compilation timestamp (used in the CLI for versioning).
1.5. On-boot code optimization: specialize code once, then run it many times. Ideally a task for a JIT compiler.
1.6. Code 'unpacking': e.g. #stepwise reorders long expressions into a sequence for ease of reasoning (very rarely used in practice). Or a big data array can be compressed and embedded as a binary, then unpacked at runtime (e.g. a 40kB-long map of ML entities used by the ML codecs, or locale data).
2. Other ahead-of-time code rewriting (write for readability and let Red deal with the result):
2.1. String interpolation and localization: #rejoin/#print/#error and our former attempts at it.
2.2. Handy code wrappers that look flat but nest/uglify the code under the hood:
2.2.1. To avoid typing brackets and save time: ??? (trace all following code), *** (profile all following code), (* *) (profile code between markers), #where's-my-error? (I used it to locate the error before we got the Near: field).
2.2.2. To logically group resource acquisition and release: #leaving [code to eval before leaving the scope].
2.3. Avoiding word leaks from initialization code: #hide.
2.4. Parse (or other dialects?) extension. It is unfortunate that Parse wasn't designed for extensibility, but macros can somewhat help here: e.g. I'm using an #expect rule macro to automatically generate the error code and message when the input does not match. It works nicely for simple rules like [integer! | float!], but when the rule is complex it will require a rule parser in the macro, and even then it's not going to be a general solution, alas. But it's very helpful anyway.
2.5. Defining own data formats: e.g. #f0f -> tuple! (eases both reading and writing).
3. Customized code inclusion procedure: #include can be redefined, which would be hard to achieve with do, since do is so widespread and solves many other tasks. Admittedly I use this because the native #include is bugged (which is temporary), but who knows what other uses for it will be discovered.
4. To be able to load any value: ##[code to produce the value] - the load macro is simple, but mold is complex.
5. Putting constants into code (like #define in R/S). Use cases, I think, are either paranoia (to not let anyone accidentally modify the value), or for other macros to be able to optimize the code by pre-calculating constant expressions.
6. Run-time preprocessing. I did not use the preprocessor in these cases:
6.1. Object typing (#type/#on-change) - it has to be done at the object template declaration phase, when the functions that are possible targets for #on-change are already known.
6.2. Auto-mirroring word changes from one object into another: #push (expands into #on-change) - could have been done in the preprocessor, but it was easier for me to put it together with the object typing.
For these use cases it would be nice instead to be able to invoke the preprocessor at runtime with a limited, cached set of rules, so I would not have to reinvent it myself. That is: macros should not be triggered by the global preprocessing stage, and processing should be fast and free from any other noise.
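What such a runtime entry point could look like can be sketched in Python. This is a hypothetical design, not an existing Red API: expansion is driven only by an explicitly supplied rule set, compiled once and cached, with no global macro registry involved.

```python
# Sketch of on-demand runtime preprocessing over a limited, cached rule set
# (hypothetical design, not an existing Red API). Rules are compiled to a
# lookup table once, cached, and applied only when the caller asks for it -
# there is no global expansion pass and no other macros interfere.

from functools import lru_cache

@lru_cache(maxsize=None)
def compile_rules(rule_items):
    """Turn (token, replacement) pairs into a lookup table; cached per set.
    rule_items must be hashable, hence a tuple of pairs."""
    return dict(rule_items)

def preprocess(tokens, rule_items):
    rules = compile_rules(rule_items)   # compiled table reused across calls
    return [rules.get(t, t) for t in tokens]

# Example with made-up substitution rules:
rules = (("#version", "1.0.0"), ("#debug", "off"))
print(preprocess(["app", "#version", "mode:", "#debug"], rules))
# ['app', '1.0.0', 'mode:', 'off']
```

Because the rule set is an explicit argument, two callers with different macro sets cannot conflict, which addresses the "zoo of preprocessors" concern while keeping the runtime cost limited to the rules actually requested.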