Closed mgood7123 closed 3 years ago
Hi there, and thanks for your request.
The C and C++ targets are very flexible regarding input processing. There is a define called UNICC_GETINPUT which can be set to any function that emits any kind of character. The EOF can be handled (dynamically) by setting the eof member of the parser control block (pcb). Please see the User's Manual, section 5, for further information.
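As a rough illustration of the kind of function that could stand behind UNICC_GETINPUT for binary input, here is a minimal sketch that hands out a raw byte buffer one bit at a time as the characters '0' and '1'. The names bit_source, bit_source_init and next_bit_char are made up for this example, and the hook-up shown in the final comment is an assumption that should be checked against the User's Manual.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical input state: a raw byte buffer that is handed out bit by bit. */
typedef struct {
    const unsigned char *data; /* raw binary input */
    size_t len;                /* number of bytes in data */
    size_t byte_pos;           /* index of the byte currently being read */
    int bit_pos;               /* next bit inside that byte, 7 = most significant */
} bit_source;

/* Global, because the input hook is assumed to take no arguments. */
static bit_source g_src;

/* Point the source at a buffer and start at its first (most significant) bit. */
static void bit_source_init(const unsigned char *data, size_t len)
{
    g_src.data = data;
    g_src.len = len;
    g_src.byte_pos = 0;
    g_src.bit_pos = 7;
}

/* Return the next bit as '0' or '1', or EOF once the buffer is used up. */
static int next_bit_char(void)
{
    int bit;

    if (g_src.byte_pos >= g_src.len)
        return EOF;

    bit = (g_src.data[g_src.byte_pos] >> g_src.bit_pos) & 1;

    /* Advance to the next bit, moving on to the next byte after bit 0. */
    if (--g_src.bit_pos < 0) {
        g_src.bit_pos = 7;
        g_src.byte_pos++;
    }

    return bit ? '1' : '0';
}

/* Assumed hook-up (not verified here), placed before the generated parser:
 *   #define UNICC_GETINPUT next_bit_char()
 */
```

Because the conversion happens one bit per call, the input never has to be read completely before parsing starts.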
Currently, UniCC only allows for character-based input processing, and the scanner is called by the parser. A parser that is called from the scanner (so-called "push parsing") is not implemented yet.
Could this help you, or can you provide a practical example of the case you want to implement?
This is what a push parser might look like:
foreach( token in tokens )
{
    if( push_parse( token ) != PARSER_STATE_NEXT )
        break;
}
if( push_parse( EOF ) == PPPAR_STATE_DONE )
    printf( "Success!\n" );
Is this what you're looking for?
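To make the calling convention above more concrete, here is a small self-contained sketch of a push-style interface in C. It is not UniCC code: the type push_parser, the states PUSH_NEXT, PUSH_DONE and PUSH_ERROR, and the toy grammar (one or more complete groups of eight '0'/'1' tokens followed by end of input) are all assumptions chosen for illustration.

```c
/* Hypothetical result states of one push step. */
typedef enum { PUSH_NEXT, PUSH_DONE, PUSH_ERROR } push_state;

/* Hypothetical end-of-input token. */
enum { TOK_EOF = -1 };

/* The parser keeps its own state between calls; the caller owns the loop. */
typedef struct {
    int bits_in_group;      /* bits received since the last complete byte */
    unsigned value;         /* bits of the byte being assembled */
    unsigned last_byte;     /* most recently completed byte */
    unsigned long groups;   /* number of complete bytes seen so far */
} push_parser;

static void push_init(push_parser *p)
{
    p->bits_in_group = 0;
    p->value = 0;
    p->last_byte = 0;
    p->groups = 0;
}

/* Consume exactly one token and report whether more input is wanted. */
static push_state push_parse(push_parser *p, int token)
{
    if (token == TOK_EOF)
        /* Input is valid only if it ends on a byte boundary and is not empty. */
        return (p->bits_in_group == 0 && p->groups > 0) ? PUSH_DONE : PUSH_ERROR;

    if (token != '0' && token != '1')
        return PUSH_ERROR;

    p->value = (p->value << 1) | (unsigned)(token - '0');

    if (++p->bits_in_group == 8) {
        p->last_byte = p->value;
        p->value = 0;
        p->bits_in_group = 0;
        p->groups++;
    }

    return PUSH_NEXT;
}

/* Driver loop in the style of the pseudocode: push tokens, then push EOF. */
static push_state push_string(push_parser *p, const char *tokens)
{
    for (; *tokens; tokens++)
        if (push_parse(p, *tokens) != PUSH_NEXT)
            return PUSH_ERROR;

    return push_parse(p, TOK_EOF);
}
```

The point of the design is that the input source drives the parser, so tokens can arrive from anywhere (a socket, a memory dump, another scanner) without the parser ever asking for the next character itself.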
Like...
// small example of detecting alphabetical characters in binary
//eof shall be string terminator, 00000000 in binary or '\0'
ascii_prefix = 011000 // since we only use 3 alphabetical characters we can use a prefix that leaves only 2 bits available
a = ascii_prefix 01 /* 01100001 */ { puts("received letter a"); }
b = ascii_prefix 10 /* 01100010 */ { puts("received letter b"); }
c = ascii_prefix 11 /* 01100011 */ { puts("received letter c"); }
// ...
$alpha = a b c //...
Thanks for your reply. I still think your problem can be solved with either the UNICC_GETINPUT function, or with a push-parsing solution. A more concrete use case for your problem might help me to better understand how your problem can be solved best.
The primary use would be for disassembly, for example (a very simple example)
010011100101011 DO_FUNCTION 0100100001111001 DO_FUNCTION
And so on
for example
prim : 0100111001010 DEC
DEC : 000 FUNCA | 001 FUNCB | ... | 111 FUNCI
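For comparison, the two rules above can be written by hand in C as a prefix match followed by a 3-bit selector. This is only a sketch under assumptions that are not stated in the thread: instructions are taken to be 16 bits wide with the most significant bit first, and the function name decode_prim is invented. Only the three alternatives that are actually listed (000, 001 and 111) are named; the elided ones are reported as not listed.

```c
#include <stdint.h>
#include <stddef.h>

/* The 13-bit prefix 0100111001010 from the rule for prim, as a number. */
#define PRIM_PREFIX 0x09CAu

/*
 * Decode one instruction word (assumed 16 bits, most significant bit first).
 * Returns the name of the selected alternative of DEC, or NULL when the
 * upper 13 bits do not match the prefix.
 */
static const char *decode_prim(uint16_t word)
{
    unsigned prefix = (unsigned)word >> 3;   /* upper 13 bits */
    unsigned dec = (unsigned)word & 0x7u;    /* lower 3 bits  */

    if (prefix != PRIM_PREFIX)
        return NULL;

    switch (dec) {
    case 0x0: return "FUNCA";        /* 000 */
    case 0x1: return "FUNCB";        /* 001 */
    case 0x7: return "FUNCI";        /* 111 */
    default:  return "(not listed)"; /* alternatives elided in the example */
    }
}
```

With these assumptions, the word 0x4E50 (0100111001010 followed by 000) selects FUNCA, and any word with a different upper 13 bits is rejected.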
OK, now I understand your problem.
This might take some time to implement - both the support of external tokens and a push-parsing approach are necessary to provide this feature. Are you patient with the implementation and interested in testing it when ready?
Yes
how is the implementation coming along?
Hi, I'm still working on UniCCv2 but if you need it quite soon I can push it into 1.6. Would it be enough to associate tokens with individual external IDs (integer IDs)? E.g. so that DEC becomes 1, f.e.?
I don't know enough about UniCC internals to say whether it would be "enough to associate tokens with individual external IDs", but as long as it works I don't really care how it is implemented. It just needs to be able to parse binary either as strings (e.g. "10010") or as raw binary (e.g. 10010).
The only requirement is that it accepts raw binary and either parses it as is or converts it to a string and then parses it.
Though obviously, in the raw > string case it will need to convert on input, e.g. 10010 > "1", "0", "0", "1", "0". Otherwise it may just hang: it would try to read up to EOF (even though there is none) and then convert the entire input to a string, which it will never do as it never receives EOF.
Then again, an EOF might just be interpreted as a specific binary sequence, such as a HALT instruction. That raises the possibility of having to store, for example, a 5 GB binary string or more if HALT exists far, far, far beyond the executed code. It would also be extremely specific to the code being parsed, normally existing in something intended to halt all binary code execution, such as when powering off the machine or similar, depending on the use case.
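The idea of treating one binary sequence as end of input could be sketched as a thin wrapper around the byte source. Everything here is hypothetical: the HALT encoding 0xFF is picked arbitrarily, and halt_reader and halt_reader_next are illustrative names, not part of UniCC. Since bytes are handed out one at a time, nothing beyond the current byte needs to be kept in memory.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical encoding of the HALT instruction; real encodings will differ. */
#define HALT_OPCODE 0xFFu

typedef struct {
    const unsigned char *data; /* raw binary input */
    size_t len;                /* number of bytes in data */
    size_t pos;                /* next byte to hand out */
    int halted;                /* set once HALT has been seen */
} halt_reader;

/* Return the next byte, or EOF at HALT or at the end of the buffer. */
static int halt_reader_next(halt_reader *r)
{
    unsigned char byte;

    if (r->halted || r->pos >= r->len)
        return EOF;

    byte = r->data[r->pos++];

    if (byte == HALT_OPCODE) {
        r->halted = 1; /* everything after HALT is ignored */
        return EOF;
    }

    return byte;
}
```

A single-byte sentinel like this cannot tell a HALT opcode from an operand byte that happens to have the same value, so a real disassembler would need to apply the check at instruction boundaries only.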
Will close this now. UniCC will be abandoned.
Would it be possible to add a template to parse raw binary input/output, such as when developing disassemblers or assemblers (in which there is no EOF for raw binary unless a specific binary sequence represents the EOF)?
Note: one solution may be to convert the entire file/input into a binary literal string, so that only minimal modification to the C template is needed, though I have not tested this.