Open GitMensch opened 2 years ago
Things that this will be useful for:
Did not found the option to test that, but the parser looks good. One thing that I've recognized is that EXEC SQL VAR statements are not fully supported, its syntax according to the docs is.
EXEC SQL
VAR <host_variable>
IS <ext_type_name> [({<length> | <precision>,<scale>})]
END-EXEC.
To fully support that (if in question then possibly with a warning "pending feature") the optional items need to be added - you already were so clever to do it in the lexer for the "length", the piece missing (not at all important for me) is the "precision, scale" format.
Also missing are the external types: FLOAT
(COMP-1 and COMP-2, also not important for my use case), CHARF
(likely the same as current CHAR
, RAW
(likely the same as VARBINARY
) and VARCHAR2
.
Note: for Oracle VARCHAR
means the VARYING
structure (with implicit -LEN
and -ARR
sub-fields) while VARCHAR2
means "a single field, automatically rtrimmed when passed to DB and space-padded when read). What does VARCHAR
mean currently with GixSQL?
To fully support that (if in question then possibly with a warning "pending feature") the optional items need to be added
Yes, I intend to do that
What does VARCHAR mean currently with GixSQL?
Currently it's the first one (implicit -LEN and -ARR sub-fields)
To be more clear: the length parameter as it is written already supports passing precision and scale : they are encoded as 16 bits integers shifted into a 64 bit int which is passed between the various definitions in the parser. I just need to implement the types and test.
Currently it's the first one (implicit -LEN and -ARR sub-fields)
OK, that matches the Oracle definition. Please add VARCHAR2, too :-)
And: Big thanks for your work on GixSQL - I hope the time invested in testing, issue and PR creation is at least a slight motivation to go on.
Of course it is.
I made some modifications in the parser: the cursor handling part was not really a work of art, but it was one of the first things I put together when I started fiddling with my modifications to ocesql 3 or 4 years ago (they were just that at the time) and I didn't know the then-current codebase, so I didn't want to change too much. I have reorganized that part and a lot more stuff in the parser/scanner, so it is now cleaner and more maintainable. This will obviously need a lot of testing to avoid regressions, and this is why I still cannot give a timeline for the next release, but we are getting there.
Reopening to keep track of the modifications on EXEC SQL VAR and the implementation of VARCHAR2
By the way: maybe I am missing something but Visual COBOL (at least v6.0) seems to ignore the EXEC SQL VAR declarations. They are parsed but no actual check seems to take place. In the following source snippet:
WORKING-STORAGE SECTION.
01 NUM1 SQL TYPE IS VARBINARY(100).
01 NUM2 SQL TYPE IS DATE.
01 NUM3 PIC S9(8)V9(4).
EXEC SQL VAR NUM3 IS xyz123 END-EXEC.
EXEC SQL VAR NUM3 IS idontknow END-EXEC.
EXEC SQL VAR NUM4 IS something END-EXEC.
the compiler doesn't catch three quite obvious errors:
As I said I might very well be missing something.
EXEC SQL
is commonly handled by preprocessors, not by the COBOL compiler.
Many preprocessors that don't know how to handle something (which may be EXEC SQL VAR
in your case) see it as a comment.
With VC (if this is what handles the statements) you may want to increase the default output, to check for informational messages; if I remember correctly by $SET VERBOSE WARNINGS (3)
[or similar].
Mostly done, except for VARCHAR2
, still working on it (needs changes in the runtime libraries), but I already added support for FLOAT
, INT
and DECIMAL
.
Started to check with some ProCOB targetted code, result:
error: Invalid level for SQL variable, it is 05, should be 01
This is a check that is not happening with ProCOB. Maybe it can be lowered to a warning instead?
If the 01/77 is needed for the CALL
(per COBOL standard rules) one could just assume "that possibly works, if not we had a warning", or the generation could be changed so that using a non level-01 (note: level 77 should definitely be possible too!) items automatically generates a MOVE lvl-05-item TO tmp-01-level-05-item
and vice versa before/after the CALL
and the temporary item be automatically created, too.
OK, GixSQL has no issue with the bind parameters being level 05 either, so I think that error message should only - if at all - happen if it defines an implicit variable length structure - but even then this seems questionable.
Note: this is not a show-stopper for me currently, can also be handled "correctly" later if that would mean much work.
error: Invalid level for SQL variable, it is 05, should be 01
This is a check that is not happening with ProCOB. Maybe it can be lowered to a warning instead?
This is currently more a limitation in the runtime library, that expects host variables to be a top-level group with level 01. I have to check in there to assess the impact. For context: the "level 01" assumption/limitation was ok for the code I was porting and allowed me to develop faster, but obviously it must be removed now .
This is currently more a limitation in the runtime library, that expects host variables to be a top-level group with level 01
That doesn't seem like the case, because the following (peudocode) works fine:
01 some-rec.
05 some-value1 pic xxxx.
05 some-value2 pic 99.
EXEC SQL
SELECT A, B FROM X WHERE val1 = :some-value1 val2 = :some-value2
END-EXEC
... or I did not understand what you've said, also possible.
... or I did not understand what you've said, also possible.
More likely: I don't remember it well, but I will look into it
@mridoni Any update for VARCHAR2
(doing explicit rtrim COBOL->DB and space-pad DB->COBOL)?
Leaving a ping on VARCHAR2
as this is the currently only known "missing piece" for full postgres support of some existing applications and being more than a minor PR (needs adjustments in both the preprocessor and the runtime [I think that won't be driver related]).
Leaving a ping on
VARCHAR2
as this is the currently only known "missing piece" for full postgres support of some existing applications and being more than a minor PR (needs adjustments in both the preprocessor and the runtime [I think that won't be driver related]).
That's the next item on my list for GixSQL. Just one thing: in the Oracle documentation I couldn't actually find a difference between VARCHAR
and VARCHAR2
; except for NULL handling (in Oracle empty string = NULL) they behave the same (including right-side trimming)
And from what I see PostgreSQL does not directly support VARCHAR2
, this seems to be a feature in the (commercial) EnterpriseDB only: https://www.enterprisedb.com/docs/epas/latest/epas_compat_reference/02_the_sql_language/02_data_types/ where, by the way, VARCHAR2
seems to be just an alias for CHARACTER VARYING(n)
. Am I missing something?
I think that CHARACTER VARYING
may also be implemented as auto-right trimmed in a preprocessor for PG.
For Oracle the difference in the preprocessor is, that VARCHAR2 is right-trimmed (see link in the opening post):
If [... the] columns are VARCHAR2, the program interface strips the trailing blanks on input and inserts just the 6-character string "MILLER" and the 5-character string "SALES" into the database. However, if the target database columns are CHAR, the strings are blank-padded to the width of the columns.
(and on reading in VARCHAR2 are blank padded again to be of the expected COBOL size and type).
So this would basically mean:
VARCHAR2
as a synonym for VARCHAR
(keeping the null-string stuff behaviour, since that is non-standard)VARCHAR2
and VARCHAR
auto-right-trimming in the COBOL -> DB direction and auto-right-padded in the opposite oneIs that correct?
From my understanding that's correct for VARCHAR2, but defining a variable as VARCHAR in ProCOB leads to be identical to VARYING
-> creating a group with ARR
and LEN
.
Ok, I'll do a test with ProCOB (and report the results here): I have just re-read the documentation and, while it is very detailed on the behaviour of VARCHAR2
, it is not very clear with regard to the differences between VARCHAR
and VARCHAR2
. From the way the documentation is formatted (see below, taken straight from page 4-7 of ProCOBOL for Oracle 11g Release 2 (11.2) E10826-01) what I gather is that the two paragraphs titled "Input" and "Output" apply to both VARCHAR
and VARCHAR2
, but the only way to be sure is to test it and see how (if) the generated code differs.
VARCHAR The VARCHAR datatype represents variable-length character strings. VARCHAR variables have a 2-byte length field followed by a 65533-byte string field. However, for VARCHAR array elements, the maximum length of the string field is 65530 bytes. When you specify the length of a VARCHAR variable, be sure to include 2 bytes for the length field. For longer strings, use the LONG VARCHAR datatype. In an EXEC SQL VAR statement, do not include the 2-byte length field.
VARCHAR2 The VARCHAR2 datatype represents variable-length character strings. On most platforms, the maximum length of a VARCHAR2 value is 65535 bytes. Specify the maximum length of a VARCHAR2(n) value in bytes, not characters. So, if a VARCHAR2(n) variable stores multibyte characters, its maximum length is less than n characters.
On Input. Oracle reads the number of bytes specified for the input host variable, strips any trailing blanks, and then stores the input value in the target database column. If the input value is longer than the defined width of the database column, Oracle generates an error. If the input value is all SPACES, Oracle treats it like a NULL. Oracle can convert a character value to a NUMBER column value if the character value represents a valid number. Otherwise, Oracle generates an error.
On Output. Oracle returns the number of bytes specified for the output host variable, blank-padding if necessary, and then assigns the output value to the target host variable. If a NULL is returned, Oracle fills the host variable with blanks.
If the output value is longer than the declared length of the host variable, Oracle truncates the value before assigning it to the host variable. If an indicator variable is available, Oracle sets it to the original length of the output value. Oracle can convert NUMBER column values to character values. The length of the character host variable determines precision. If the host variable is too short for the number, scientific notation is used. For example, if you select the column value 123456789 into a host variable of length 6, Oracle returns the value 1.2E08 to the host variable.
Thanks for the note, and yes, I'm not sure about VARCHAR2 vs. VARCHAR any more. From the notes above it seems that VARCHAR2 has (nowadays) 4 byte length, while VARCHAR has always fixed 2.
My potential confusion on this may root in https://docs.oracle.com/cd/B13789_01/appdev.101/a96109/pco14opt.htm with VARCHAR2 notes about trailing space part and VARCHAR about the group.
Thanks for testing, looking forward to see this in 1.0.80!
This is "almost" done. "Almost" means that the the feature per se works, but it casually exposed a bug in generation of string constants. I am trying to resolve that, than I will release the modified code to the GitHub repository.
pgsql://localhost/testdb1?magic_spaces=on
I'm confused - that would mean it is current a global setting, no? That sound very likely to break existing data in keyfelds as "keyfield " != "keyfield".
The whole point of EXEC SQL VAR
Is to make this a field-specific decision - mostly between CHARF
and VARCHAR
/VARCHAR2
. The is something I must miss.
Ok, I will look into it again
I will try to test with ProCOBOL but if there is no difference between VARCHAR and VARCHAR2, as I suspect, how do you suggest to proceed?
Just throwing ideas around here: I could implement a different behaviour for VARCHAR and VARCHAR2, but this would be absolutely non-standard, and VARCHAR2 is an Oracle-only thing anyway, since in EDB it is apparently just a synonym for VARCHAR. Since every preprocessor, for many reasons, does things a little bit differently, is there a way this could be implemented without using VARCHAR2, but instead using a different syntax?
Thanks
f there is no difference between VARCHAR and VARCHAR2, as I suspect, how do you suggest to proceed
Then just do the same, using it as Oracle synonym. The important difference is between VARCHAR and CHARF (with CHARF being the default). As noted before from potentially historical ProCOB docs: you'd want to test both with a PIC(X255) and with PIC X(25000) item (with a matching on the DB side, of course).
If I remember correctly there was a switch with procob to say "if the variable type is not explicit specified: change the default type from CHARF to VARCHAR (preprocessor switch, not connection option)". You may want to add something similar instead of the connection option.
Since every preprocessor, for many reasons, does things a little bit differently
Supporting the existing variants - if necessary with a new switch to gixsql - is a very useful goal. :-)
This gets even weirder. Oracle also says (look here, page 44):
VARCHAR
Datatype Do not use theVARCHAR
datatype. Use theVARCHAR2
datatype instead. Although theVARCHAR
datatype is currently synonymous withVARCHAR2
, theVARCHAR
datatype is scheduled to be redefined as a separate datatype used for variable-length character strings compared with different comparison semantics.
This is obviously not applicable to other DBMSs, including PostgreSQL, but explains why Oracle treats VARCHAR
differently and the documentation usually prefers VARCHAR2
.
In the meantime, while experimenting with ProCOBOL, I have found out the following things. They might just be details I was missing, but I am listing them anyway:
SQL TYPE IS
syntax is not supported, it is just passed along (at this point it is probably a Micro Focus-only construct)EXEC SQL VAR
declarations work as "specifiers", the variable they refer to must be declared in the COBOL code (i.e. they are not automatically generated by the preprocessor with the correct/inferred type)EXEC SQL VAR
declarations must come - in the source - after the variable they refer to, otherwise a syntax error is generated-LEN
/-ARR
) are generated only if the VARYING
attribute if present in the variable declaration. If there is no VARYING
attribute, even if an EXEC SQL VAR...IS VARCHAR
declaration is used, no variable group is generated and the variable is passed along as declared in the original sourceVARCHAR
and VARCHAR2
seems to be that the variable declared as VARCHAR
passes along 2 extra bytes (as length) but I am quite sure this is has to do with internal implementation details and is probably connected to the pre-8.0 ProCOBOL behaviour (PICX=CHARF/VARCHAR
). There seems to be no difference at the SQL level (e.g when querying the database, concatenating strings, etc.). As said, except for the length, there also does not seem to be anything (parameters, extra-calls, etc.) in the generated code to differentiate the two, implying that ProCOBOL's runtime cannot actually distinguish them, so I guess they can be treated the same way.EXEC SQL VAR...IS...
you must specify the length when using VARCHAR
but it is not mandatory with VARCHAR2
. I am not sure if and how this is relevant.
EXEC SQL VAR
declarations work as "specifiers", the variable they refer to must be declared in the COBOL code (i.e. they are not automatically generated by the preprocessor with the correct/inferred type)
BTW: this restriction (using EXEC SQL VAR declarations as specifiers and not autogenerating variable length groups from an EXEC SQL VAR
declaration) makes sense, otherwise there would be an ambiguity when using PICX=VARCHAR (i.e. “is this supposed to use a standard PIC(X) and auto-trim or a variable length group whose data size, when writing to the DB, should be determined by the -LEN field?”).
I think I will be heading this way (and of course adding to gixpp a switch similar to PICX=… .
EXEC SQL VAR
declarations must come - in the source - after the variable they refer to, otherwise a syntax error is generated
This restriction would make preprocessing easier and more linear, even if support for handling variables declared in the “wrong” order is already there. I have to check whether it is still needed.
Agreed - if it does make things easier and is necessary for at least one option/code path: go that way.
I am working on this but at the moment is unlikely it will see the light this week: unfortunately I need to test and re-test a lot of stuff and the pre-release I had in mind is probably going to slide, also depending on my day-job.
Either you think you can come up with a solution for that until Thursday or not (I'm fine with both) - I'm not available for testing until mid of August and VARCHAR2 would definitely need more time to tests (fixes for #88 could be tested on Thursday / Friday).
The version in the internal repository should now be working as defined in https://github.com/mridoni/gixsql/issues/21#issuecomment-1192359787.
gixpp
now has an added switch (picx-as=varchar|char|charf
) that can be used to decide how to handle non-varlen PIC(x)
fields when sending them to the DB driver.
What is open in this issue?
I see that there is no explicit handling of EXEC SQL VAR SOME-VAL IS VARCHAR2 END-EXEC.
in the runtime, only the "normal" varlength handling in https://github.com/mridoni/gixsql/blob/debeeeabb9e740f89a03e325e69553ec5e47149f/runtime/libgixsql/SqlVar.cpp#L261-L279
Does this mean that one still has both separate -ARR and -LEN fields with the manual setting of the length (presumably using FUNCTION STORED-CHAR-LENGTH
under GnuCOBOL) or can the bind variable SOME-VAL
be used directly and the length is calculated somewhere else?
I tried to impllement the same behavior as ProCOBOL (see my previous comment) so the VARYING
clause is needed to auto-generate the variable length groups (i.e. a top-level group with two level-49 sub-fields). EXEC SQL VAR
, like in ProCOBOL, should only supply a hint as to the type of the variable on the DBMS side (this behaviour can be modfied with the --picx-as
flag in gixpp
, more or less like it happens in procob
.
For some reasons I've not looked that well...
https://github.com/mridoni/gixsql/blob/debeeeabb9e740f89a03e325e69553ec5e47149f/runtime/libgixsql/SqlVar.cpp#L265-L268
I guess autotrim is set with IS VARCHAR2
, correct? If the answer is yes, then this works "as planned", if not then it should be.
For some reasons I've not looked that well...
I guess autotrim is set with
IS VARCHAR2
, correct? If the answer is yes, then this works "as planned", if not then it should be.
Yes, on the preprocessor side it is set here: https://github.com/mridoni/gixsql/blob/debeeeabb9e740f89a03e325e69553ec5e47149f/libgixpp/TPESQLProcessing.cpp#L1225-L1226
Some of this stuff will have to change to account for #125, that I have not started to work on. As I refactored the runtime libraries to use smart pointers, I am also trying to "C++ize" the interface and code a bit more, and trying to eliminate some dead/old/dangerous/ code) so it will be at least a week until I can start to work on it.
From the README it looks like VARCHAR2 (=like varchar but with auto-trim instead of manually calculating the length) is still not available, while the code above looks like it is.
I did not found a testcase with VARCHAR2. The changelog neither has VARCHAR nor trim in it, so I feel a bit lost what the state on that is (in general but most specific to postgresql).
Is there anything else open for EXEC SQL VAR
?
On other thing found to be open:
EXEC SQL VAR TABKEY1 IS CHARF END-EXEC.
raises
error: syntax error, unexpected TOKEN
maybe this can this be used at least at a syntactical level as an alias for CHAR
?
... but something is strange here:
EXEC SQL VAR BUFFER IS RAW (20200) END-EXEC.
raises (1.0.20 and 1.0.20b - both installed from tarball and showing the expected version):
error: syntax error, unexpected FLOAT, expecting END-EXEC
I guess that further adjustments will take a while, but as I've just seen that again... friendly ping on CHARF and "unexpected FLOAT".
Add EXEC SQL VAR statements - Example:
Currently this leads to a syntax error. Some details are noted in ProCOB docs.