mridoni / gixsql

GixSQL is an ESQL preprocessor and a series of runtime libraries to enable GnuCOBOL to access PostgreSQL, ODBC, MySQL, Oracle and SQLite databases.
GNU General Public License v3.0
14 stars 6 forks source link

FR: support EXEC SQL VAR #21

Open GitMensch opened 2 years ago

GitMensch commented 2 years ago

Add EXEC SQL VAR statements - Example:

           EXEC SQL VAR SOME-ID             IS CHAR      END-EXEC.

Currently this leads to a syntax error. Some details are noted in ProCOB docs.

GitMensch commented 2 years ago

Things that this will be useful for:

mridoni commented 2 years ago

Introduced in v1.0.8

GitMensch commented 2 years ago

Did not found the option to test that, but the parser looks good. One thing that I've recognized is that EXEC SQL VAR statements are not fully supported, its syntax according to the docs is.

     EXEC SQL 
         VAR <host_variable>
         IS <ext_type_name> [({<length> | <precision>,<scale>})]
     END-EXEC.

To fully support that (if in question then possibly with a warning "pending feature") the optional items need to be added - you already were so clever to do it in the lexer for the "length", the piece missing (not at all important for me) is the "precision, scale" format.

Also missing are the external types: FLOAT (COMP-1 and COMP-2, also not important for my use case), CHARF (likely the same as current CHAR, RAW (likely the same as VARBINARY) and VARCHAR2.

Note: for Oracle VARCHAR means the VARYING structure (with implicit -LEN and -ARR sub-fields) while VARCHAR2 means "a single field, automatically rtrimmed when passed to DB and space-padded when read). What does VARCHAR mean currently with GixSQL?

mridoni commented 2 years ago

To fully support that (if in question then possibly with a warning "pending feature") the optional items need to be added

Yes, I intend to do that

What does VARCHAR mean currently with GixSQL?

Currently it's the first one (implicit -LEN and -ARR sub-fields)

mridoni commented 2 years ago

To be more clear: the length parameter as it is written already supports passing precision and scale : they are encoded as 16 bits integers shifted into a 64 bit int which is passed between the various definitions in the parser. I just need to implement the types and test.

GitMensch commented 2 years ago

Currently it's the first one (implicit -LEN and -ARR sub-fields)

OK, that matches the Oracle definition. Please add VARCHAR2, too :-)

And: Big thanks for your work on GixSQL - I hope the time invested in testing, issue and PR creation is at least a slight motivation to go on.

mridoni commented 2 years ago

Of course it is.

I made some modifications in the parser: the cursor handling part was not really a work of art, but it was one of the first things I put together when I started fiddling with my modifications to ocesql 3 or 4 years ago (they were just that at the time) and I didn't know the then-current codebase, so I didn't want to change too much. I have reorganized that part and a lot more stuff in the parser/scanner, so it is now cleaner and more maintainable. This will obviously need a lot of testing to avoid regressions, and this is why I still cannot give a timeline for the next release, but we are getting there.

mridoni commented 2 years ago

Reopening to keep track of the modifications on EXEC SQL VAR and the implementation of VARCHAR2

mridoni commented 2 years ago

By the way: maybe I am missing something but Visual COBOL (at least v6.0) seems to ignore the EXEC SQL VAR declarations. They are parsed but no actual check seems to take place. In the following source snippet:

   WORKING-STORAGE SECTION. 

       01 NUM1 SQL TYPE IS VARBINARY(100).
       01 NUM2 SQL TYPE IS DATE.

       01 NUM3 PIC S9(8)V9(4).

   EXEC SQL VAR NUM3 IS xyz123 END-EXEC.

   EXEC SQL VAR NUM3 IS idontknow END-EXEC.

   EXEC SQL VAR NUM4 IS something END-EXEC.

the compiler doesn't catch three quite obvious errors:

As I said I might very well be missing something.

GitMensch commented 2 years ago

EXEC SQL is commonly handled by preprocessors, not by the COBOL compiler. Many preprocessors that don't know how to handle something (which may be EXEC SQL VAR in your case) see it as a comment. With VC (if this is what handles the statements) you may want to increase the default output, to check for informational messages; if I remember correctly by $SET VERBOSE WARNINGS (3) [or similar].

mridoni commented 2 years ago

Mostly done, except for VARCHAR2, still working on it (needs changes in the runtime libraries), but I already added support for FLOAT, INT and DECIMAL.

GitMensch commented 2 years ago

Started to check with some ProCOB targetted code, result:

error: Invalid level for SQL variable, it is 05, should be 01

This is a check that is not happening with ProCOB. Maybe it can be lowered to a warning instead?

If the 01/77 is needed for the CALL (per COBOL standard rules) one could just assume "that possibly works, if not we had a warning", or the generation could be changed so that using a non level-01 (note: level 77 should definitely be possible too!) items automatically generates a MOVE lvl-05-item TO tmp-01-level-05-item and vice versa before/after the CALL and the temporary item be automatically created, too.

GitMensch commented 2 years ago

OK, GixSQL has no issue with the bind parameters being level 05 either, so I think that error message should only - if at all - happen if it defines an implicit variable length structure - but even then this seems questionable.

Note: this is not a show-stopper for me currently, can also be handled "correctly" later if that would mean much work.

mridoni commented 2 years ago

error: Invalid level for SQL variable, it is 05, should be 01

This is a check that is not happening with ProCOB. Maybe it can be lowered to a warning instead?

This is currently more a limitation in the runtime library, that expects host variables to be a top-level group with level 01. I have to check in there to assess the impact. For context: the "level 01" assumption/limitation was ok for the code I was porting and allowed me to develop faster, but obviously it must be removed now .

GitMensch commented 2 years ago

This is currently more a limitation in the runtime library, that expects host variables to be a top-level group with level 01

That doesn't seem like the case, because the following (peudocode) works fine:

01 some-rec.
  05 some-value1 pic xxxx.
  05 some-value2 pic 99.

EXEC SQL
   SELECT A, B FROM X WHERE val1 = :some-value1 val2 = :some-value2
END-EXEC

... or I did not understand what you've said, also possible.

mridoni commented 2 years ago

... or I did not understand what you've said, also possible.

More likely: I don't remember it well, but I will look into it

GitMensch commented 2 years ago

@mridoni Any update for VARCHAR2 (doing explicit rtrim COBOL->DB and space-pad DB->COBOL)?

GitMensch commented 2 years ago

Leaving a ping on VARCHAR2 as this is the currently only known "missing piece" for full postgres support of some existing applications and being more than a minor PR (needs adjustments in both the preprocessor and the runtime [I think that won't be driver related]).

mridoni commented 2 years ago

Leaving a ping on VARCHAR2 as this is the currently only known "missing piece" for full postgres support of some existing applications and being more than a minor PR (needs adjustments in both the preprocessor and the runtime [I think that won't be driver related]).

That's the next item on my list for GixSQL. Just one thing: in the Oracle documentation I couldn't actually find a difference between VARCHAR and VARCHAR2; except for NULL handling (in Oracle empty string = NULL) they behave the same (including right-side trimming)

And from what I see PostgreSQL does not directly support VARCHAR2, this seems to be a feature in the (commercial) EnterpriseDB only: https://www.enterprisedb.com/docs/epas/latest/epas_compat_reference/02_the_sql_language/02_data_types/ where, by the way, VARCHAR2 seems to be just an alias for CHARACTER VARYING(n). Am I missing something?

GitMensch commented 2 years ago

I think that CHARACTER VARYING may also be implemented as auto-right trimmed in a preprocessor for PG.

For Oracle the difference in the preprocessor is, that VARCHAR2 is right-trimmed (see link in the opening post):

If [... the] columns are VARCHAR2, the program interface strips the trailing blanks on input and inserts just the 6-character string "MILLER" and the 5-character string "SALES" into the database. However, if the target database columns are CHAR, the strings are blank-padded to the width of the columns.

(and on reading in VARCHAR2 are blank padded again to be of the expected COBOL size and type).

mridoni commented 2 years ago

So this would basically mean:

Is that correct?

GitMensch commented 2 years ago

From my understanding that's correct for VARCHAR2, but defining a variable as VARCHAR in ProCOB leads to be identical to VARYING -> creating a group with ARR and LEN.

mridoni commented 2 years ago

Ok, I'll do a test with ProCOB (and report the results here): I have just re-read the documentation and, while it is very detailed on the behaviour of VARCHAR2, it is not very clear with regard to the differences between VARCHAR and VARCHAR2. From the way the documentation is formatted (see below, taken straight from page 4-7 of ProCOBOL for Oracle 11g Release 2 (11.2) E10826-01) what I gather is that the two paragraphs titled "Input" and "Output" apply to both VARCHAR and VARCHAR2, but the only way to be sure is to test it and see how (if) the generated code differs.


VARCHAR The VARCHAR datatype represents variable-length character strings. VARCHAR variables have a 2-byte length field followed by a 65533-byte string field. However, for VARCHAR array elements, the maximum length of the string field is 65530 bytes. When you specify the length of a VARCHAR variable, be sure to include 2 bytes for the length field. For longer strings, use the LONG VARCHAR datatype. In an EXEC SQL VAR statement, do not include the 2-byte length field.

VARCHAR2 The VARCHAR2 datatype represents variable-length character strings. On most platforms, the maximum length of a VARCHAR2 value is 65535 bytes. Specify the maximum length of a VARCHAR2(n) value in bytes, not characters. So, if a VARCHAR2(n) variable stores multibyte characters, its maximum length is less than n characters.

On Input. Oracle reads the number of bytes specified for the input host variable, strips any trailing blanks, and then stores the input value in the target database column. If the input value is longer than the defined width of the database column, Oracle generates an error. If the input value is all SPACES, Oracle treats it like a NULL. Oracle can convert a character value to a NUMBER column value if the character value represents a valid number. Otherwise, Oracle generates an error.

On Output. Oracle returns the number of bytes specified for the output host variable, blank-padding if necessary, and then assigns the output value to the target host variable. If a NULL is returned, Oracle fills the host variable with blanks.

If the output value is longer than the declared length of the host variable, Oracle truncates the value before assigning it to the host variable. If an indicator variable is available, Oracle sets it to the original length of the output value. Oracle can convert NUMBER column values to character values. The length of the character host variable determines precision. If the host variable is too short for the number, scientific notation is used. For example, if you select the column value 123456789 into a host variable of length 6, Oracle returns the value 1.2E08 to the host variable.

GitMensch commented 2 years ago

Thanks for the note, and yes, I'm not sure about VARCHAR2 vs. VARCHAR any more. From the notes above it seems that VARCHAR2 has (nowadays) 4 byte length, while VARCHAR has always fixed 2.

My potential confusion on this may root in https://docs.oracle.com/cd/B13789_01/appdev.101/a96109/pco14opt.htm with VARCHAR2 notes about trailing space part and VARCHAR about the group.

Thanks for testing, looking forward to see this in 1.0.80!

mridoni commented 2 years ago

This is "almost" done. "Almost" means that the the feature per se works, but it casually exposed a bug in generation of string constants. I am trying to resolve that, than I will release the modified code to the GitHub repository.

GitMensch commented 2 years ago

I'm confused - that would mean it is current a global setting, no? That sound very likely to break existing data in keyfelds as "keyfield " != "keyfield".

The whole point of EXEC SQL VAR Is to make this a field-specific decision - mostly between CHARF and VARCHAR/VARCHAR2. The is something I must miss.

mridoni commented 2 years ago

Ok, I will look into it again

mridoni commented 2 years ago

I will try to test with ProCOBOL but if there is no difference between VARCHAR and VARCHAR2, as I suspect, how do you suggest to proceed?

Just throwing ideas around here: I could implement a different behaviour for VARCHAR and VARCHAR2, but this would be absolutely non-standard, and VARCHAR2 is an Oracle-only thing anyway, since in EDB it is apparently just a synonym for VARCHAR. Since every preprocessor, for many reasons, does things a little bit differently, is there a way this could be implemented without using VARCHAR2, but instead using a different syntax?

Thanks

GitMensch commented 2 years ago

f there is no difference between VARCHAR and VARCHAR2, as I suspect, how do you suggest to proceed

Then just do the same, using it as Oracle synonym. The important difference is between VARCHAR and CHARF (with CHARF being the default). As noted before from potentially historical ProCOB docs: you'd want to test both with a PIC(X255) and with PIC X(25000) item (with a matching on the DB side, of course).

If I remember correctly there was a switch with procob to say "if the variable type is not explicit specified: change the default type from CHARF to VARCHAR (preprocessor switch, not connection option)". You may want to add something similar instead of the connection option.

Since every preprocessor, for many reasons, does things a little bit differently

Supporting the existing variants - if necessary with a new switch to gixsql - is a very useful goal. :-)

mridoni commented 2 years ago

This gets even weirder. Oracle also says (look here, page 44):

VARCHAR Datatype Do not use the VARCHAR datatype. Use the VARCHAR2 datatype instead. Although the VARCHAR datatype is currently synonymous with VARCHAR2, the VARCHAR datatype is scheduled to be redefined as a separate datatype used for variable-length character strings compared with different comparison semantics.

This is obviously not applicable to other DBMSs, including PostgreSQL, but explains why Oracle treats VARCHAR differently and the documentation usually prefers VARCHAR2.

In the meantime, while experimenting with ProCOBOL, I have found out the following things. They might just be details I was missing, but I am listing them anyway:

mridoni commented 2 years ago
  • EXEC SQL VAR declarations work as "specifiers", the variable they refer to must be declared in the COBOL code (i.e. they are not automatically generated by the preprocessor with the correct/inferred type)

BTW: this restriction (using EXEC SQL VAR declarations as specifiers and not autogenerating variable length groups from an EXEC SQL VAR declaration) makes sense, otherwise there would be an ambiguity when using PICX=VARCHAR (i.e. “is this supposed to use a standard PIC(X) and auto-trim or a variable length group whose data size, when writing to the DB, should be determined by the -LEN field?”).

I think I will be heading this way (and of course adding to gixpp a switch similar to PICX=… .

  • EXEC SQL VAR declarations must come - in the source - after the variable they refer to, otherwise a syntax error is generated

This restriction would make preprocessing easier and more linear, even if support for handling variables declared in the “wrong” order is already there. I have to check whether it is still needed.

GitMensch commented 2 years ago

Agreed - if it does make things easier and is necessary for at least one option/code path: go that way.

mridoni commented 2 years ago

I am working on this but at the moment is unlikely it will see the light this week: unfortunately I need to test and re-test a lot of stuff and the pre-release I had in mind is probably going to slide, also depending on my day-job.

GitMensch commented 2 years ago

88 is the most important and alone would be definitely enough for a pre-release version; but don't feel any pressure.

Either you think you can come up with a solution for that until Thursday or not (I'm fine with both) - I'm not available for testing until mid of August and VARCHAR2 would definitely need more time to tests (fixes for #88 could be tested on Thursday / Friday).

mridoni commented 2 years ago

The version in the internal repository should now be working as defined in https://github.com/mridoni/gixsql/issues/21#issuecomment-1192359787.

gixpp now has an added switch (picx-as=varchar|char|charf) that can be used to decide how to handle non-varlen PIC(x) fields when sending them to the DB driver.

GitMensch commented 1 year ago

What is open in this issue?

GitMensch commented 1 year ago

I see that there is no explicit handling of EXEC SQL VAR SOME-VAL IS VARCHAR2 END-EXEC. in the runtime, only the "normal" varlength handling in https://github.com/mridoni/gixsql/blob/debeeeabb9e740f89a03e325e69553ec5e47149f/runtime/libgixsql/SqlVar.cpp#L261-L279

Does this mean that one still has both separate -ARR and -LEN fields with the manual setting of the length (presumably using FUNCTION STORED-CHAR-LENGTH under GnuCOBOL) or can the bind variable SOME-VAL be used directly and the length is calculated somewhere else?

mridoni commented 1 year ago

I tried to impllement the same behavior as ProCOBOL (see my previous comment) so the VARYING clause is needed to auto-generate the variable length groups (i.e. a top-level group with two level-49 sub-fields). EXEC SQL VAR, like in ProCOBOL, should only supply a hint as to the type of the variable on the DBMS side (this behaviour can be modfied with the --picx-as flag in gixpp, more or less like it happens in procob.

GitMensch commented 1 year ago

For some reasons I've not looked that well... https://github.com/mridoni/gixsql/blob/debeeeabb9e740f89a03e325e69553ec5e47149f/runtime/libgixsql/SqlVar.cpp#L265-L268 I guess autotrim is set with IS VARCHAR2, correct? If the answer is yes, then this works "as planned", if not then it should be.

mridoni commented 1 year ago

For some reasons I've not looked that well...

https://github.com/mridoni/gixsql/blob/debeeeabb9e740f89a03e325e69553ec5e47149f/runtime/libgixsql/SqlVar.cpp#L265-L268

I guess autotrim is set with IS VARCHAR2, correct? If the answer is yes, then this works "as planned", if not then it should be.

Yes, on the preprocessor side it is set here: https://github.com/mridoni/gixsql/blob/debeeeabb9e740f89a03e325e69553ec5e47149f/libgixpp/TPESQLProcessing.cpp#L1225-L1226

Some of this stuff will have to change to account for #125, that I have not started to work on. As I refactored the runtime libraries to use smart pointers, I am also trying to "C++ize" the interface and code a bit more, and trying to eliminate some dead/old/dangerous/ code) so it will be at least a week until I can start to work on it.

GitMensch commented 1 year ago

From the README it looks like VARCHAR2 (=like varchar but with auto-trim instead of manually calculating the length) is still not available, while the code above looks like it is.

I did not found a testcase with VARCHAR2. The changelog neither has VARCHAR nor trim in it, so I feel a bit lost what the state on that is (in general but most specific to postgresql).

Is there anything else open for EXEC SQL VAR?

GitMensch commented 1 year ago

On other thing found to be open:

         EXEC SQL VAR TABKEY1            IS CHARF     END-EXEC.

raises

error: syntax error, unexpected TOKEN

maybe this can this be used at least at a syntactical level as an alias for CHAR?

GitMensch commented 1 year ago

... but something is strange here:

           EXEC SQL VAR BUFFER     IS RAW (20200) END-EXEC.

raises (1.0.20 and 1.0.20b - both installed from tarball and showing the expected version):

error: syntax error, unexpected FLOAT, expecting END-EXEC

GitMensch commented 1 year ago

I guess that further adjustments will take a while, but as I've just seen that again... friendly ping on CHARF and "unexpected FLOAT".