vertical-blank / sql-formatter

SQL formatter written with only Java Standard Library, without dependencies.
MIT License
227 stars 47 forks source link

Format Merge #19

Closed numeralnathan closed 3 years ago

numeralnathan commented 4 years ago

Please enhance the library to format this SQL.

merge into DW_STG_USER.ACCOUNT_DIM target using (
   select
      COMMON_NAME m_commonName,
      ORIGIN m_origin,
      USAGE_TYPE m_usageType,
      CATEGORY m_category
   from
      MY_TABLE
   where
      USAGE_TYPE = :value
) source on source.m_usageType = target.USAGE_TYPE when matched then
update
set
   target.COMMON_NAME = source.m_commonName,
   target.ORIGIN = source.m_origin,
   target.USAGE_TYPE = source.m_usageType,
   target.CATEGORY = source.m_category
where
   (
      (source.m_commonName <> target.COMMON_NAME)
      or(source.m_origin <> target.ORIGIN)
      or(source.m_usageType <> target.USAGE_TYPE)
      or(source.m_category <> target.CATEGORY)
   )
   when not matched then
insert
   (
      target.COMMON_NAME,
      target.ORIGIN,
      target.USAGE_TYPE,
      target.CATEGORY
   )
values
   (
      source.m_commonName,
      source.m_origin,
      source.m_usageType,
      source.m_category
   )
manticore-projects commented 3 years ago

Hi,

for my own curiosity I ran it against my own manticore JSQLFormatter. I think your statement is INVALID as you are missing the BRACKETS around the ON-Expression. Adding the brackets would format the MERGE statement correctly:

-- EXTERNAL
MERGE INTO DW_STG_USER.ACCOUNT_DIM target
    USING ( SELECT COMMON_NAME m_commonName
                , ORIGIN m_origin
                , USAGE_TYPE m_usageType
                , CATEGORY m_category
            FROM MY_TABLE
            WHERE USAGE_TYPE = :value ) source
        ON ( source.m_usageType = target.USAGE_TYPE ) 
WHEN NOT MATCHED THEN 
    INSERT ( target.COMMON_NAME
                , target.ORIGIN
                , target.USAGE_TYPE
                , target.CATEGORY ) 
    VALUES ( source.m_commonName
                , source.m_origin
                , source.m_usageType
                , source.m_category ) 
WHEN MATCHED THEN 
    UPDATE SET target.COMMON_NAME = source.m_commonName
                    , target.ORIGIN = source.m_origin
                    , target.USAGE_TYPE = source.m_usageType
                    , target.CATEGORY = source.m_category
    WHERE ( ( source.m_commonName <> target.COMMON_NAME )
                    OR ( source.m_origin <> target.ORIGIN )
                    OR ( source.m_usageType <> target.USAGE_TYPE )
                    OR ( source.m_category <> target.CATEGORY ) )
;
vertical-blank commented 3 years ago

This seems to be fixed on current version. Thanks @manticore-projects !

manticore-projects commented 3 years ago

Simple statements look better now, but there are still a lot of issues with complex ones, e. g.

-- MERGE 4
MERGE INTO cfe.instrument_import_measure imp
    USING ( WITH x AS (
                    SELECT  a.id_instrument
                            , a.id_currency
                            , a.id_instrument_type
                            , b.id_portfolio
                            , c.attribute_value product_code
                            , t.valid_date
                            , t.yield
                    FROM cfe.instrument a
                        INNER JOIN cfe.impairment b
                            ON a.id_instrument = b.id_instrument
                        LEFT JOIN cfe.instrument_attribute c
                            ON a.id_instrument = c.id_instrument
                                AND c.id_attribute = 'product'
                        INNER JOIN cfe.ext_yield t
                            ON ( a.id_currency = t.id_currency )
                                AND ( a.id_instrument_type LIKE t.id_instrument_type )
                                AND ( b.id_portfolio LIKE t.id_portfolio
                                        OR ( b.id_portfolio IS NULL
                                                AND t.id_portfolio = '%' ) )
                                AND ( c.attribute_value LIKE t.product_code
                                        OR ( c.attribute_value IS NULL
                                                AND t.product_code = '%' ) ) )
SELECT /*+ PARALLEL */ *
            FROM x x1
            WHERE x1.valid_date = ( SELECT max
                                    FROM x
                                    WHERE id_instrument = x1.id_instrument
                                        AND valid_date <= to_date ) ) s
        ON ( imp.id_instrument = s.id_instrument
                    AND imp.measure = 'YIELD_PP' )
WHEN MATCHED THEN
    UPDATE SET  imp.value = s.yield
;

You can have a look at https://github.com/manticore-projects/jsqlformatter and also benchmark it against the interactive DEMO. Maybe the provided samples and test cases will be helpful.

manticore-projects commented 3 years ago

Btw, how do you plan to deal with comments? Example

-- BOTH CLAUSES PRESENT 'with a string' AND "a field"
MERGE /*+ PARALLEL */ INTO test1 /*the target table*/ a
    USING all_objects      /*the source table*/
        ON ( /*joins in()!*/ a.object_id = b.object_id )
-- INSERT CLAUSE 
WHEN /*comments between keywords!*/ NOT MATCHED THEN
    INSERT ( object_id     /*ID Column*/
                , status   /*Status Column*/ )
    VALUES ( b.object_id
                , b.status )
/* UPDATE CLAUSE
WITH A WHERE CONDITION */ 
WHEN MATCHED THEN          /* Lets rock */
    UPDATE SET  a.status = '/*this is no comment!*/ and -- this ain''t either'
    WHERE   b."--status" != 'VALID'
;

Had a really hard time to figure that out. So maybe we should just join our efforts and work together? One 90% solution would be better than 2 different 75% solution in my opinion.

vertical-blank commented 3 years ago

That's a good idea, but I think the advantage of this library is having no dependencies, as described in readme. And about formatting, currently, my priority is to reproduce the behavior of original library, https://github.com/zeroturnaround/sql-formatter. But, there are a few bugs or issues in the original version, so I'm now wondering if I should improve it independently...

manticore-projects commented 3 years ago

Greetings!

On Thu, 2021-05-06 at 07:22 -0700, yohei yamana wrote:

That's a good idea, but I think the advantage of this library is having no dependencies, as described in readme.

I get your point, but you will start to re-invent the wheel by writing your own parser while there is already a parser available, which is matured and can be bundled easily. Even that matured JSQLParser had a few gaps I had to fix over the last few weeks and there are still more gaps to fix. Developing all this logic from scratch will take a few man years.

Another concern is speed: The REGEX will struggle with very long SQL statements (e.g. formatting a file of insert statements).

And about formatting, currently, my priority is to reproduce the behavior of original library, https://github.com/zeroturnaround/sql-formatter.

Can't argue about that. In contrast I need a reliable Formatter for quite complex SQL statements and I haven't found anything good. Still a long way to go, but a lot of stuff is working quite well already.

But, there are a few bugs or issues in the original version, so I'm now wondering if I should improve it independently...

That's how I started JSQLFormatter. I actually have had a look at your project first in January and found that it failed on the MERGE and WITH statements. So I checked the Tokens and REGEXs and while I was working on that stuff, I figured that I was actually working on the parser, not even on the formatter yet. This triggered my decision to restart from scratch based on JSQLParser.

Anyways, nice to stay in touch!  I would love to exchange ideas and solutions in the future. All the best and cheers!

Andreas

vertical-blank commented 3 years ago

I almost completely agree with you, there is no doubt that JSQLFormatter is more reliable at formatting. But, why I ported this is the original is very simple implementation by using REGEX.

And now, there is an issue about using external SQL parser created by maintainer of original. https://github.com/zeroturnaround/sql-formatter/issues/117 As mentioned in this, to make pluggable the use of external parser might be the better solution, I think.

Thank you very much too.