pgsql-io / benchmarksql

A TPC-C like test tool

Refactor #3

Open angoca opened 3 years ago

angoca commented 3 years ago

Hi, I have been interested in this project for many years, and I have used it on different occasions. Currently, I use this project in a class I teach, to compare different RDBMS and test their performance. I tried to use version 5, but it requires many code modifications for each DB. I haven't tried version 6 yet, but I see a lack of documentation on extending it to other RDBMS, which makes this version very difficult to use for a student who is just learning databases. I do not want them to port the application; I just want them to focus on the database side.

I have done a big refactoring for version 4, which is available at: https://github.com/ECI-SGBD/BenchmarkSQL-4

I propose doing the same refactoring for version 6, in order to update some of this project's old dependencies. The changes I propose are:

Use Maven instead of Ant.
Include the drivers in Maven's pom instead of downloading them manually or including them in this code.
Use packages for all classes.
Use log4j v2 instead of log4j v1.
Replace System.out output with loggers in all classes (the 3 main methods: sql, local and benchmark).
Organize files according to the Maven structure. This refers to the resources directory, with a directory for each kind of file: R, Python, Bash scripts, etc. This organizes the run directory.
All documentation in one directory, separated from code or scripts.
Documentation for each RDBMS, not just one for all of them.
Format the code according to the Eclipse format, which is a standard one.
Organize imports.
Use PMD, FindBugs and Checkstyle for better code.
Create Javadoc headers, so that code documentation can be generated.
Put a FIXME or TODO in each part of the code where an extension should be made for other RDBMS.
Use the wiki to explain things about this project: how to use the Python script, how to generate the diagrams with R, etc.
- Documentation about the parameters that the application receives.
Scripts to run in Windows.
Scripts to run based on the Maven structure (target directory).
Extra parameters, like:
- Schema, which is useful for databases like MySQL that do not use schemas, or for other databases that use a default schema. This would not force the use of the benchmarksql schema.
- Statement terminator, for when ; is not available. This could be another one, like 'GO' or '@', or even EOL.
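The proposed statement-terminator parameter could be applied when the SQL scripts are read. Below is a minimal sketch, assuming a hypothetical `ScriptSplitter` class and a made-up `EOL` value meaning "one statement per line"; neither is an existing BenchmarkSQL option. Word terminators such as `GO` are only recognized on a line of their own, and terminators inside string literals or comments are not handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a SQL script on a configurable statement terminator. */
final class ScriptSplitter {
    /** Assumed special value: every non-blank line is one statement. */
    static final String EOL = "EOL";

    static List<String> split(String script, String terminator) {
        List<String> statements = new ArrayList<>();
        if (EOL.equals(terminator)) {
            for (String line : script.split("\\R")) {
                if (!line.trim().isEmpty()) {
                    statements.add(line.trim());
                }
            }
            return statements;
        }
        // Word terminators such as "GO" only count when they stand alone on a line,
        // so identifiers that happen to end in "GO" are not split.
        boolean wordTerminator = terminator.chars().allMatch(Character::isLetter);
        StringBuilder current = new StringBuilder();
        for (String line : script.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.equalsIgnoreCase(terminator)) {
                flush(current, statements);
            } else if (!wordTerminator && trimmed.endsWith(terminator)) {
                // Terminator at the end of a line, e.g. ";" or "@".
                current.append(trimmed, 0, trimmed.length() - terminator.length()).append('\n');
                flush(current, statements);
            } else {
                current.append(line).append('\n');
            }
        }
        flush(current, statements); // the last statement may lack a terminator
        return statements;
    }

    private static void flush(StringBuilder current, List<String> statements) {
        String statement = current.toString().trim();
        if (!statement.isEmpty()) {
            statements.add(statement);
        }
        current.setLength(0);
    }
}
```

For example, `ScriptSplitter.split(script, "GO")` would suit SQL Server style scripts, while `";"` keeps today's behavior.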

If you agree with some of these changes, I could make them and then create a pull request. I do not want to do a big refactoring of the program only for it not to be integrated into master.

Please tell me which ones you are interested in.

wieck commented 3 years ago

On 3/28/21 3:50 PM, Andres Gomez Casanova wrote:

Hi, I have been interested in this project for many years, and I have used it on different occasions.

Hi, thank you for your interest.

I haven't found much time to work on this project lately, but definitely want to come back and finish/improve things.

Currently, I use this project in a class I teach, to compare different RDBMS and test their performance. I tried to use version 5, but it requires many code modifications for each DB. I haven't tried version 6 yet, but I see a lack of documentation on extending it to other RDBMS, which makes this version very difficult to use for a student who is just learning databases. I do not want them to port the application; I just want them to focus on the database side.

Version 6 is a major rewrite of the main driver in that it separates the simulated users from the simulated application threads by means of terminal object queues. This allows BenchmarkSQL to measure the system response time actually experienced by the end user, instead of the database transaction latency. So whatever we are going to do needs to be based on version 6.
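To illustrate the difference between the two measurements, here is a deliberately simplified sketch, not the V6 implementation: simulated terminals put requests into a shared queue that a small pool of worker threads (standing in for database connections) serves. Each request records when it was submitted, so the response time includes the wait in the queue, while the latency only covers execution. The class name and the sleep standing in for a transaction are assumptions for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch: terminals enqueue requests, a few workers (DB connections) serve them. */
final class TerminalQueueSketch {
    static final class Request {
        final long submittedNanos = System.nanoTime(); // when the user "pressed enter"
        final CountDownLatch done = new CountDownLatch(1);
        long startedNanos;  // when a worker began the transaction
        long finishedNanos; // when the transaction completed
    }

    /** Returns {mean response time ms, mean transaction latency ms}. */
    static double[] run(int terminals, int workers, long txnMillis) {
        BlockingQueue<Request> queue = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        Request r = queue.take();
                        r.startedNanos = System.nanoTime();
                        Thread.sleep(txnMillis); // stand-in for executing the DB transaction
                        r.finishedNanos = System.nanoTime();
                        r.done.countDown(); // publishes the timestamps to the waiting thread
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // shutdownNow() ends the worker
                }
            });
        }
        Request[] requests = new Request[terminals];
        for (int i = 0; i < terminals; i++) {
            requests[i] = new Request();
            queue.add(requests[i]); // every terminal submits at the same moment
        }
        double responseSum = 0, latencySum = 0;
        try {
            for (Request r : requests) {
                r.done.await();
                responseSum += (r.finishedNanos - r.submittedNanos) / 1e6; // includes queue wait
                latencySum += (r.finishedNanos - r.startedNanos) / 1e6;    // execution only
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
        return new double[] { responseSum / terminals, latencySum / terminals };
    }
}
```

With 20 terminals and 2 workers, the mean response time is several times the mean latency, which is exactly the queueing effect that a pure latency measurement hides.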

It now scales very well to thousands of warehouses, using the proper number of simulated terminals as well as keying and thinking time delays, without overwhelming the benchmark driver itself or requiring tens of thousands of database connections to do that.
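The keying and thinking delays mentioned here come from the TPC-C specification: a fixed keying time per transaction type, and a think time drawn from a negative exponential distribution truncated at 10 times its mean. A minimal sketch of how a simulated terminal might compute them follows; the class name is hypothetical, and the per-type values should be checked against the TPC-C revision in use.

```java
import java.util.Random;

/** Sketch of per-transaction TPC-C terminal delays (hypothetical class name). */
final class TerminalDelays {
    enum TxType {
        NEW_ORDER(18.0, 12.0),
        PAYMENT(3.0, 12.0),
        ORDER_STATUS(2.0, 10.0),
        DELIVERY(2.0, 5.0),
        STOCK_LEVEL(2.0, 5.0);

        final double keyingSeconds;    // fixed keying time before the transaction is submitted
        final double meanThinkSeconds; // mean think time after the result is displayed

        TxType(double keyingSeconds, double meanThinkSeconds) {
            this.keyingSeconds = keyingSeconds;
            this.meanThinkSeconds = meanThinkSeconds;
        }
    }

    /** Negative exponential think time: -ln(r) * mean, truncated at 10 * mean. */
    static double thinkSeconds(TxType type, Random rnd) {
        double r = 1.0 - rnd.nextDouble(); // r in (0, 1], so ln(r) is finite
        double t = -Math.log(r) * type.meanThinkSeconds;
        return Math.min(t, 10.0 * type.meanThinkSeconds);
    }
}
```

Because each terminal spends most of its time keying and thinking, one worker connection can serve many terminals, which is why thousands of warehouses do not require tens of thousands of connections.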

Extending version 6 to another DBMS is also completely different from previous versions. It sure could use more documentation though.

What I have been experimenting with most recently is wrapping BenchmarkSQL itself in a Flask-based web UI and then packaging it all into a Docker container. That will make it very easy to deploy, for example, in cloud environments.

A major remaining problem is the analysis of the per-transaction results and generating the graphs. Installing R in the Docker container bloats the image to about 1.5GB because it pulls in most of X11. And then R can easily run out of memory when trying to aggregate a longer benchmark run, like over a week or two with a decent server and nearly maxing it out. To improve that it might be necessary to collect those results in a pre-aggregated form and switch from R to numpy or something else.
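One way to collect the results in a pre-aggregated form is to keep, per transaction type and per fixed time interval, only a count, a sum and a maximum, instead of one CSV row per transaction. The sketch below assumes hypothetical names and an interval length chosen by the caller (for example one minute); memory then grows with the number of intervals in the run, not with the number of transactions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical collector that keeps per-interval, per-type aggregates instead of raw rows. */
final class ResultAggregator {
    static final class Bucket {
        long count;
        long sumMillis;
        long maxMillis;

        double avgMillis() {
            return count == 0 ? 0.0 : (double) sumMillis / count;
        }
    }

    private final long intervalMillis;
    // interval index -> (transaction type -> aggregate); TreeMap keeps intervals in time order
    private final TreeMap<Long, Map<String, Bucket>> buckets = new TreeMap<>();

    ResultAggregator(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    /** Called once per completed transaction. */
    synchronized void record(String txType, long endEpochMillis, long responseMillis) {
        long interval = endEpochMillis / intervalMillis;
        Bucket b = buckets.computeIfAbsent(interval, k -> new HashMap<>())
                .computeIfAbsent(txType, k -> new Bucket());
        b.count++;
        b.sumMillis += responseMillis;
        b.maxMillis = Math.max(b.maxMillis, responseMillis);
    }

    synchronized Bucket get(long interval, String txType) {
        Map<String, Bucket> byType = buckets.get(interval);
        return byType == null ? null : byType.get(txType);
    }

    synchronized int intervalCount() {
        return buckets.size();
    }
}
```

The trade-off is that exact percentiles are lost; a per-interval histogram could be added to each bucket if those are needed.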

Regards, Jan

-- Jan Wieck Principal Database Engineer Amazon Web Services

luss commented 3 years ago

I'm very interested in Andres' proposed modernizations.

wieck commented 3 years ago

On 3/29/21 2:38 PM, Denis Lussier wrote:

I'm very interested in Andres' proposed modernizations.

I am interested too. But any changes need to be based on current master, which isn't completely up to date yet. I may get to it this weekend and report back where on the V6 path we are.

Regards, Jan

wieck commented 3 years ago

On 3/30/21 10:07 AM, Jan Wieck wrote:

I am interested too. But any changes need to be based on current master, which isn't completely up to date yet. I may get to it this weekend and report back where on the V6 path we are.

I checked the repositories against what I have locally. All is in sync. The current status in the master branch is:

The new V6 benchmark driver is fully functional and measures end user experienced SUT response times correctly.
The benchmark can be used standalone (the old-fashioned way) as well as inside the POC Flask UI.
The POC Flask UI can be used inside and outside of Docker.

That means we are good with respect to functionality for now. There are plenty of work items left to complete V6:

Replace the report generator with a matplotlib implementation. This will shrink the Docker image in half!
Make sure that the new report generator is capable of handling a couple hundred GB of transaction result data.
Implement an actual Flask UI. What is there now is really just a proof of concept. We need a real implementation with menus and server/configuration management.
Document how to use all of that with/without Flask/Docker.

The only critical one I see is the report generator, so I'm going to have a look at the matplotlib stuff. The main problem with the R-based report is that it needs to load the same CSV file into memory over and over, and those result CSV files get really big. I can generate a 9GB result CSV on my laptop in just a day. Let that benchmark run against a serious server for a week or so and you need a machine larger than the DB server just to process the result data.
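Loading the CSV once per graph is what makes memory scale with the run length. A single streaming pass that keeps only fixed-size state per transaction type avoids that. The following minimal sketch assumes a hypothetical CSV layout `ttype,end_ms,latency_ms` with a header line (the actual column layout of the V6 result files may differ). It builds a one-bin-per-millisecond histogram per transaction type, so percentiles can be computed without holding the rows in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical single-pass report over a large result CSV. */
final class StreamingReport {
    static final int MAX_MS = 60_000; // latencies above this land in the last bin

    static final class Stats {
        final long[] histogram = new long[MAX_MS + 1]; // one bin per millisecond
        long count;

        void add(long ms) {
            histogram[(int) Math.min(Math.max(ms, 0), MAX_MS)]++;
            count++;
        }

        /** Smallest latency (ms) such that at least fraction p of the samples are <= it. */
        long percentile(double p) {
            long needed = (long) Math.ceil(p * count);
            long seen = 0;
            for (int ms = 0; ms <= MAX_MS; ms++) {
                seen += histogram[ms];
                if (seen >= needed) {
                    return ms;
                }
            }
            return MAX_MS;
        }
    }

    /** Reads the CSV once; only the histograms stay in memory, regardless of file size. */
    static Map<String, Stats> summarize(Reader in) {
        Map<String, Stats> byType = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            reader.readLine(); // skip the header line
            String line;
            while ((line = reader.readLine()) != null) {
                String[] f = line.split(",");
                if (f.length < 3) {
                    continue; // skip malformed or empty lines
                }
                byType.computeIfAbsent(f[0], k -> new Stats()).add(Long.parseLong(f[2].trim()));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return byType;
    }
}
```

Memory here is roughly 480 KB per transaction type, whether the input is 9 GB or several hundred GB.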

If we can't get the "real" Flask UI done for V6, I'm fine with going with the current POC. It works, although it is ugly. Just like me.

All that said, I think our overall plan should be:

1) Get V6 into a shape that really scales to large servers and long run times. Disk space should not be a problem; memory is.
2) Get V6 released (requires doc work).
2.5) Breathe.
3) Restructure and do the improvements that Andres proposed.
4) Release all that (and whatever else we can do) as V7.
5) Rinse, repeat.

Did I miss anything?

Regards, Jan

luss commented 3 years ago

We are excited to accept code modernizations for v6 from the community. Also, Jan is being shy... he is a handsome man.

wieck commented 3 years ago

On 4/1/21 9:02 AM, Denis Lussier wrote:

We are excited to accept code modernizations for v6 from the community.

There is one more very important issue.

The new Oracle drivers have deprecated the Oracle-specific ARRAY type, which our Oracle stored procedure support currently uses and which is why we currently need the Oracle driver at compile time. The recommendation is to convert everything to use the generic java.sql.Array interface instead. This is a must-do for V6, because it will allow us to ship a prebuilt Docker container, and the user can then supply the vendor-specific JDBC drivers in an external directory at runtime. There will be more issues like that. V6 can currently also run the generic code (no stored procedures yet) against Microsoft SQL Server.
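Supplying the JDBC drivers at runtime means the prebuilt code must not reference vendor classes at compile time. A minimal sketch of how jars in an external directory could be discovered follows; the directory and class names are hypothetical. It builds a `URLClassLoader` over the jars and uses `ServiceLoader` to find `java.sql.Driver` implementations (JDBC 4 drivers advertise themselves through `META-INF/services/java.sql.Driver`). Because `DriverManager` only hands out drivers visible to the caller's class loader, the sketch calls `Driver.connect` directly.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.sql.Connection;
import java.sql.Driver;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.ServiceLoader;

/** Hypothetical loader for vendor JDBC drivers placed in an external directory. */
final class ExternalDrivers {
    /** Collects all *.jar files directly inside dir. */
    static List<URL> jarUrls(File dir) {
        List<URL> urls = new ArrayList<>();
        File[] files = dir.listFiles((d, name) -> name.endsWith(".jar"));
        if (files == null) {
            return urls; // not a directory, or not readable
        }
        for (File f : files) {
            try {
                urls.add(f.toURI().toURL());
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException(f.toString(), e);
            }
        }
        return urls;
    }

    /** Instantiates every java.sql.Driver advertised by the jars in dir. */
    static List<Driver> loadDrivers(File dir) {
        URLClassLoader loader = new URLClassLoader(
                jarUrls(dir).toArray(new URL[0]), ExternalDrivers.class.getClassLoader());
        List<Driver> drivers = new ArrayList<>();
        for (Driver d : ServiceLoader.load(Driver.class, loader)) {
            drivers.add(d);
        }
        return drivers;
    }

    /** Connects with the first driver that accepts the URL, bypassing DriverManager. */
    static Connection connect(List<Driver> drivers, String url, Properties props) throws SQLException {
        for (Driver d : drivers) {
            if (d.acceptsURL(url)) {
                return d.connect(url, props);
            }
        }
        throw new SQLException("No driver found for " + url);
    }
}
```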

Regards, Jan

angoca commented 3 years ago

Hi guys,

The code modernizations I propose do not change the current functionality: they will neither fix nor add functionality. They just use newer frameworks that ease deployment. The changes will be made in parts, so that the evolution can be tracked, rather than as one big change. Keep in mind that the first pull requests of this refactoring will contain a lot of visible changes:

The files will be moved to other directories to comply with Maven structure.
The shell scripts will be modified to support Maven structure in the target directory instead of the run directory.
Libraries like drivers will be automatically downloaded and included in the target directory by Maven.
All documentation will be in one directory, separate from code and scripts.

I will adjust the scripts to work with Postgres; currently I cannot test against Oracle.

Once files are moved, then the entire code will be adjusted.

The code will be formatted with the Eclipse code formatter.
The imports will be organized.
Packages will be used for all classes.

You can fetch each pull request into a new branch (or check it out locally), test what I propose, and, if you agree, merge it.

Once I have done that, the basic structure will be ready for the following improvements, which can be made progressively:

Use of log4j v2.
Configuration of log4j per package.
Scripts to run in Windows.
Replace System.out output with loggers in all classes (the 3 main methods: sql, local and benchmark).
Documentation for each RDBMS, not just one for all of them.
Modification of the HOW TO RUN files to comply with Markdown syntax.
Put a FIXME or TODO in each part of the code where an extension should be made for other RDBMS.
Use the wiki to explain things about this project: how to use the Python script, how to generate the diagrams with R, etc.
- In the wiki, there could be a page per RDBMS explaining its details and points of performance improvement.
Create Javadoc headers, so that code documentation can be generated.
Documentation about the parameters that the application receives.
Extra parameters, like:
- Schema, which is useful for databases like MySQL that do not use schemas, or for other databases that use a default schema. This would not force the use of the benchmarksql schema.
- Statement terminator, for when ; is not available. This could be another one, like 'GO' or '@', or even EOL.
Use PMD, FindBugs and Checkstyle for better code.
Use Travis CI for continuous integration.
Generation of a GitHub web page, based on the files under doc.
Add support for other RDBMS, starting with Db2, and then extending to others.

Finally, I have these questions, and I need your answers to proceed:

It is good practice to put all classes in packages. I propose the following base package name, which makes it easy to relate the classes to the code's hosting facility:

com.github.pgsql-io.benchmarksql

Under this package, all classes will be placed in different subpackages (jtpcc, loader, etc.).

Which minimum Java version should BenchmarkSQL support?
- Java 1.7 is no longer maintained, but very popular.
- Java 8 is still maintained and very popular.
- Any older one?
- Any newer one, like Java 9 or later? (Lambda expressions are already available from Java 8.)
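One detail worth checking before settling on the proposed base package: a hyphen is not legal in a Java identifier, so `com.github.pgsql-io.benchmarksql` cannot be used as written. The naming conventions in the Java Language Specification suggest replacing such characters with an underscore, which gives `com.github.pgsql_io.benchmarksql`. The standard `javax.lang.model.SourceVersion.isName` method can verify a candidate; the `sanitize` helper below is a hypothetical illustration of that convention (it does not handle keyword components).

```java
import javax.lang.model.SourceVersion;

/** Hypothetical helper for validating a base package name. */
final class PackageNameCheck {
    /** True if name is a syntactically valid qualified Java name without keywords. */
    static boolean isValidPackageName(String name) {
        return SourceVersion.isName(name);
    }

    /** Replaces characters not allowed in identifiers with '_', component by component. */
    static String sanitize(String domainStyleName) {
        StringBuilder out = new StringBuilder();
        for (String part : domainStyleName.split("\\.")) {
            if (out.length() > 0) {
                out.append('.');
            }
            StringBuilder p = new StringBuilder();
            for (char c : part.toCharArray()) {
                p.append(Character.isJavaIdentifierPart(c) ? c : '_');
            }
            // A component starting with a digit (or empty) gets a leading underscore.
            if (p.length() == 0 || !Character.isJavaIdentifierStart(p.charAt(0))) {
                p.insert(0, '_');
            }
            out.append(p);
        }
        return out.toString();
    }
}
```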

luss commented 3 years ago

Sounds good to me. We just want to be sure that the pull requests you submit target the master branch of https://github.com/pgsql-io/benchmarksql.

--Luss

luss commented 3 years ago

Personally, I think we should support JDK 8, and JDK 11. Those are the two LTS releases AFAIK.
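If JDK 8 is the minimum, a startup check could refuse to run on anything older. `Runtime.version()` only exists from Java 9 on, so a sketch that also runs on Java 8 has to parse the `java.specification.version` system property, which is `1.8` on Java 8 and `11` on Java 11. The class name and the minimum constant are assumptions for illustration.

```java
/** Hypothetical startup check for the minimum supported Java version. */
final class JavaVersionCheck {
    static final int MINIMUM_MAJOR = 8; // assumed minimum, per the JDK 8 / JDK 11 suggestion

    /** Parses "1.8" -> 8, "11" -> 11, "17" -> 17. */
    static int majorVersion(String spec) {
        String[] parts = spec.split("\\.");
        int first = Integer.parseInt(parts[0]);
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    /** Throws if the running JVM is older than MINIMUM_MAJOR. */
    static void requireSupportedJava() {
        String spec = System.getProperty("java.specification.version");
        int major = majorVersion(spec);
        if (major < MINIMUM_MAJOR) {
            throw new IllegalStateException(
                    "Java " + MINIMUM_MAJOR + " or newer required, found " + spec);
        }
    }
}
```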

wieck commented 3 years ago

On 4/1/21 12:06 PM, Denis Lussier wrote:

Personally, I think we should support JDK 8, and JDK 11. Those are the two LTS releases AFAIK.

I am currently using java-1.8.0-openjdk to build and run V6 (master). I have not tried 11 yet. Will do that once I get my new server.

Regards, Jan

On Thu, Apr 1, 2021 at 12:05 PM Denis Lussier @.***> wrote:

Sounds good to me. We just want to be sure that the pull request you submit are to https://github.com/pgsql-io/banchmark on the Master branch.

--Luss

On Thu, Apr 1, 2021 at 11:55 AM Andres Gomez Casanova < @.***> wrote:

Hi guys,

The code modernizations that I propose do not change the current functionality, and this won't correct or add new functionality. It just will use new frameworks that eases the deployment. The changes I propose will be done in parts, in order to have a tracking of the evolution; not just a big change. The important thing to take into account with this refactoring is that the firsts "pull request" will have a lot of visual changes:

The files will be moved to other directories to comply with Maven structure.

The shell scripts will be modified to support Maven structure in the target directory instead of the run directory.

Libraries like drivers will be automatically downloaded and included in the target directory by Maven.

All documentation in one directory, separating it from code or scripts.

I will adjust the scripts to work with Postgres; currently I cannot test against Oracle.

Once files are moved, then the entire code will be adjusted.

The code will be formatted to Eclipse syntax format.

The imports will be organized.

Packages will be used for all classes.

You can clone the pull request into a new branch, or locally, test what I propose, and if you agree, then perform the Pull request.

Once, I have done that, the basic structure will be fine to do these improvements progressively.

Use of log4j v2.

Configuration of log4j with packages.

Scripts to run in Windows.

Change System.output to loggers, for all classes (the 3 main methods: sql, local and benchmark).

Documentation for each RDBMS, not just one for all of them.

Modification of HOW TO RUN files to comply with Markdown syntax.

Put a FIXME or TODO, in each part of the code where an extension should be done for other RDBMS.

Use Wiki, to explain things about this project. How to use the python script, generation of the diagrams with r, etc.

In the wiki, there could be a page per RDBMS to explain the details and points of performance improvement.

Create javadoc headers, in order to be capable of generating a documentation of the code.

Documentation about the parameters that receives the application.

Extra parameters, like:

schema, which is useful for databases like MySQL that do not use schemas. Or for other databases which uses default schema. This do not force to use benchmarksql schema.

Statement terminator, when ; is not available. This could use another one like 'GO' or '@', or even EOL.

Use PMD, findbugs and checkstyle for better code.

Use of Travis-CI to do Continuous Integration.

Generation of a GitHub web page, based on the files under doc.

Include support to other RDBMS, starting with Db2, and then extending to others.

Finally, I have theses questions, I need your answer to proceed:

-

It is a good practice to have packages for all classes. I propose to use the following package name base, that easily reference the class to the hosting facility of the code:

com.github.pgsql-io.benchmarksql

Under this package, all classes will be placed under different sub packages (jtpcc, loader, etc.)

Which minimum Java version should BenchmarkSQL support?

Java 1.7 is no longer maintained, but still very popular.

Java 8 is still maintained and very popular.

Any older one?

Any newer one, like Java 9 or later?

-- Jan Wieck Principal Database Engineer Amazon Web Services

wieck commented 3 years ago

I suggest you make yourself familiar with the current master branch of benchmarksql. There have been substantial changes to some of what you are touching below. More comments inline.

On 4/1/21 11:55 AM, Andres Gomez Casanova wrote:

Hi guys,

The code modernizations I propose do not change the current functionality; they will neither fix nor add functionality. They just adopt new frameworks that ease deployment. The changes will be made in parts, so that the evolution can be tracked, rather than as one big change. The important thing to keep in mind with this refactoring is that the first pull requests will contain a lot of visible changes:

The files will be moved to other directories to comply with Maven structure.

The shell scripts will be modified to support the Maven structure, using the target directory instead of the run directory.

New directory structure is fine.

Libraries such as drivers will be downloaded automatically by Maven and included in the target directory.

This I am very interested in. I was looking at the Maven Repository at https://repo1.maven.org/maven2 earlier and noticed that the Oracle and Microsoft JDBC drivers are there. Oracle even points to that repository on their download page, which means that Oracle has changed their policy and no longer requires a license click-through for download. Bravo!

I still don't think we should ship them or add copies of them to our repository, but if they are automatically downloaded at build time and if a user has an easy way to get them into a local lib directory to run a pre-built Docker container, that would be perfect.

All documentation in one directory, separating it from code or scripts.

I will adjust the scripts to work with Postgres; currently I cannot test against Oracle.

We definitely need to make sure that the changes do not interfere with the support for Oracle, MariaDB or MSSQL-server. Currently I am testing Oracle against an Oracle 11g instance in a virtual machine, MariaDB in a different virtual machine, and MSSQL-server with a Docker container based on mcr.microsoft.com/mssql/server, which is MSSQL-2017. I can share the Dockerfile for that.

We can forget about Firebird. It can't handle concurrency well because its concurrency model doesn't follow the SQL Standard. I will eventually drop that again anyway.

These are of course only functional tests. None of those database installations can handle any meaningful load.

Once the files are moved, the entire codebase will be adjusted.

The code will be formatted with the Eclipse formatter.

The imports will be organized.

Packages will be used for all classes.

Those changes will definitely help students find their way around the code more easily.

Put a FIXME or TODO, in each part of the code where an extension should be done for other RDBMS.

The truly RDBMS-specific code for stored procedures is now in separate files in the ./application directory.

In the AppGeneric driver, which implements all transactions inside the application using PreparedStatements, there is exactly one query left that is RDBMS-vendor-specific: the STOCK_LEVEL query, because Oracle does not accept AS between a subselect and its alias, while other databases either tolerate or require it.

This means that adding support for a new vendor requires writing a new ./application/AppStoredProc.java (if stored procedure support is being implemented) and touching that one spot in the AppGeneric.java file.
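A minimal sketch of how that one vendor-specific spot might be isolated, assuming a hypothetical StockLevelSql class and a dbType string such as "oracle" or "postgres". The query is a simplified stand-in, not BenchmarkSQL's actual STOCK_LEVEL statement; it only shows the derived-table alias being emitted without AS for Oracle and with AS elsewhere.

```java
/**
 * Hypothetical sketch: the alias syntax of a derived table depends on the vendor.
 * Oracle rejects "AS" before a table (subselect) alias; AS on column aliases is fine.
 */
public class StockLevelSql {

    /** Returns the alias clause for a derived table. */
    static String derivedTableAlias(String dbType, String alias) {
        // Oracle: "(SELECT ...) l"; other databases tolerate or require "(SELECT ...) AS l".
        return "oracle".equalsIgnoreCase(dbType) ? " " + alias : " AS " + alias;
    }

    /** Simplified, illustrative STOCK_LEVEL-style query; not BenchmarkSQL's actual SQL. */
    static String stockLevelQuery(String dbType) {
        return "SELECT count(*) AS low_stock FROM ("
                + "SELECT DISTINCT s_i_id FROM bmsql_stock"
                + " WHERE s_w_id = ? AND s_quantity < ?"
                + ")" + derivedTableAlias(dbType, "l");
    }

    public static void main(String[] args) {
        System.out.println(stockLevelQuery("postgres"));
        System.out.println(stockLevelQuery("oracle"));
    }
}
```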

Use the wiki to explain things about this project: how to use the Python script, how to generate the diagrams with R, etc.

Don't start on documenting the R stuff. I intend to replace the usage of R with matplotlib and basically rewrite the entire report generation as well as the result data capture from scratch.

But I agree on using the Wiki.

In the wiki, there could be a page per RDBMS to explain the details and points of performance improvement.

Create Javadoc headers so that documentation of the code can be generated.

Documentation of the parameters the application accepts.

Extra parameters, like:

schema, which is useful for databases like MySQL that do not use schemas, or for other databases that use a default schema. This avoids forcing the use of a benchmarksql schema.

Statement terminator, for when ; is not available. Another terminator such as 'GO' or '@', or even end of line, could be used.

The current master implementation does not use schemas. Everything in PostgreSQL, for example, is in the "public" schema, and all tables have a bmsql_ prefix. It also doesn't use GO or the like, since the SQL files for creating and dropping the tables/procedures are processed by a Java utility program.
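If the schema parameter were added anyway, one way to keep the current behaviour as the default is to qualify table names only when a schema is configured, so that unqualified names keep resolving against the connection's default schema (in PostgreSQL, via search_path). The TableNames helper below is hypothetical; only the bmsql_ prefix comes from this thread.

```java
/** Hypothetical helper for an optional "schema" parameter; not part of BenchmarkSQL. */
public class TableNames {

    static final String PREFIX = "bmsql_"; // table prefix used by the current master

    /** Returns bmsql_<table>, qualified with the schema only if one was configured. */
    static String qualify(String schema, String table) {
        String name = PREFIX + table;
        if (schema == null || schema.trim().isEmpty()) {
            return name;                      // resolved against the default schema
        }
        return schema.trim() + "." + name;
    }

    public static void main(String[] args) {
        System.out.println(qualify(null, "warehouse"));   // bmsql_warehouse
        System.out.println(qualify("tpcc", "warehouse")); // tpcc.bmsql_warehouse
    }
}
```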

Use PMD, FindBugs and Checkstyle for better code.

Use of Travis CI for continuous integration.

Generation of a GitHub web page, based on the files under doc.

Include support for other RDBMSs, starting with Db2, and then extending to others.

Finally, I have these questions; I need your answers to proceed:

It is good practice to put all classes in packages. I propose the following base package name, which easily ties the classes to the code's hosting facility:

com.github.pgsql-io.benchmarksql

BenchmarkSQL has moved its main repository before. It used to be on Bitbucket and then moved here. I don't know how stable the pgsql-io part of the above will be in the long term. Other than that, I am for it.

Under this package, all classes will be placed in different subpackages (jtpcc, loader, etc.).

Which minimum Java version should BenchmarkSQL support?

Java 1.7 is no longer maintained, but still very popular.

Java 8 is still maintained and very popular.

I'm still using 8, but should test against 11 once I get a few more resources (mostly time).

Any older one?

Any newer one, like Java 9 or later?

I wouldn't want to require a newer one by using features that aren't in Java 8.

Best Regards, Jan

luss commented 3 years ago

In my (limited) experience, 11 is almost entirely upward compatible with 8, but there are a few deprecations that cause various compile-time warnings. To the degree that BenchmarkSQL ships with prebuilt class files, we'd build those with JDK 8, so the warnings wouldn't be an issue in the short term; they would only matter with a JDK version after 11.

--Luss

wieck commented 3 years ago

I just ran into another problem with 8 while trying to use the ojdbc10 driver:

Exception in thread "main" java.lang.UnsupportedClassVersionError: oracle/jdbc/OracleDriver has been compiled by a more recent version of the Java Runtime (class file version 54.0), this version of the Java Runtime only recognizes class file versions up to 52.0

So I guess we will have to move to something newer than java-1.8.

Regards, Jan
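The error means the JVM refuses the driver because its classes carry class-file major version 54 (Java 10), while a Java 8 runtime accepts at most 52. A small sketch, using only the JDK, that reads the major version from a class-file header and maps it to the Java release; the ClassFileVersion class is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: reads a class file's major version and maps it to a Java release. */
public class ClassFileVersion {

    /** Class-file header layout: u4 magic (0xCAFEBABE), u2 minor_version, u2 major_version. */
    static int majorVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file");
        }
        data.readUnsignedShort();               // minor_version, not needed here
        return data.readUnsignedShort();        // major_version
    }

    /** For Java 5 and later, release = major_version - 44 (52 -> 8, 54 -> 10, 55 -> 11). */
    static int javaRelease(int major) {
        return major - 44;
    }

    public static void main(String[] args) throws IOException {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 54};
        int major = majorVersion(new ByteArrayInputStream(header));
        System.out.println(major + " -> Java " + javaRelease(major)); // 54 -> Java 10
    }
}
```

To inspect a driver jar, the same method can be fed an entry stream, for example jar.getInputStream(jar.getEntry("oracle/jdbc/OracleDriver.class")) on a java.util.jar.JarFile.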

luss commented 3 years ago

11 it is, then. It's been out for a couple of years. If you look at the JDK versions available in newer OS releases (such as CentOS 8 and Ubuntu 20.04), both OpenJDK 8 and OpenJDK 11 are offered. It seems that every three years Oracle and the community are supporting an LTS release.
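Once 11 is the minimum, a fail-fast check at startup gives a clearer message than an UnsupportedClassVersionError. A minimal sketch, assuming a hypothetical JavaVersionGuard class; it reads the java.specification.version system property, which is "1.8" on Java 8 and "11", "17", and so on from Java 9 onward.

```java
/** Hypothetical startup guard: fail fast when running on a Java release older than required. */
public class JavaVersionGuard {

    static final int REQUIRED = 11; // assumption: the minimum release agreed on in this thread

    /** Parses java.specification.version: "1.8" on Java 8, "11", "17", ... on Java 9+. */
    static int feature(String specVersion) {
        String v = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
        int dot = v.indexOf('.');
        return Integer.parseInt(dot < 0 ? v : v.substring(0, dot));
    }

    public static void main(String[] args) {
        int running = feature(System.getProperty("java.specification.version"));
        if (running < REQUIRED) {
            System.err.println("This build requires Java " + REQUIRED + " or newer, found Java " + running);
            System.exit(1);
        }
        System.out.println("Java " + running + " OK");
    }
}
```

On the build side, javac --release 11 (available from JDK 9) compiles against the Java 11 API, which also keeps the code from accidentally depending on APIs from newer releases.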

wieck commented 3 years ago

Unfortunately, Oracle is lagging behind here. Even with the newest drivers we have:

"Oracle JDBC does not support the JDBC 4.0 method createArrayOf method of java.sql.Connection interface. This method only allows anonymous array types, while all Oracle array types are named. Use the Oracle specific method oracle.jdbc.OracleConnection.createARRAY instead."

So it seems impossible to compile Oracle stored procedure support without the Oracle driver present.

It might be possible to build with Oracle support, leave the Oracle driver out of the Docker image, and then put it back in place at runtime.
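One way to avoid the compile-time dependency is to reach the Oracle-specific method through reflection, so the class compiles without the driver and only needs ojdbc on the classpath at runtime. A minimal sketch, assuming the method named in the quoted Oracle documentation has the signature OracleConnection.createARRAY(String, Object); the OracleArrays class and its method are hypothetical.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.sql.Array;
import java.sql.Connection;
import java.sql.SQLException;

/**
 * Hypothetical sketch: calls oracle.jdbc.OracleConnection.createARRAY(String, Object)
 * (signature assumed) via reflection, so no Oracle classes are needed at compile time.
 */
public class OracleArrays {

    static Array createNamedArray(Connection conn, String typeName, Object[] elements)
            throws SQLException {
        try {
            Class<?> oraConnClass = Class.forName("oracle.jdbc.OracleConnection");
            Object oraConn = conn.unwrap(oraConnClass);     // unwrap pooled/proxy connections
            Method create = oraConnClass.getMethod("createARRAY", String.class, Object.class);
            return (Array) create.invoke(oraConn, typeName, elements);
        } catch (InvocationTargetException e) {
            Throwable cause = e.getCause();                 // rethrow the driver's own SQLException
            if (cause instanceof SQLException) {
                throw (SQLException) cause;
            }
            throw new SQLException("createARRAY failed", cause);
        } catch (ReflectiveOperationException e) {
            throw new SQLException("Oracle JDBC driver not available at runtime", e);
        }
    }

    public static void main(String[] args) {
        try {
            createNamedArray(null, "T", new Object[0]);
        } catch (SQLException e) {
            // Without ojdbc on the classpath, Class.forName fails before conn is touched.
            System.out.println(e.getMessage());
        }
    }
}
```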

Regards, Jan

On Thu, Apr 1, 2021 at 2:10 PM Jan Wieck @.***> wrote:

On 4/1/21 1:52 PM, Denis Lussier wrote:

in my (limited) experience, 11 is almost totally upwardly compatible with 8, but... there are a few deprecations that cause various compile time warnings. To the degree that benchmark sql ships with class files already made, we'd do that with JDK8 so the warnings wouldn't be a short term issue until version of JDK after 11.

I just ran into another problem with 8 while trying to use the ojdbc10 driver:

Exception in thread "main" java.lang.UnsupportedClassVersionError: oracle/jdbc/OracleDriver has been compiled by a more recent version of the Java Runtime (class file version 54.0), this version of the Java Runtime only recognizes class file versions up to 52.0

So I guess we will have to move to something newer than java-1.8.

Regards, Jan

--Luss

On Thu, Apr 1, 2021 at 12:51 PM Jan Wieck @.***> wrote:

I suggest you make yourself familiar with the current master branch of benchmarksql. There have been substantial changes to some of what you are touching below. More comments inline.

On 4/1/21 11:55 AM, Andres Gomez Casanova wrote:

Hi guys,

The code modernizations that I propose do not change the current functionality, and this won't correct or add new functionality. It just will use new frameworks that eases the deployment. The changes I propose will be done in parts, in order to have a tracking of the evolution; not just a big change. The important thing to take into account with this refactoring is that the firsts "pull request" will have a lot of visual changes:

The files will be moved to other directories to comply with Maven structure.

The shell scripts will be modified to support Maven structure in the |target| directory instead of the |run| directory.

New directory structure is fine.

Libraries like drivers will be automatically downloaded and included in the target directory by Maven.

This I am very interested in. I was looking at the Maven Repository at https://repo1.maven.org/maven2 earlier and noticed that the Oracle and Microsoft JDBC drivers are there. Oracle even points to that repository on their download page, which means that Oracle has changed its policy and no longer requires a license click-through for downloading. Bravo!

I still don't think we should ship them or add copies of them to our repository, but if they are automatically downloaded at build time and if a user has an easy way to get them into a local lib directory to run a pre-built Docker container, that would be perfect.

All documentation in one directory, separating it from code or scripts.

I will adjust the scripts to work with Postgres; currently I cannot test against Oracle.

We definitely need to make sure that the changes do not interfere with the support for Oracle, MariaDB or MSSQL-server. Currently I am testing Oracle against an Oracle-11g instance in a virtual machine, MariaDB in a different virtual machine and MSSQL-server with a Docker container based on mcr.microsoft.com/mssql/server, which is MSSQL-2017. I can share the Dockerfile for that.

We can forget about Firebird. It can't handle concurrency well because its concurrency model doesn't follow the SQL Standard. I will eventually drop that again anyway.

These are of course only functional tests. None of those database installations can handle any meaningful load.

Once files are moved, then the entire code will be adjusted.

The code will be formatted according to the Eclipse code format.

The imports will be organized.

Packages will be used for all classes.

Those changes will definitely help students find their way around the code more easily.

Put a FIXME or TODO, in each part of the code where an extension should be done for other RDBMS.

The truly RDBMS-specific code for stored procedures now lives in separate files in the ./application directory.

In the AppGeneric driver, which implements all transactions inside the application with PreparedStatements, there is exactly one query left that is RDBMS-vendor specific: the STOCK_LEVEL query, because Oracle doesn't accept AS before a subselect's alias, while other databases either tolerate or require it.

This means that adding support for a new vendor requires writing a new ./application/AppStoredProc.java (if stored procedure support is being implemented) and touching that one spot in the AppGeneric.java file.
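Here is a hypothetical sketch of how that one vendor-specific spot could be isolated. A small helper builds the derived-table alias and omits AS for Oracle. The enum, the method names and the exact query text are assumptions for illustration. The query is loosely shaped like a TPC-C Stock-Level query over the bmsql_ tables; it is not the actual AppGeneric.java code.

```java
// Hypothetical sketch, not the actual AppGeneric.java code.
final class StockLevelSql {
    private StockLevelSql() {}

    enum Vendor { ORACLE, POSTGRES, MARIADB, MSSQL }

    /** Oracle accepts AS for column aliases but rejects it before a table alias. */
    static String derivedTableAlias(Vendor vendor, String alias) {
        return vendor == Vendor.ORACLE ? " " + alias : " AS " + alias;
    }

    /** Illustrative STOCK_LEVEL query; only the final alias differs per vendor. */
    static String stockLevelQuery(Vendor vendor) {
        return "SELECT count(*) AS low_stock FROM ("
                + " SELECT s_w_id, s_i_id, s_quantity FROM bmsql_stock"
                + " WHERE s_w_id = ? AND s_quantity < ? AND s_i_id IN ("
                + "  SELECT ol_i_id FROM bmsql_district"
                + "  JOIN bmsql_order_line ON ol_w_id = d_w_id AND ol_d_id = d_id"
                + "   AND ol_o_id >= d_next_o_id - 20 AND ol_o_id < d_next_o_id"
                + "  WHERE d_w_id = ? AND d_id = ?)"
                + ")" + derivedTableAlias(vendor, "L");
    }

    public static void main(String[] args) {
        for (Vendor v : Vendor.values()) {
            System.out.println(v + ": ..." + derivedTableAlias(v, "L"));
        }
    }
}
```

Keeping the difference in one helper means a new vendor only has to decide which alias form it needs. The rest of the query text stays shared.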

Use Wiki, to explain things about this project. How to use the python script, generation of the diagrams with r, etc.

Don't start on documenting the R stuff. I intend to replace the usage of R with matplotlib and basically rewrite the entire report generation as well as the result data capture from scratch.

But I agree on using the Wiki.

- In the wiki, there could be a page per RDBMS to explain the details and points of performance improvement.

Create javadoc headers, in order to be capable of generating a documentation of the code.

Documentation about the parameters that the application receives.

Extra parameters, like:
- schema, which is useful for databases like MySQL that do not use schemas, or for other databases that use a default schema. This would not force the use of a benchmarksql schema.
- Statement terminator, when ; is not available. This could use another one like 'GO' or '@', or even EOL.

The current master implementation does not use schemas. Everything in PostgreSQL, for example, is in the "public" schema. All tables have a bmsql_ prefix. It also doesn't use GO or the like, since the SQL files for building and dropping the tables/procedures are processed by a Java utility program.

Use PMD, findbugs and checkstyle for better code.

Use of Travis-CI to do Continuous Integration.

Generation of a GitHub web page, based on the files under doc.

Include support to other RDBMS, starting with Db2, and then extending to others.

Finally, I have these questions; I need your answers to proceed:

It is good practice to have packages for all classes. I propose the following base package name, which ties the classes to the code's hosting location:

com.github.pgsql-io.benchmarksql

BenchmarkSQL has moved its main repository before. It used to be on Bitbucket and then moved here. I don't know how long-term stable the pgsql-io part of the above will be. Other than that, I am for it.

Under this package, all classes will be placed under different sub packages (jtpcc, loader, etc.)
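There is one detail about the proposed name. A hyphen is not legal in a Java identifier, so `pgsql-io` cannot appear as a package component as written. The package naming conventions in the Java Language Specification suggest replacing such characters with an underscore, which gives `com.github.pgsql_io.benchmarksql`. The small sketch below checks a name with `javax.lang.model.SourceVersion.isName` and applies those substitutions. The `PackageNames` class is illustrative only.

```java
import javax.lang.model.SourceVersion;

final class PackageNames {
    private PackageNames() {}

    /** True if name is a syntactically valid qualified name containing no keywords. */
    static boolean isValid(String name) {
        return SourceVersion.isName(name);
    }

    /** Turns each dot-separated component into a legal Java identifier. */
    static String sanitize(String name) {
        StringBuilder out = new StringBuilder();
        for (String part : name.split("\\.")) {
            StringBuilder id = new StringBuilder();
            for (char c : part.toCharArray()) {
                id.append(Character.isJavaIdentifierPart(c) ? c : '_'); // "pgsql-io" -> "pgsql_io"
            }
            if (id.length() == 0 || !Character.isJavaIdentifierStart(id.charAt(0))) {
                id.insert(0, '_');                                   // "123name" -> "_123name"
            }
            if (SourceVersion.isKeyword(id)) {
                id.append('_');                                      // "int" -> "int_"
            }
            if (out.length() > 0) {
                out.append('.');
            }
            out.append(id);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String proposed = "com.github.pgsql-io.benchmarksql";
        System.out.println(proposed + " valid? " + isValid(proposed));
        System.out.println("suggested: " + sanitize(proposed));
    }
}
```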

Which minimum Java version should BenchmarkSQL support?
- Java 1.7 is no longer maintained, but very popular.
- Java 8 is still maintained and very popular.

I'm still using 8, but should test against 11 once I get a few more resources (mostly time).

- Any older one?
- Any newer one, like Java 9 or later?

I wouldn't want to require a newer one by using features that aren't in Java 8.

Best Regards, Jan

-- Jan Wieck Principal Database Engineer Amazon Web Services


angoca commented 3 years ago

OK, I opened the first pull request with the Maven changes. To run it, you just need to execute:

mvn
cd target/run ; 
./runDatabaseBuild.sh my_postgres.properties
./runBenchmark.sh my_postgres.properties
cd ../.. ;

This compiles the Java code, creates the jar and copies the extra files into the target directory. From there, you can run the application.

In the individual commits I added extra comments describing the changes I made.

This was the first part. Once you have checked it and merged it into master, I will proceed with the second part.

wieck commented 3 years ago

On 4/1/21 5:00 PM, Andres Gomez Casanova wrote:

Ok, I did the first pull request with the Maven stuff. In order to run it, you just need to execute:

mvn
cd target/run
./runDatabaseBuild.sh my_postgres.properties
cd ../..

This did not succeed on first try as it apparently requires Maven >= 3.1 and CentOS 7 only supplies 3.0. CentOS 8 seems to come with Maven 3.5, but it will take me a few cycles to get there.

Thanks so far, I will be in touch shortly.

Regards, Jan


luss commented 3 years ago

:-)


wieck commented 3 years ago


Good news:

After a full upgrade to CentOS 8 (which was long overdue anyway) I was able to build this branch and perform a DB build and benchmark run:

13:39:26.869 [Thread-0] INFO jTPCCScheduler : Scheduler, ready
13:39:56.892 [Thread-0] INFO jTPCCScheduler : Scheduler, Current TPM=104 NOPM=28
13:40:26.869 [Thread-0] INFO jTPCCScheduler : Scheduler, Current TPM=254 NOPM=88
13:40:26.870 [Thread-0] INFO jTPCCScheduler : Scheduler, all simulated terminals active
13:40:26.870 [Thread-0] INFO jTPCCScheduler : Scheduler, all SUT threads active
13:40:56.870 [Thread-0] INFO jTPCCScheduler : Scheduler, Current TPM=270 NOPM=126
13:41:26.868 [Thread-0] INFO jTPCCScheduler : Scheduler, Current TPM=306 NOPM=130
13:41:26.868 [Thread-0] INFO jTPCCScheduler : Scheduler, rampup done - measurement begins
13:41:56.872 [Thread-0] INFO jTPCCScheduler : Scheduler, Current TPM=324 NOPM=130
13:42:26.870 [Thread-0] INFO jTPCCScheduler : Scheduler, Current TPM=288 NOPM=118
13:42:56.872 [Thread-0] INFO jTPCCScheduler : Scheduler, Current TPM=288 NOPM=120
13:43:26.872 [Thread-0] INFO jTPCCScheduler : Scheduler, Current TPM=266 NOPM=128
13:43:56.870 [Thread-0] INFO jTPCCScheduler : Scheduler, Current TPM=288 NOPM=132
13:44:26.872 [Thread-0] INFO jTPCCScheduler : Scheduler, Current TPM=290 NOPM=108
13:44:26.872 [Thread-0] INFO jTPCCScheduler : Scheduler, run done - measurement ends
13:44:36.871 [Thread-0] INFO jTPCCScheduler : Scheduler, done
13:44:36.872 [main] INFO jTPCC : main, scheduler returned
13:44:36.873 [main] INFO jTPCC : main, all simulated terminals ended
13:44:36.881 [main] INFO jTPCC : main, all SUT threads ended
13:44:36.882 [main] INFO jTPCCMonkey : result, latency (seconds)
13:44:36.882 [main] INFO jTPCCMonkey : result, TransType count | mix % | mean max 90th% | rbk% errors
13:44:36.882 [main] INFO jTPCCMonkey : result, +--------------+---------------+---------+---------+---------+---------+---------+---------------+
13:44:36.885 [main] INFO jTPCCMonkey : result, | NEW_ORDER | 368 | 42.202 | 0.033 | 0.062 | 0.053 | 0.543 | 0 |
13:44:36.886 [main] INFO jTPCCMonkey : result, | PAYMENT | 389 | 44.610 | 0.014 | 0.025 | 0.020 | 0.000 | 0 |
13:44:36.886 [main] INFO jTPCCMonkey : result, | ORDER_STATUS | 42 | 4.817 | 0.011 | 0.017 | 0.015 | 0.000 | 0 |
13:44:36.886 [main] INFO jTPCCMonkey : result, | STOCK_LEVEL | 42 | 4.817 | 0.009 | 0.017 | 0.013 | 0.000 | 0 |
13:44:36.887 [main] INFO jTPCCMonkey : result, | DELIVERY | 31 | 3.555 | 0.002 | 0.005 | 0.004 | 0.000 | 0 |
13:44:36.887 [main] INFO jTPCCMonkey : result, | DELIVERY_BG | 31 | 0.000 | 0.047 | 0.073 | 0.065 | 0.000 | 0 |
13:44:36.887 [main] INFO jTPCCMonkey : result, +--------------+---------------+---------+---------+---------+---------+---------+---------------+
13:44:36.887 [main] INFO jTPCCMonkey : result,
13:44:36.888 [main] INFO jTPCCMonkey : result, Overall NOPM: 123 (95.39% of the theoretical maximum)
13:44:36.888 [main] INFO jTPCCMonkey : result, Overall TPM: 291
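The summary numbers can be cross-checked from the table. The sketch below is a back-of-the-envelope check, not BenchmarkSQL's reporting code. The mix % column is each type's share of the foreground transactions, with DELIVERY_BG excluded. TPM and NOPM are counts divided by the 3-minute measurement window between "rampup done" and "run done". The percentage of the theoretical maximum relies on two assumptions. The first is the TPC-C ceiling of about 12.86 New-Order transactions per minute per warehouse. The second is 10 warehouses, a count inferred from the numbers rather than stated in the log.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Back-of-the-envelope check of the run summary, not BenchmarkSQL code.
final class RunSummary {
    private RunSummary() {}

    /** Assumed TPC-C ceiling: New-Order transactions per minute per warehouse. */
    static final double MAX_NOPM_PER_WAREHOUSE = 12.86;

    /** Each transaction type's share of all counted transactions, in percent. */
    static Map<String, Double> mixPercent(Map<String, Integer> counts) {
        int total = counts.values().stream().mapToInt(Integer::intValue).sum();
        Map<String, Double> mix = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            mix.put(e.getKey(), 100.0 * e.getValue() / total);
        }
        return mix;
    }

    static double perMinute(int count, double minutes) {
        return count / minutes;
    }

    static double percentOfMax(double nopm, int warehouses) {
        return 100.0 * nopm / (MAX_NOPM_PER_WAREHOUSE * warehouses);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("NEW_ORDER", 368);
        counts.put("PAYMENT", 389);
        counts.put("ORDER_STATUS", 42);
        counts.put("STOCK_LEVEL", 42);
        counts.put("DELIVERY", 31);       // DELIVERY_BG is not part of the mix

        double minutes = 3.0;             // 13:41:26.868 to 13:44:26.872
        int total = counts.values().stream().mapToInt(Integer::intValue).sum();
        double nopm = perMinute(counts.get("NEW_ORDER"), minutes);

        mixPercent(counts).forEach((type, pct) -> System.out.printf("%-12s %7.3f%n", type, pct));
        System.out.printf("TPM %.0f, NOPM %.0f (%.2f%% of max, assuming 10 warehouses)%n",
                perMinute(total, minutes), nopm, percentOfMax(nopm, 10));
    }
}
```

With the counts from the table, this reproduces the 42.202% mix for NEW_ORDER, TPM 291, NOPM 123 and 95.39% of the theoretical maximum.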

Still need to test the generation of the html report.

I'll also have to test and probably adjust a few things for the Docker image creation, which on CentOS 8 has of course changed entirely because it uses Podman instead.

Anyhow, this all looks good so far. I'll have to get familiar with Maven, but it looks straightforward enough.

Thanks, Jan


wieck commented 3 years ago

On 4/2/21 1:53 PM, Jan Wieck wrote:

Still need to test the generation of the html report.

Did another run with the 'resultDirectory=' parameter set. Running

./generateReport.sh <resultDirectory>

after that produced the attached HTML output. It will look better if the run is a bit longer than 3 minutes, but overall, report generation still works. You need to have R-core installed on the system (as said, I want to replace R with matplotlib anyway, but for now this is what we have).

Regards, Jan


wieck commented 3 years ago

Andres,

looking at the overall directory structure, I think the FlaskService actually needs to be installed under 'target' during the build process. It is a browser-based GUI meant to start/stop the run* scripts and display the HTML report files generated by them.

Regards, Jan


luss commented 3 years ago

Great stuff guys. Long live BenchmarkSQL!! :-)


angoca commented 3 years ago

Hi Jan and Denis,

I moved the Flask service to resources, so that it is included in the target directory.

wieck commented 3 years ago


That wasn't exactly what I had intended. Sorry for the misunderstanding.

I had already moved them in the temporary branch pgsql-io/angoca-maven and then pushed a few more fixes on top of that to the same branch to get Docker working again. The idea was that you check out that branch and just confirm that those moves are OK with the Maven directory layout.

Do you have https://github.com/pgsql-io/benchmarksql.git set up as a secondary remote for your local repository (the one on your development machine, not the one on GitHub)? If not, please run:

 git remote add pgsql-io https://github.com/pgsql-io/benchmarksql.git
 git fetch --all

After that you can nuke your local "master" branch and create a new one by checking out pgsql-io/master.

I have merged pgsql-io/angoca-maven, including the changes I made on top, into master. So pgsql-io/master now builds with Maven and can run standalone, under the Flask POC, as well as inside Docker.

Thanks and sorry for the confusion, Jan


angoca commented 3 years ago

Hello guys, I created this to-do list to show the progress of my proposals. With the Maven refactor, these changes are much easier to do.

Maven

[x] The files will be moved to other directories to comply with Maven structure.
- Organize files according to the Maven structure. This refers to the resources directory, and a directory for each kind of file: R, Python, Bash scripts, etc. This organizes the run directory.
[x] The shell scripts will be modified to support Maven structure in the target directory instead of the run directory.
[x] Libraries like drivers will be automatically downloaded and included in the target directory by Maven.

Code

[x] Use packages for all classes.
[x] The code will be formatted according to the Eclipse code format.
[x] Organize imports. No wildcard (*) imports; each class is imported explicitly.
[ ] Use PMD, findbugs and checkstyle for better code.
[ ] Create javadoc headers, in order to be capable of generating a documentation of the code.

Logger

[x] Use log4j v2, instead of log4j v1.
[x] Configuration of log4j with packages.
[x] Change System.out calls to loggers in all classes (the 3 main methods: sql, local and benchmark).

Documentation

[x] All documentation in one directory, separating it from code or scripts.
[x] Documentation for each RDBMS, not just one for all of them.
[ ] Put a FIXME or TODO, in each part of the code where an extension should be done for other RDBMS.
[x] Modification of HOW TO RUN files to comply with Markdown syntax.

Wiki

[ ] Use Wiki, to explain things about this project. How to use the python script, generation of the diagrams with r, etc.
- Documentation about the parameters that the application receives.
- There could be a page per RDBMS to explain the details and points of performance improvement.

Scripts

[ ] Scripts to run in Windows.
[x] Extra parameters in the properties file, like:
- schema, which is useful for databases like MySQL that do not use schemas, or for other databases that use a default schema. This does not force the use of the benchmarksql schema.
- Statement terminator, when ; is not available. This could use another one like 'GO' or '@', or even EOL.

Automation (CI and page generation)

[ ] Use of Travis-CI to do Continuous Integration.
[ ] Generation of a GitHub web page, based on another branch.

RDBMS

[ ] Include support for Db2

wieck commented 3 years ago

I agree with most of this, but have a few questions and comments (inline).

On 4/6/21 12:50 PM, Andres Gomez Casanova wrote:


Code

Use packages for all classes.

The code will be formatted to Eclipse syntax format.

I agree that the current Wild West formatting of the code is bad and that we need to reformat it. However, the only actual rules I was able to find were those of the Google Java Style Guide:

 https://google.github.io/styleguide/javaguide.html

Eclipse seems to just allow a lot of different options and plugins. Can you point to an actual style guide?

Scripts to run in Windows.

While I see this as useful to getting someone's feet wet, it really has no practical purpose for running a real benchmark. When I am trying to put pressure onto an AWS RDS instance like a db.r4.8xlarge (16 core, 32 vcpu, 244 GiB memory), I need a benchmark driver running in the same AZ on an EC2 instance with enough network bandwidth. That naturally is an EC2 instance running Amazon Linux, not Windows.

Nobody should ever run a "benchmark" on their laptop over WiFi.

I am not against making it all work on Windows, I just wonder what the benefit of doing so might be.

Extra parameters in the properties file, like:
- schema, which is useful for databases like MySQL that do not use schemas, or for other databases that use a default schema. This does not force the use of the benchmarksql schema.

As said before, we no longer use schemas as of version 5. All tables are created without a schema but with a "bmsql_" prefix.

- Statement terminator, when ; is not available. This could use another one like 'GO' or '@', or even EOL.

Can you elaborate on this a bit? In version 5 we changed the loading of the DDL scripts to use a JDBC-based utility, .../jdbc/ExecJDBC.java. It uses

 -- {
 <arbitrary-stuff>
 -- }

to identify blocks of text that need to be sent as one statement, regardless of what they contain, including but not limited to ';'.
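Here is a minimal sketch of that block convention. It assumes that, outside a block, each statement ends with a trailing ';' and that blank lines and plain '--' comment lines are skipped. It illustrates the idea only and is not the actual ExecJDBC.java code. `ScriptSplitter` and the `bmsql_demo` table are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "-- {" ... "-- }" block handling, not ExecJDBC.java.
final class ScriptSplitter {
    private ScriptSplitter() {}

    /** Splits a DDL script into statements, keeping marked blocks intact. */
    static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inBlock = false;
        for (String line : script.split("\\R")) {
            String trimmed = line.trim();
            if (!inBlock && trimmed.equals("-- {")) {
                flush(current, statements);          // start of a verbatim block
                inBlock = true;
            } else if (inBlock && trimmed.equals("-- }")) {
                flush(current, statements);          // whole block becomes one statement
                inBlock = false;
            } else if (inBlock) {
                current.append(line).append('\n');   // keep ';' and everything else as-is
            } else if (trimmed.isEmpty() || trimmed.startsWith("--")) {
                // outside a block: skip blank lines and comment lines
            } else if (trimmed.endsWith(";")) {
                current.append(trimmed, 0, trimmed.length() - 1);
                flush(current, statements);          // a trailing ';' ends the statement
            } else {
                current.append(line).append('\n');   // statement continues on the next line
            }
        }
        flush(current, statements);
        return statements;
    }

    private static void flush(StringBuilder buffer, List<String> out) {
        String statement = buffer.toString().trim();
        if (!statement.isEmpty()) {
            out.add(statement);
        }
        buffer.setLength(0);
    }

    public static void main(String[] args) {
        String script = String.join("\n",
                "CREATE TABLE bmsql_demo (id integer);",
                "-- {",
                "CREATE FUNCTION demo() RETURNS integer AS $$",
                "BEGIN RETURN 1; END;",
                "$$ LANGUAGE plpgsql;",
                "-- }",
                "DROP TABLE bmsql_demo;");
        split(script).forEach(s -> System.out.println("[" + s + "]"));
    }
}
```

Everything between the markers ends up as a single list entry. A procedure body with internal semicolons can then be sent with one `Statement.execute` call.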

Best Regards, Jan


angoca commented 3 years ago

Regarding your comments:

Code Style

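Well, I know there are several code styles, and some of them come with an associated XML file to automatically format the code in an IDE like Eclipse. Normally, I use the Eclipse bundled format, but there are others available:

Google - https://github.com/google/styleguide/blob/96f6a64d30a47b09f4be98d83a3e30d624febf86/eclipse-java-google-style.xml

Spring - https://github.com/spring-projects/spring-batch/blob/master/spring-eclipse-code-conventions.xml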

And this is how to import one: https://www.planetofbits.com/eclipse/create-share-eclipse-code-style-formatter/

It seems many people use the Google style. What do you think?

Scripts for Windows

I think we cannot forget that Windows Server is a very popular platform, and that most of the time SQL Server runs on that platform. Thus, we have to provide a way to run from there.

Extra parameters

I agree with you. I haven't looked at the details of the new options in versions 5 and 6, because I used to work with version 4. So we can probably ignore these extra parameters.

wieck commented 3 years ago

On 4/7/21 10:05 AM, Andres Gomez Casanova wrote:

Regarding your comments:

Code Style

Well, I know there are several code styles, and some of them come with an associated XML file to automatically format the code in an IDE like Eclipse. Normally, I use the Eclipse bundled format, but there are others available:

Google - https://github.com/google/styleguide/blob/96f6a64d30a47b09f4be98d83a3e30d624febf86/eclipse-java-google-style.xml

Spring - https://github.com/spring-projects/spring-batch/blob/master/spring-eclipse-code-conventions.xml

And this is the way to import it: https://www.planetofbits.com/eclipse/create-share-eclipse-code-style-formatter/

It seems many people use the Google style. What do you think?

The Google style is well documented and not everyone uses Eclipse. So I would prefer Google.

Scripts for Windows

I think we cannot forget that Windows Server is a very popular platform, and that most of the time SQL Server runs on that platform. Thus, we have to provide a way to run from there.

The benchmark program itself should never run on the database server. It needs CPU, memory and write I/O to generate the transaction input, process the returned DB output and collect the results, all of which affects the performance of the database if it runs on the same machine. Likewise, an overloaded database could starve benchmarksql of resources, which completely invalidates the results. Accurate measurement of transaction latency is only possible if the benchmark program has enough dedicated resources.

We should encourage users to not run benchmarksql on the DB server itself.

That said, I am not against adding script support to run benchmarksql on Windows. But given how easy it is to run a Podman container under WSL2, this isn't very high on my priority list.

Extra parameters

I agree with you. I haven't looked at the details of the new options in versions 5 and 6, because I used to work with version 4. So we can probably ignore these extra parameters.

We will add/remove parameters as new/removed functionality requires.

Regards, Jan


luss commented 3 years ago

I think that for users learning about databases and benchmarks it is perfectly valid to run on the same machine and/or on Windows.

Jan, I think you are focused on running valid large-scale benchmarks (mostly) against Postgres. Windows is NOT a good choice here. Also, I think it is OK if our advanced graphing & metrics framework only runs on Linux.

--Luss


wieck commented 3 years ago

On 4/7/21 12:20 PM, Denis Lussier wrote:

I think that for users learning about databases and benchmarks it is perfectly valid to run on the same machine and/or on Windows.

When just starting out and getting familiar with this, that is very true. As said, I am not against making it work on Windows, as long as we know precisely why we are doing it and try to educate the user that "this isn't how you evaluate a production scale system".

Jan, I think you are focused on running valid large-scale benchmarks (mostly) against Postgres. Windows is NOT a good choice here.

Those two goals are not mutually exclusive. And I am not focused on PostgreSQL at all. We definitely need to come back to adding stored procedure support for MSSQL, MySQL and apparently soon DB2. But this needs to be done by someone familiar with those databases. Otherwise the stored procedure code will look like it was written by someone who only knows PostgreSQL and Oracle.

Also, I think it is OK that our advanced graphing & metrics framework will only run on Linux.

This I disagree with.

If we make benchmarksql work on Windows, then many new users, who just want to see what it does, will not see those graphs and metrics. They will get the wrong impression.

This is one reason why I am rewriting all this graphing and metric reporting in Python with numpy and matplotlib. That will make it more portable.

Regards, Jan


luss commented 3 years ago

Kewl. It is great (and better) that the graphing and metric reporting will be cross platform.

I fully understand that BenchmarkSQL is not Postgres-specific and does not "favor" Postgres in any way. You run the benchmark, look at the results, and draw your own conclusions. I will point out that the Postgres database runs slower on Windows than on Linux. AFAIK it's because Postgres uses a multi-process rather than a multithreaded backend. On Linux it generally doesn't matter, but Windoze handles threads more efficiently than processes.


wieck commented 3 years ago

On 4/7/21 1:22 PM, Denis Lussier wrote:

Kewl. It is great (and better) that the graphing and metric reporting will be cross platform.

I am halfway through changing how metrics are collected. And I know how to create some nice-looking graphs with matplotlib.pyplot.

I fully understand that BenchmarkSQL is not Postgres specific and does not "favor" Postgres in any way. You run the benchmark, look at the results, and draw your own conclusions. I will point out that the Postgres database runs slower on Windows than on Linux. AFAIK it's because Postgres uses a multi-process rather than a multithreaded backend. On Linux it generally doesn't matter, but Windoze does threading more efficiently than multi-processing.

Yes, and that is perfectly fair. You only get what you pay for. If you must run your database on Windoze, you can run MSSQL and pay for that license, or you can buy more hardware and run PostgreSQL with the same performance. I really don't care, as long as the numbers produced by benchmarksql are accurate.

Regards, Jan

-- Jan Wieck Principal Database Engineer Amazon Web Services

luss commented 3 years ago

What I'm interested in is how SQL Server for Linux compares to SQL Server for Windoze. I doubt Microsoft allows their proprietary database to run better on Linux. ;-)
