python / pyperformance

Python Performance Benchmark Suite
http://pyperformance.readthedocs.io/
MIT License

add sqlglot benchmarks #221

Closed tobymao closed 2 years ago

tobymao commented 2 years ago

sqlglot is a pure python sql parser, transpiler, and optimizer

tobymao commented 2 years ago

i can add that in the future. execution is still in alpha

tobymao commented 2 years ago

@mdboom do you need anything from me in order to land these changes?

mdboom commented 2 years ago

> @mdboom do you need anything from me in order to land these changes?

This looks fine to me, but I don't have merge rights. Maybe @ericsnowcurrently can have a look.

ericsnowcurrently commented 2 years ago

FWIW, there are some extra things to consider as we work on building good benchmark suites:

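Relative to this benchmark specifically:

  • it feels like an in-between one (not quite a macro-benchmark but more complex than a micro-benchmark)
  • could it be made to represent a full Python workload more closely (or integrated into such a benchmark)?
  • what workloads would it represent or be a part of?
  • how much coverage of those workloads is already in the pyperformance suite?
  • how should this benchmark be categorized/tagged?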

(I'm sure we'll merge it in regardless of the answers.)

ericsnowcurrently commented 2 years ago

Another thing to consider is that the sqlglot project should probably have this benchmark as part of its own suite (in its own repo), regardless of its inclusion in the pyperformance suite.

tobymao commented 2 years ago

@ericsnowcurrently ready for another look.

and sure, i can add these benchmarks to sqlglot's own suite

tobymao commented 2 years ago

> Relative to this benchmark specifically:
>
>   • it feels like an in-between one (not quite a macro-benchmark but more complex than a micro-benchmark)
>   • could it be made to represent a full Python workload more closely (or integrated into such a benchmark)?
>   • what workloads would it represent or be a part of?
>   • how much coverage of those workloads is already in the pyperformance suite?
>   • how should this benchmark be categorized/tagged?
>
> (I'm sure we'll merge it in regardless of the answers.)

in terms of workflows, it represents a good chunk of what people do in data engineering / analytics: parsing many sql queries. the normalizer also represents mutation of queries, which is another kind of macro workflow. some companies use sqlglot to parse tens of thousands of sql queries to extract metadata.
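
For illustration only, a "parse many queries" workload of the kind described above could be timed roughly like this. This is a minimal sketch and not the benchmark from this PR: `QUERIES`, `toy_tokenize`, and `bench_parse` are hypothetical names, and the tokenizer is a crude stand-in so the snippet runs without sqlglot installed. With sqlglot available, `sqlglot.parse_one` would be passed in as the parse function instead.

```python
import re
import time
from typing import Callable, Iterable

# Hypothetical sample queries; a real benchmark would use a larger, more varied corpus.
QUERIES = [
    "SELECT a, b FROM t WHERE a > 1",
    "SELECT x.id, COUNT(*) FROM x JOIN y ON x.id = y.id GROUP BY x.id",
]


def toy_tokenize(sql: str) -> list[str]:
    """Stand-in parser (NOT sqlglot): split SQL into words and single symbols."""
    return re.findall(r"\w+|[^\w\s]", sql)


def bench_parse(
    parse: Callable[[str], object], queries: Iterable[str], loops: int = 100
) -> float:
    """Return the total seconds spent calling parse() on every query, `loops` times."""
    queries = list(queries)
    start = time.perf_counter()
    for _ in range(loops):
        for sql in queries:
            parse(sql)
    return time.perf_counter() - start


if __name__ == "__main__":
    try:
        import sqlglot  # third-party and optional; assumed installed separately

        parse = sqlglot.parse_one
    except ImportError:
        # Fall back to the stand-in so the sketch still runs.
        parse = toy_tokenize
    print(f"{bench_parse(parse, QUERIES):.4f}s")
```

Passing the parse function as an argument keeps the timing loop independent of the parser, so the same loop could in principle be pointed at a normalization or transpilation step as well.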

sqlglot has a prototype engine which could represent more macro workflows, but it's not quite ready yet and not something i want to expose at this point.

ericsnowcurrently commented 2 years ago

Thanks for the benchmark!