h2oai / db-benchmark

reproducible benchmark of database-like ops
https://h2oai.github.io/db-benchmark
Mozilla Public License 2.0
325 stars, 88 forks

The text suggests an outer join, but the code executes a left join. #225

Closed BodonFerenc closed 3 years ago

BodonFerenc commented 3 years ago

https://github.com/h2oai/db-benchmark/blob/c7421051af7530951d16b1505158371aebc0d2c1/pandas/join-pandas.py#L116

jangorecki commented 3 years ago

A left join is an outer join. I don't see anything wrong with this wording.

BodonFerenc commented 3 years ago

I understand your point. It would do no harm to be explicit and use "outer left join". In some solutions, "outer join" refers to a full outer join, e.g. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html
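To illustrate the ambiguity, here is a minimal sketch of how pandas treats the `how` argument of `DataFrame.merge`. The two small frames and their column names are made up for illustration and are not taken from the benchmark code; only `merge` and its `how` values (`"left"`, `"outer"`, `"inner"`) come from pandas itself.

```python
import pandas as pd

# Hypothetical toy tables, not the benchmark data.
left = pd.DataFrame({"id": [1, 2, 3], "x": ["a", "b", "c"]})
right = pd.DataFrame({"id": [2, 3, 4], "y": ["B", "C", "D"]})

# how="left": keep every row of `left`, matched or not (left outer join).
left_join = left.merge(right, on="id", how="left")

# how="outer": keep unmatched rows from BOTH sides (full outer join).
full_join = left.merge(right, on="id", how="outer")

# how="inner": keep only rows whose key appears on both sides.
inner_join = left.merge(right, on="id", how="inner")

print(sorted(left_join["id"].tolist()))   # [1, 2, 3]
print(sorted(full_join["id"].tolist()))   # [1, 2, 3, 4]
print(sorted(inner_join["id"].tolist()))  # [2, 3]
```

So in pandas the bare word "outer" selects the full outer join, while the left (outer) join has to be requested as `how="left"`.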

jangorecki commented 3 years ago

I think it makes sense to improve the pandas documentation then. In database modelling, data warehouse and business intelligence jargon I have never heard anyone say "outer join" when they mean a full outer join (people say "full join" then), but I have heard "outer join" used countless times to mean left outer, or sometimes right outer. The point is to keep it short, so the plot is not overly covered with text. I agree it is a small difference, but for me this left/right is unnecessary here.