paleolimbot closed this 1 year ago.
datalogistik seems like an excellent choice for complex scenarios, but here I think that implementing the Python + reticulate + virtualenv + skip-the-tests-if-any-of-those-don't-work dance is excessive when all we need is a few .csv files to get started. If our tests become any more complex than "make sure this compiles", that may be a good time to revisit the need for a Python dependency here. I'm also happy to review a PR if you or somebody more familiar with the tool would like to implement it here.
As far as I can tell, the license on the site applies to the software. I think DuckDB rewrote dbgen to avoid the license issue (the original license says you're not allowed to modify the software), which I believe ties this redistribution more closely to DuckDB's license? I added a README with all the disclaimers I can think of.
Let's revisit this when I have time to investigate the alternatives properly!
I opted to put them as .csv files in inst/; however, another approach would be to use_package_data() such that the current tpch_tables() would reduce to tpch0001 and read_tpch_df("customer") would reduce to tpch0001$customer.

The idea is that you can do stuff like the following to write tests for these:
Created on 2022-11-28 with reprex v2.0.2
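A minimal sketch of the two access patterns described above, in base R and runnable outside a package. It assumes a temporary directory stands in for inst/ and uses two toy tables with made-up values; read_tpch_df_sketch(), tpch_tables_sketch() and tpch0001_sketch are hypothetical stand-ins for the package's read_tpch_df(), tpch_tables() and tpch0001, not their actual implementations.

```r
# Hypothetical stand-in for the package's inst/ directory. In an installed
# package this path would come from system.file() instead of tempdir().
tpch_dir <- file.path(tempdir(), "tpch")
dir.create(tpch_dir, showWarnings = FALSE)

# Two toy tables with made-up values (not real TPC-H output).
write.csv(
  data.frame(c_custkey = 1:3, c_name = c("a", "b", "c")),
  file.path(tpch_dir, "customer.csv"),
  row.names = FALSE
)
write.csv(
  data.frame(n_nationkey = 0:1, n_name = c("n0", "n1")),
  file.path(tpch_dir, "nation.csv"),
  row.names = FALSE
)

# Approach 1 (hypothetical helper): CSV files on disk, parsed on each call.
read_tpch_df_sketch <- function(name) {
  read.csv(file.path(tpch_dir, paste0(name, ".csv")))
}

# Read every CSV in the directory into a named list of data frames.
tpch_tables_sketch <- function() {
  files <- list.files(tpch_dir, pattern = "\\.csv$")
  tables <- sub("\\.csv$", "", files)
  setNames(lapply(tables, read_tpch_df_sketch), tables)
}

# Approach 2: the same named list built once, which is the shape a package
# data object such as tpch0001 would have.
tpch0001_sketch <- tpch_tables_sketch()

# Both access patterns return the same table.
identical(read_tpch_df_sketch("customer"), tpch0001_sketch$customer)
#> [1] TRUE
```

The main difference is when the tables are read: the CSV approach parses files on every call, while package data is built ahead of time (for example with usethis::use_data()) and lazy-loaded, so tests could index the list directly.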