Closed westonpace closed 3 years ago
/document
Thanks for looking at this + putting this all together. I made a few changes to the structure (and then a bunch of smaller changes to the arrowbench package that I uncovered as I worked on this, which should have been done already, but no time like the present!)
The major things I did were:
known_datasets
object. They are just like the other datasets that we have (other than the fact that we don't plan on downloading/caching them), so I thought they fit better there. The ensure_dataset()
almost worked without needing to be made remote-specific (and I made the one change we needed to support these list-all-the-files datasets, which AFAIK weren't actually supported in R when we wrote {arrowbench}) results_sentinel_end
due to a silly R warning about needing to auto-close a file that's not closed. I need to look into that more and file a Jira about it if it persists, but I was seeing it on my system. In practice it's not a big deal, but it makes arrow slightly noisier than it should be.
Thanks for cleaning things up @jonkeane. I looked over your changes and it all looks good to me. Simpler and cleaner and it still tests the same thing :)
There is probably still some testing to do here. The sample datasets work OK. I tried collecting the entire dataset (without specifying any filter/query) but ran into some issues, so I want to debug that a bit further.