Closed un-knight closed 5 years ago
If you want to query for a certain row, you can use predicates, or a combination of indexes and predicates. This is not a tool for implementing arbitrary sampling policies, though, since it is not very efficient. The underlying issue is that the Parquet format is not well suited to random row access; it is designed for chunk/batch processing.
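To make the random-access cost concrete, here is a rough pure-Python sketch, not the library's own code. It assumes a dataset stored as a sequence of row groups with known sizes. `locate_row` and `chunked_shuffle` are hypothetical helper names made up for this illustration. The first shows that a global row index only resolves to "some offset inside a row group", so fetching one row still means reading its whole chunk. The second shows a chunk-friendly approximation of shuffling that reads each row group once.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def locate_row(row_group_sizes, index):
    """Map a global row index to (row_group_number, offset_within_group).

    Hypothetical helper: row_group_sizes is assumed to list the number of
    rows in each row group, in file order.
    """
    ends = list(accumulate(row_group_sizes))  # ends[i] = rows in groups 0..i
    if not ends or not 0 <= index < ends[-1]:
        raise IndexError("row index out of range")
    group = bisect_right(ends, index)  # first group whose end is > index
    start = ends[group] - row_group_sizes[group]
    return group, index - start


def chunked_shuffle(row_group_sizes, seed=None):
    """Yield every global row index once, in a chunk-friendly random order.

    Row groups are visited in random order and rows are shuffled only
    within the group currently loaded, so each chunk is read a single time.
    This is weaker than a full shuffle over all rows.
    """
    rng = random.Random(seed)
    ends = list(accumulate(row_group_sizes))
    starts = [end - size for end, size in zip(ends, row_group_sizes)]
    groups = list(range(len(row_group_sizes)))
    rng.shuffle(groups)
    for g in groups:
        rows = list(range(starts[g], starts[g] + row_group_sizes[g]))
        rng.shuffle(rows)
        yield from rows


sizes = [100, 100, 50]          # assumed row counts per row group
print(locate_row(sizes, 100))   # row 100 is the first row of group 1
order = list(chunked_shuffle(sizes, seed=0))
```

A true per-row shuffle would call something like `locate_row` for every sample and reload a chunk each time, which is the inefficiency described above.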
I want to get a sample by index, but there seems to be no way to do this, since Reader gets samples only by walking through the rows. In this case, I can't even shuffle the samples by index.