Closed harold closed 8 months ago
[not urgent - I can proceed presently with acceptable performance w/ ds/select-columns
]
Yes - both make sense. the whitelist/blacklist are used and supported specifically only with csv and parquet file loading systems and the make the parsing far cheaper. They aren't currently passed into the new-dataset system.
Or - put more generally - it is up to the specific loading extension to either support them or not.
Thanks! That helps me understand a lot.
I get now that there is more time to be saved w/ an allow list on (for example) csv over json, because a lot of the parsing is already done to make the sequence of maps in json, where csv there is more work to save by skipping entire columns.
I am going to proceed with a pathway something like json -> ds/->dataset -> ds/select-columns -> ds/row-map
for now, because I think w/ all the data I need to process this way this wont be the performance bottleneck. I'll circle back around if it does become a pain/blocker.
Separately, I do think having :column-allowlist
and :column-blocklist
as synonyms for :column-whitelist
and :column-blacklist
is a good idea.
Yes very good. Much clearer and they don't rely on anachronisms.
I've proposed something in charred that is an addition rather than a change to the API that should allow this: https://github.com/cnuernber/charred/pull/24
Closing this - it appears addressed.
Two thoughts:
:column-whitelist
for the first time today:In the second case, I expected to get a dataset with no column
"b"
. Does that make sense?:column-allowlist
and:column-blocklist
as synonyms for:column-whitelist
and:column-blacklist
? Being allowed or blocked doesn't have anything to do with being black or white. :hugs: