binarylogic opened this issue 3 years ago.
Blocking this with needs labels until we can firm up the requirements. I'm expecting @jszwedko to suggest changes 😁.
I think the requirements seem mostly good. A few missing ones that I'm aware of:
/foo=/bar=2
We can make the timestamp layout opinionated by default, but I think it'd be useful to let users configure it. For example, they might want the timestamp segments to come first, or to segment by hour.
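A minimal sketch of what "opinionated by default, but configurable" could mean, assuming two knobs: granularity (day or hour) and position (timestamp first or last). All names here (`DateParts`, `TimestampGranularity`, `TimestampPosition`, `partition_path`) are hypothetical and not part of VRL or Vector; the sketch also assumes the timestamp has already been split into UTC components, so no date parsing is shown.

```rust
// Hypothetical sketch: none of these types or functions exist in VRL/Vector.

/// Date/time components already extracted from the event timestamp (assumed UTC).
struct DateParts {
    year: i32,
    month: u32,
    day: u32,
    hour: u32,
}

/// How finely to segment by time.
#[derive(Clone, Copy)]
enum TimestampGranularity {
    Day,
    Hour,
}

/// Whether the time segments come before or after the field segments.
#[derive(Clone, Copy)]
enum TimestampPosition {
    First,
    Last,
}

/// Builds zero-padded `name=value` time segments at the requested granularity.
fn time_segments(t: &DateParts, granularity: TimestampGranularity) -> Vec<String> {
    let mut segs = vec![
        format!("year={:04}", t.year),
        format!("month={:02}", t.month),
        format!("day={:02}", t.day),
    ];
    if let TimestampGranularity::Hour = granularity {
        segs.push(format!("hour={:02}", t.hour));
    }
    segs
}

/// Joins field and time segments in the configured order, with a trailing `/`.
fn partition_path(
    fields: &[(&str, &str)],
    t: &DateParts,
    granularity: TimestampGranularity,
    position: TimestampPosition,
) -> String {
    let field_segs: Vec<String> = fields.iter().map(|(k, v)| format!("{k}={v}")).collect();
    let time_segs = time_segments(t, granularity);
    let ordered: Vec<String> = match position {
        TimestampPosition::First => time_segs.into_iter().chain(field_segs).collect(),
        TimestampPosition::Last => field_segs.into_iter().chain(time_segs).collect(),
    };
    let mut out = ordered.join("/");
    out.push('/');
    out
}

fn main() {
    let t = DateParts { year: 2021, month: 3, day: 9, hour: 7 };
    let fields = [("env", "prod")];
    // Opinionated default: fields first, daily segments.
    println!("{}", partition_path(&fields, &t, TimestampGranularity::Day, TimestampPosition::Last));
    // User-configured: timestamp first, hourly segments.
    println!("{}", partition_path(&fields, &t, TimestampGranularity::Hour, TimestampPosition::First));
}
```

The first call prints `env=prod/year=2021/month=03/day=09/` and the second prints `year=2021/month=03/day=09/hour=07/env=prod/`.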
As far as I know, the Array form isn't possible at the moment.
By the time the function receives the array, all the paths have already been evaluated, so we only have access to their values, not the path names. The array elements may not even be paths. For example, to_hive_partition([to_hive_partition(), sha2("blah"), 34]) would compile fine, even though none of its elements has a name to use.
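The evaluation-order problem can be modelled with a toy sketch. The `Expr` enum and `eval_array` are hypothetical simplifications, not VRL's actual compiler types, and `"<sha2 digest>"` is a placeholder rather than a real hash. The point it shows: array elements are evaluated before the function body runs, so the function receives only values, and a path name like `environment` is gone by then.

```rust
use std::collections::HashMap;

/// A hypothetical, stripped-down argument expression (not VRL's real AST).
enum Expr {
    /// A path into the event, e.g. `.environment`.
    Path(&'static str),
    /// Any other expression, already reduced to a value (a literal, a call result, ...).
    Literal(&'static str),
}

/// Evaluates every array element up front, roughly what happens to an array
/// argument before the function itself runs. Only the values survive.
fn eval_array(args: &[Expr], event: &HashMap<&str, &str>) -> Vec<String> {
    args.iter()
        .map(|e| match e {
            Expr::Path(p) => event.get(*p).copied().unwrap_or("").to_string(),
            Expr::Literal(v) => v.to_string(),
        })
        .collect()
}

fn main() {
    let event = HashMap::from([("environment", "production"), ("application_id", "42")]);
    // Roughly `[.environment, sha2("blah"), 34]`: only the first element is a path.
    let args = [
        Expr::Path("environment"),
        Expr::Literal("<sha2 digest>"), // placeholder, not a computed hash
        Expr::Literal("34"),
    ];
    // The function would see ["production", "<sha2 digest>", "34"]: no names at all.
    println!("{:?}", eval_array(&args, &event));
}
```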
Since the proposed design 1) doesn't enforce the field order, because the current implementation is based on BTreeMap (this can be fixed by switching to IndexMap), and 2) isn't flexible around the timestamp, we suggest an alternative solution based on the template syntax we already support, which addresses both issues:
to_hive_partition("env={{environment}}/app_id={{ application_id }}/year=%Y/month=%m/day=%d/")
This syntax should already be familiar to users. On the other hand, it is so flexible that one could question the need for this specific function at all. So, alternatively, we could create a general-purpose function (e.g. template()) for all such use cases and add a url_safe parameter to make sure the output is URL-encoded:
template("env={{environment}}/app_id={{ application_id }}/year=%Y/month=%m/day=%d/", url_safe: true, limit: 256)
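A rough sketch of how the proposed template() function might behave, under stated assumptions: `render_template` is a hypothetical name; `{{ name }}` placeholders (surrounding whitespace trimmed) are replaced with field values; with `url_safe` enabled, only the substituted values are percent-encoded (bytes outside the RFC 3986 unreserved set `A-Z a-z 0-9 - _ . ~`), so the template's own `/` separators stay intact; and `limit` is assumed to be a maximum length in bytes, which the proposal does not specify. strftime specifiers such as `%Y` are passed through untouched here; Vector's template rendering handles those and is not reimplemented.

```rust
use std::collections::HashMap;

/// Percent-encodes every byte outside the RFC 3986 unreserved set.
fn percent_encode(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for b in value.bytes() {
        if b.is_ascii_alphanumeric() || matches!(b, b'-' | b'_' | b'.' | b'~') {
            out.push(b as char);
        } else {
            out.push_str(&format!("%{b:02X}"));
        }
    }
    out
}

/// Hypothetical renderer: substitutes `{{ name }}` placeholders, optionally
/// URL-encodes the substituted values, then truncates to `limit` bytes.
fn render_template(
    template: &str,
    fields: &HashMap<&str, &str>,
    url_safe: bool,
    limit: usize,
) -> Result<String, String> {
    let mut out = String::new();
    let mut rest = template;
    while let Some(start) = rest.find("{{") {
        out.push_str(&rest[..start]); // literal text is copied as-is
        let after = &rest[start + 2..];
        let end = after.find("}}").ok_or_else(|| "unclosed `{{`".to_string())?;
        let name = after[..end].trim();
        let value = fields.get(name).ok_or_else(|| format!("missing field `{name}`"))?;
        if url_safe {
            out.push_str(&percent_encode(value));
        } else {
            out.push_str(value);
        }
        rest = &after[end + 2..];
    }
    out.push_str(rest);
    // Truncate to at most `limit` bytes without splitting a UTF-8 character.
    let mut cut = limit.min(out.len());
    while !out.is_char_boundary(cut) {
        cut -= 1;
    }
    out.truncate(cut);
    Ok(out)
}

fn main() {
    let fields = HashMap::from([("environment", "prod/eu"), ("application_id", "42")]);
    let t = "env={{environment}}/app_id={{ application_id }}/";
    // Prints env=prod%2Feu/app_id=42/ : the `/` inside the value is encoded.
    println!("{}", render_template(t, &fields, true, 256).unwrap());
}
```

Encoding only the substituted values, rather than the whole rendered string, is what keeps a user-supplied value from silently adding an extra path segment.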
cc @jszwedko
@binarylogic, @FungusHumungus, what do you think?
Yes. This is probably not as convenient for the user as what the original issue requested, but it's a good way to do it that doesn't require any changes to the VRL compiler. Plus, these functions can be used in a wide range of other scenarios too.
@vladimir-dd, given that this wasn't as simple as I originally thought, why don't we table this and come back to it when we have firmer requirements?
Creating Hive partition strings is very common when writing to file-like storage (such as aws_s3). Unfortunately, building these partition strings is fraught with foot-guns. To protect users from these issues, we should offer a function that makes this task easy.
Examples
Given this event for all examples:
Array
And this Remap script:
Would produce this string:
Notice that:
Map
And this Remap script:
Would produce this string:
Notice that the map keys are used as the partition names.
Single value
And this Remap script:
Would produce this string:
Requirements
/
=
/
character
cc @jszwedko since he had the pleasure of creating such partition strings for the benchmarking work.
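The Map form and the requirements can be illustrated with a minimal sketch. `hive_partition` and `escape_value` are hypothetical names, and because the requirements list survives only as fragments mentioning `/` and `=`, the sketch assumes that partition names must not contain either character and that values escape them (plus `%`, so the escaping is reversible). It takes ordered key/value pairs, uses the keys as partition names, and ends the path with a trailing `/`. It also shows the ordering issue raised earlier in the thread: routing the pairs through a `BTreeMap` sorts them by key.

```rust
use std::collections::BTreeMap;

/// Escapes only the characters that would break the `name=value/` structure,
/// plus `%` itself so the escaping can be reversed. (Assumed rule set.)
fn escape_value(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for c in value.chars() {
        match c {
            '/' => out.push_str("%2F"),
            '=' => out.push_str("%3D"),
            '%' => out.push_str("%25"),
            _ => out.push(c),
        }
    }
    out
}

/// Hypothetical builder: `k1=v1/k2=v2/` from ordered pairs, keys as names.
fn hive_partition(pairs: &[(&str, &str)]) -> Result<String, String> {
    let mut out = String::new();
    for (key, value) in pairs {
        // Reject names that would be ambiguous in the path.
        if key.is_empty() || key.contains(|c: char| c == '/' || c == '=') {
            return Err(format!("invalid partition name: {key:?}"));
        }
        out.push_str(key);
        out.push('=');
        out.push_str(&escape_value(value));
        out.push('/');
    }
    Ok(out)
}

fn main() {
    // Insertion order is preserved: prints env=prod%2Feu/app_id=x%3D1/
    let pairs = [("env", "prod/eu"), ("app_id", "x=1")];
    println!("{}", hive_partition(&pairs).unwrap());

    // A BTreeMap iterates in sorted key order, so app_id moves first:
    // prints app_id=x%3D1/env=prod%2Feu/
    let sorted: BTreeMap<&str, &str> = pairs.iter().copied().collect();
    let sorted_pairs: Vec<(&str, &str)> = sorted.into_iter().collect();
    println!("{}", hive_partition(&sorted_pairs).unwrap());
}
```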