Open robertbastian opened 1 year ago
Context: The reason why our usual load-data-in-constructor pattern doesn't work here is that currently, the internal functions I'm using (load_for_ecma262_unstable
, load_general_category
, ..) accept the DataProvider and not the payload.
We also need to decide how "overescaped" inputs should be handled, i.e., the kinds of inputs that exemplar characters is dealing with.
Thought: I believe allowing users to supply their own iterators would solve this issue with minimal effort on their and our side
An alternative API to the free parse
function:
let builder = UnicodeSetBuilder::new(provider);
let set = builder.parse(source, options)?;
Furthermore, the transliterator would benefit from a function that either takes an Iterator, or returns how much of the source string was actually parsed. That allows the transliterator parser to hand over to the UnicodeSet parser as soon as it reaches a [
, and then continue from where the UnicodeSet parser left off after a successful UnicodeSet parse.
Another option (not necessarily recommending it) would be to do like we do in TimeZoneFormatter, where we have a bunch of Option<DataPayload<_>>
for the different keys we may or may not need, and then a bunch of little load
functions that fill in each key. In our case, the parse function could fail with a list of keys that need to be loaded or something like that. We could also have a parse function that takes &mut self
and fills in the payloads as needed.
That sounds a bit like a caching provider, or is a caching layer too complex if each provider individually is "optional"? (I'm not too sure, but I feel like such an implementation might need a type-based map (eg BidiControlV1Marker
-> Option<DataPayload<BidiControlV1Marker>>
))
I'd characterize it less as "caching" and more as "lazy loading" but you're correct.
If we had a struct LazyProvider<P: DataProvider<M: DataMarker>>
that impls DataProvider<M> for LazyProvider<P>
that does the lazy-loading/caching you describe, we could keep all APIs the same but would avoid multiple calls to the underlying (potentially expensive) DataProvider::load
function.
Do you think that's more expensive than TimeZoneFormatter's explicit approach? I feel like it just boils down to being able to generically replace TimeZoneDataPayloads
Discuss with:
Optional:
pub struct PropertyNameToSetMapper { ... }
impl PropertyNameToSetMapper {
pub fn try_new_for_ecma262(...) -> Self { ... }
pub fn try_new_all(...) -> Self { ... }
pub fn load_for_property(&self, name: &str) -> CodePointSetDataBorrowed { ... }
pub fn load_for_property_with_value(&self, name: &str, value: &str) -> CodePointSetDataBorrowed { ... }
// also strict/loose versions of those functions
}
pub trait PropertyNameToSetMapperLike {
type Error: Debug;
fn load_for_property(&self, name: &str) -> Result<CodePointSetDataBorrowed, Error> { ... }
fn load_for_property_with_value(&self, name: &str, value: &str) -> Result<CodePointSetDataBorrowed, Error> { ... }
// with the strict and loose
}
LGTM: @Manishearth @sffc @robertbastian @skius @eggrobin
Next: free function?
$
anchor. It is either the anchor or the FFFF character.$
in UnicodeSet.&self
and not &mut self
right?&mut self
and the caching is handled by the UnicodeSetParser&self
because the PropertyNameToSetMapper should be a generally useful thing outside of the UnicodeSetParserparse_to_cpilsl
. For now I think parse
is fine. We can add more function overloads as needed in future releases.LGTM: @Manishearth @skius @younies @robertbastian @eggrobin @sffc
Any additional feedback on this, @echeran?
cc @skius
parse
be a free function