wdkrnls opened this issue 4 years ago
Thanks!
I didn't know TXR. It would be nice to use it directly, but I couldn't find any R interface to it.
You say unglue saves you from having to leave R; when you did have to leave R, was it to use TXR?
A link for future reference: https://www.nongnu.org/txr/txr-pattern-language.html
Your proposed syntax can't work as is, because as written it would match the exact string "known_color". Also, as I believe you allude to, such a function would work on top of a regular expression, so the syntax needs a place to specify that regex.
Given that the function should return a boolean, we could use the / character to mean "if" (or "given"), as in conditional probability notation. So we'd have:
unglue_data(input, "The {color/known_color} {object}.")
Or, with an explicit regex:
unglue_data(input, "The {color/known_color=.*?} {object}.")
Would that meet your needs? Do you find it intuitive?
Note that I can't do:
unglue_data(input, "The {color=.*?/known_color} {object}.")
because it would be ambiguous whether the regex is ".*?" or the full ".*?/known_color".
That said, this particular example can already be solved with:
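To make the proposed "/" semantics concrete, here is a minimal base-R sketch of what "The {color/known_color=.*?} {object}." would mean: match the regex first, then keep only the rows where the boolean function returns TRUE for the captured value. It does not use unglue; the helper extract_if(), the predicate known_color() and the sample input (one sentence per element, plus a deliberately unknown "purple cow") are hypothetical.

```r
# Hypothetical emulation of "The {color/known_color=.*?} {object}." in base R.
# Neither extract_if() nor known_color() is part of unglue.
input <- c("The green lawn.", "The red chair.", "The blue box.",
           "The grey cat.", "The purple cow.")

# Boolean "pattern function": TRUE when the captured text is acceptable
known_color <- function(x) x %in% c("green", "red", "blue", "grey")

# Match `regex`, name the capture groups `cols`, then drop any row for which
# a predicate in `conditions` (a named list of functions) is not TRUE
extract_if <- function(x, regex, cols, conditions = list()) {
  # each element holds the full match followed by the capture groups
  m <- regmatches(x, regexec(regex, x, perl = TRUE))
  rows <- lapply(m, function(g) {
    if (length(g) == 0) return(NULL)            # regex did not match
    vals <- setNames(as.list(g[-1]), cols)
    for (nm in names(conditions)) {
      if (!isTRUE(conditions[[nm]](vals[[nm]]))) return(NULL)  # condition failed
    }
    as.data.frame(vals, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)  # NULL entries are dropped by rbind()
}

res <- extract_if(input, "^The (.*?) (.*?)\\.$", c("color", "object"),
                  list(color = known_color))
res
```

Here "The purple cow." matches the regex but is dropped because known_color() returns FALSE for "purple"; the regex alone could only reject it by enumerating the colors.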
unglue_data(input, "The {color=(green)|(red)|(blue)|(grey)} {object}.")
Or, if we want to define the pattern separately:
known_color_pattern <- "(green)|(red)|(blue)|(grey)"
unglue_data(input, sprintf("The {color=%s} {object}.", known_color_pattern))
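If the known options live in a character vector, the same pattern can be built programmatically instead of typed by hand. A minimal sketch follows; escape_regex() is a hypothetical helper (not part of unglue) that backslash-escapes regex metacharacters so an option such as "navy.blue" is matched literally.

```r
# Hypothetical helper, not part of unglue: backslash-escape regex
# metacharacters so each option is matched literally
escape_regex <- function(x) {
  gsub("([][{}()+*^$|\\\\?.])", "\\\\\\1", x, perl = TRUE)
}

known_colors <- c("green", "red", "blue", "grey")

# Same form as typed by hand above: "(green)|(red)|(blue)|(grey)"
known_color_pattern <- paste0("(", escape_regex(known_colors), ")", collapse = "|")
known_color_pattern

# Escaping matters once an option contains a metacharacter such as "."
escape_regex("navy.blue")  # prints "navy\\.blue"
```

The resulting string can then be spliced into the unglue pattern with sprintf() exactly as above.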
Can you think of a use case where the above wouldn't be satisfactory? I'd rather not complicate unglue if the added value isn't clear.
I gave a poor example. Enumerating known cases is quite convenient in R, as you have shown. However, the TXR pattern-function approach is far more powerful when you cannot enumerate the options and they cannot be described by a regular expression. I really like your conditional / syntax for boolean functions; that would get much closer to the power of the TXR approach.
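As an illustration of a condition that a regular expression can't reasonably express, here is a minimal sketch of a boolean pattern function, is_prime() (hypothetical, not part of unglue or TXR), applied in two steps in base R: capture with a permissive regex, then filter with the function. The sample input is invented for the example.

```r
# Hypothetical boolean "pattern function": TRUE when the captured text is a
# whole number that is prime (not expressible as a sensible regex)
is_prime <- function(x) {
  n <- suppressWarnings(as.integer(x))   # non-numbers become NA
  vapply(n, function(k) {
    if (is.na(k) || k < 2L) return(FALSE)
    if (k < 4L) return(TRUE)             # 2 and 3
    all(k %% 2:floor(sqrt(k)) != 0L)     # no divisor up to sqrt(k)
  }, logical(1))
}

# Step 1: capture with a permissive regex
input <- c("Room 7 is free.", "Room 12 is free.", "Room 13 is free.")
m <- regmatches(input, regexec("^Room (\\d+) is free\\.$", input, perl = TRUE))
room <- vapply(m, `[`, character(1), 2)  # second element = first capture group

# Step 2: keep only the captures accepted by the pattern function
room[is_prime(room)]
```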
Great package! It saves me from having to leave R for many tasks. I'm curious whether you think it would be reasonable to support pattern functions similar to those provided by the TXR pattern-munging language (https://www.nongnu.org/txr/), in addition to regular expressions. For example, I might want to ensure a pattern only matches one of a vector of options.
Imagine I have the data:
The green lawn. The red chair. The blue box. The grey cat.
I define the pattern function:
Then I could extract like this:
Thanks for your consideration and great work!