Open moodymudskipper opened 4 years ago
I needed this, so I came up with a hack: I prefix the pattern variables with a pattern id, then gather, separate and spread to get the pattern values as well.
Using the example from the README, it looks like this:
> library(unglue)
> library(tidyverse)
> facts <- c("Antarctica is the largest desert in the world!",
+ "The largest country in Europe is Russia!",
+ "The smallest country in Europe is Vatican!",
+ "Disneyland is the most visited place in Europe! Disneyland is in Paris!",
+ "The largest island in the world is Green Land!")
> facts_df <- data.frame(id = 1:5, facts)
>
> patterns <- c("The {p1_adjective} {p1_place_type} in {p1_bigger_place} is {p1_place}!",
+ "{p2_place} is the {p2_adjective} {p2_place_type=[^ ]+} in {p2_bigger_place}!{=.*}")
> unglue_data(facts, patterns) %>%
+ add_column(facts, .before=1) %>%
+ gather(key="variable", value="value", -facts) %>%
+ filter(!is.na(value)) %>%
+ separate(variable, sep="_", into=c("pattern", "variable"), extra="merge") %>%
+ spread(key=variable, value=value)
                                                                    facts
1                          Antarctica is the largest desert in the world!
2 Disneyland is the most visited place in Europe! Disneyland is in Paris!
3                                The largest country in Europe is Russia!
4                          The largest island in the world is Green Land!
5                              The smallest country in Europe is Vatican!
  pattern    adjective bigger_place      place place_type
1      p2      largest    the world Antarctica     desert
2      p2 most visited       Europe Disneyland      place
3      p1      largest       Europe     Russia    country
4      p1      largest    the world Green Land     island
5      p1     smallest       Europe    Vatican    country
>
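Since `gather()` and `spread()` are superseded in current tidyr, the same prefix trick can probably be written with `pivot_longer()` and `pivot_wider()`. This is only a sketch under a few assumptions: the pattern ids (`p1`, `p2`) contain no underscore themselves, every capture is a character column, and the result name `res` is just an illustrative choice. The regex in `names_pattern` splits each column name at its first underscore, which plays the role of `extra="merge"` in the `separate()` call above.

```r
library(unglue)
library(dplyr)
library(tidyr)

facts <- c("Antarctica is the largest desert in the world!",
           "The largest country in Europe is Russia!",
           "The smallest country in Europe is Vatican!",
           "Disneyland is the most visited place in Europe! Disneyland is in Paris!",
           "The largest island in the world is Green Land!")

# Same patterns as above: every capture is prefixed with a pattern id (p1_, p2_).
patterns <- c("The {p1_adjective} {p1_place_type} in {p1_bigger_place} is {p1_place}!",
              "{p2_place} is the {p2_adjective} {p2_place_type=[^ ]+} in {p2_bigger_place}!{=.*}")

res <- unglue_data(facts, patterns) %>%
  # keep the input string next to the extracted values
  mutate(facts = facts, .before = 1) %>%
  # split "p1_place_type" into pattern = "p1" and variable = "place_type";
  # rows for the pattern that did not match are all NA and get dropped
  pivot_longer(-facts,
               names_to = c("pattern", "variable"),
               names_pattern = "^([^_]+)_(.*)$",
               values_drop_na = TRUE) %>%
  # back to one row per input string, one column per variable
  pivot_wider(names_from = variable, values_from = value)

res
```

The result should carry the same columns as the `spread()` output above (`facts`, `pattern`, `adjective`, `bigger_place`, `place`, `place_type`), though the row and column order may differ.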
There might be gotchas that apply to more general cases that I haven't thought of, but I thought you might find this useful.
So far we don't.