moodymudskipper / unglue

Extract matched substrings using a pattern, similar to what package glue does in reverse
GNU General Public License v3.0

keep trace of pattern used #26

Open moodymudskipper opened 4 years ago

moodymudskipper commented 4 years ago

So far we don't keep track of which pattern matched each input.

mathematiguy commented 2 years ago

I needed this, so I came up with a hack: I prefix the pattern variables with a pattern ID, then gather, separate and spread so that the result also records which pattern matched each input.

Using the example from the README, it looks like this:

> library(unglue)
> library(dplyr)
> library(tidyr)
> library(tibble)
> facts <- c("Antarctica is the largest desert in the world!",
+            "The largest country in Europe is Russia!",
+            "The smallest country in Europe is Vatican!",
+            "Disneyland is the most visited place in Europe! Disneyland is in Paris!",
+            "The largest island in the world is Green Land!")
> facts_df <- data.frame(id = 1:5, facts)
> 
> patterns <- c("The {p1_adjective} {p1_place_type} in {p1_bigger_place} is {p1_place}!",
+               "{p2_place} is the {p2_adjective} {p2_place_type=[^ ]+} in {p2_bigger_place}!{=.*}")
> unglue_data(facts, patterns) %>%
+     add_column(facts, .before=1) %>%
+     gather(key="variable", value="value", -facts) %>%
+     filter(!is.na(value)) %>%
+     separate(variable, sep="_", into=c("pattern", "variable"), extra="merge") %>%
+     spread(key=variable, value=value)
                                                                    facts
1                          Antarctica is the largest desert in the world!
2 Disneyland is the most visited place in Europe! Disneyland is in Paris!
3                                The largest country in Europe is Russia!
4                          The largest island in the world is Green Land!
5                              The smallest country in Europe is Vatican!
  pattern    adjective bigger_place      place place_type
1      p2      largest    the world Antarctica     desert
2      p2 most visited       Europe Disneyland      place
3      p1      largest       Europe     Russia    country
4      p1      largest    the world Green Land     island
5      p1     smallest       Europe    Vatican    country
> 
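For anyone on a recent tidyr, where `gather()` and `spread()` are superseded, the same prefix trick can probably be written with `pivot_longer()` and `pivot_wider()`. This is only a sketch under the same assumptions as above: every variable name starts with a pattern ID followed by an underscore, and the ID itself contains no underscore. The `result` name is just for illustration.

```r
library(unglue)
library(dplyr)
library(tidyr)

facts <- c("Antarctica is the largest desert in the world!",
           "The largest country in Europe is Russia!",
           "The smallest country in Europe is Vatican!",
           "Disneyland is the most visited place in Europe! Disneyland is in Paris!",
           "The largest island in the world is Green Land!")

# Same patterns as above: each variable carries a pattern ID prefix (p1_, p2_).
patterns <- c("The {p1_adjective} {p1_place_type} in {p1_bigger_place} is {p1_place}!",
              "{p2_place} is the {p2_adjective} {p2_place_type=[^ ]+} in {p2_bigger_place}!{=.*}")

result <- bind_cols(tibble(facts = facts), unglue_data(facts, patterns)) %>%
  # Split each column name at its first underscore into pattern ID and variable,
  # and drop the NA cells that belong to the pattern that did not match.
  pivot_longer(
    -facts,
    names_to = c("pattern", "variable"),
    names_pattern = "^([^_]+)_(.*)$",
    values_drop_na = TRUE
  ) %>%
  # Back to one row per fact, with one column per variable.
  pivot_wider(names_from = variable, values_from = value)

print(result)
```

`names_pattern` does the job of the `separate()` step, and `values_drop_na = TRUE` replaces the `filter(!is.na(value))` step.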

There might be gotchas that apply to more general cases that I haven't thought of, but I thought you might find this useful.
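An alternative that avoids renaming the variables might be to test each pattern separately with `unglue_detect()` and record the index of the first one that matches. This is a rough sketch rather than anything built into the package: the `pattern_id` column is a made-up name, and it assumes `unglue_data()` picks the first matching pattern, as the README describes.

```r
library(unglue)

facts <- c("Antarctica is the largest desert in the world!",
           "The largest country in Europe is Russia!",
           "The smallest country in Europe is Vatican!",
           "Disneyland is the most visited place in Europe! Disneyland is in Paris!",
           "The largest island in the world is Green Land!")

# Unprefixed patterns, as in the README.
patterns <- c("The {adjective} {place_type} in {bigger_place} is {place}!",
              "{place} is the {adjective} {place_type=[^ ]+} in {bigger_place}!{=.*}")

# Walk the patterns from last to first so that, when several match,
# the earliest pattern overwrites the later ones. Unmatched inputs stay NA.
pattern_id <- rep(NA_integer_, length(facts))
for (i in rev(seq_along(patterns))) {
  pattern_id[unglue_detect(facts, patterns[i])] <- i
}

# pattern_id is a hypothetical column name, not something unglue produces.
result <- cbind(pattern_id = pattern_id, unglue_data(facts, patterns))
print(result)
```

This matches every input once per pattern, so it does more work than the prefix hack, but the extracted columns keep their original names.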