Open Synaps3 opened 7 years ago
Graham - are you already working on this function? It will probably be easier to use content_transformer() in conjunction with gsub() to modify the corpus prior to stemming rather than transform the tdm (by the time it's tdm'd, phrases like "Bacillus Calmette–Guérin" would already be lost).
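For reference, a minimal sketch of the corpus-side approach suggested above. The helper name `replace_terms` and the sample sentence are made up for illustration and are not code from this repo; the `tm` part only runs if `tm` is installed.

```r
# Hypothetical helper: replace every string in `variants` with `replacement`.
# fixed = TRUE treats each variant as a literal string rather than a regex.
replace_terms <- function(text, replacement, variants) {
  # Handle longer variants first so a full phrase is replaced before any
  # shorter variant contained in it.
  for (v in variants[order(nchar(variants), decreasing = TRUE)]) {
    text <- gsub(v, replacement, text, fixed = TRUE)
  }
  text
}

docs <- c("The Bacillus Calmette–Guérin vaccine protects against TB.")
cleaned <- replace_terms(docs, "BCG", c("Bacillus Calmette–Guérin"))

# With tm available, the same helper can be mapped over a corpus before
# stemming; extra arguments to tm_map() are passed through to the helper.
if (requireNamespace("tm", quietly = TRUE)) {
  corpus <- tm::VCorpus(tm::VectorSource(docs))
  corpus <- tm::tm_map(corpus, tm::content_transformer(replace_terms),
                       replacement = "BCG",
                       variants = c("Bacillus Calmette–Guérin"))
}
```

Literal matching is case-sensitive and will also hit short variants such as "TB" inside longer words, so a real version would probably want word-boundary patterns instead.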
Hey Adam,
Yeah, I think I sent you the partial code, but even if I didn't, I think changing it in the corpus may be a lot easier too. There are no straightforward functions for accessing the interior of a TDM.
I was hoping to avoid regenerating the TDM itself, but that may not be too bad of a price to pay. I'll look into doing it as part of the corpus if you haven't already.
Best, Graham
The function should take a Term Document Matrix (TDM) and a list of lists (or similar objects), where each inner list gives the replacement word followed by all the terms to be replaced.
For example, we might use [["Tuberculosis", "TB", "the disease"], ["BCG", "Bacillus Calmette–Guérin", "Guerin"]] to find any occurrences of "TB" or "the disease" and replace them with "Tuberculosis", and likewise to replace mentions of "Bacillus Calmette–Guérin" or "Guerin" with the acronym "BCG".
The output is a new temporary TDM with the terms replaced, as if all mentions of "TB" had been "Tuberculosis" originally.
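A rough sketch of what the TDM-side version could look like, assuming the TDM has been converted to a plain terms-by-documents count matrix (for example with as.matrix()). The name `merge_terms`, the named-list form of the mapping, and the toy counts are all assumptions made for illustration.

```r
# Hypothetical helper: `tdm_matrix` is a terms-by-documents count matrix;
# `mapping` is a named list whose names are the replacement terms and whose
# values are the terms to be replaced.
merge_terms <- function(tdm_matrix, mapping) {
  terms <- rownames(tdm_matrix)
  for (replacement in names(mapping)) {
    terms[terms %in% mapping[[replacement]]] <- replacement
  }
  # rowsum() adds together all rows that now share the same term label.
  rowsum(tdm_matrix, group = terms)
}

# Toy counts, invented for this example.
counts <- matrix(c(2, 0,
                   1, 3,
                   0, 1,
                   4, 1),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(c("Tuberculosis", "TB", "BCG", "vaccine"),
                                 c("doc1", "doc2")))

merged <- merge_terms(counts, list(Tuberculosis = c("TB"), BCG = c("Guerin")))
```

This only merges rows that exist as single terms. Multi-word phrases such as "Bacillus Calmette–Guérin" or "the disease" are not rows in a unigram TDM, which is the limitation that makes the corpus-side approach attractive.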