Closed thom4parisot closed 8 years ago
Yes, the Root node cannot be changed: it must always be the same object. You cannot change it by overwriting file.namespace('retext').tree.
You can, however, change all children in the Root in any way you like!
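To illustrate the difference, here is a minimal sketch that keeps the same Root object and only rewrites children arrays. The helper name removeStopwords and the hand-built tree are assumptions for illustration (the tree is assumed to mimic the RootNode > ParagraphNode > SentenceNode > WordNode > TextNode shape); none of this is part of retext itself.

```javascript
'use strict';

// Hypothetical helper (not a retext API): drops every WordNode whose text
// is listed in `stopwords`. The node passed in stays the same object;
// only `children` arrays are replaced.
function removeStopwords(node, stopwords) {
  if (!Array.isArray(node.children)) {
    return node;
  }

  node.children = node.children.filter(child => {
    if (child.type !== 'WordNode') {
      return true;
    }
    // A word's text is the concatenation of its children's values.
    const text = (child.children || []).map(d => d.value || '').join('');
    return stopwords.indexOf(text.toLowerCase()) === -1;
  });

  node.children.forEach(child => removeStopwords(child, stopwords));
  return node;
}

// Hand-built tree, assumed to mimic what the parser produces.
const root = {
  type: 'RootNode',
  children: [{
    type: 'ParagraphNode',
    children: [{
      type: 'SentenceNode',
      children: [
        {type: 'WordNode', children: [{type: 'TextNode', value: 'le'}]},
        {type: 'WhiteSpaceNode', value: ' '},
        {type: 'WordNode', children: [{type: 'TextNode', value: 'chat'}]}
      ]
    }]
  }]
};

const result = removeStopwords(root, ['le']);
```

Note that this sketch removes whole WordNodes rather than emptying them, and it leaves the surrounding WhiteSpaceNodes in place.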
Understood!
Problem is when I change the children, it seems to break the phrase detection as, I guess, the positions have changed?
'use strict';

const visit = require('unist-util-visit');

module.exports = (retext, options) => {
  const stopwords = options.stopwords;

  return (node, file) => {
    visit(node, 'WordNode', node => {
      node.children = node.children.filter(d => {
        return stopwords.indexOf(d.value) === -1;
      });
    });
  };
};
Does it mean I have to also iterate over the next sibling position and update the WordNode position start/end as well?
Pfew; quite the problem.
First off: no, start and end are not used in retext-keywords, so changing those shouldn’t fix/break anything.
Now, retext-keywords depends on retext-pos. With your code, the transformers run in order as expected: retext-stopwords, retext-pos, and retext-keywords.
However, I just now noticed you were talking about French. I think therein lies the problem: retext-pos caters especially to English, and only words with certain part-of-speech classifications are eligible for inclusion in the results. retext-keywords does not use stopwords, just POS tags.
As a consequence, I cannot come up with a solution for this other than a) create a French JavaScript POS tagger (extremely hard), or b) fork retext-keywords to also support words without POS tags and not occurring in a configurable list of stop-words (and with forking I mean I’ll accept it back into upstream if you’d PR).
I’m currently not in a position to dig in myself, but if you’re interested in working on this I can definitely advise and help out: it’s been a while since I touched the code though!
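As a rough idea of what option b) could look like, here is a minimal sketch that ranks words by frequency while skipping a configurable stopword list and ignoring POS tags entirely. The names countKeywords and topKeywords are hypothetical, the tree is hand-built, and this is not how retext-keywords is implemented.

```javascript
'use strict';

// Hypothetical helper: walks the tree and counts every WordNode whose
// lowercased text is not in `stopwords`. No POS tags are consulted.
function countKeywords(node, stopwords, counts) {
  counts = counts || new Map();

  if (node.type === 'WordNode') {
    const text = (node.children || [])
      .map(d => d.value || '')
      .join('')
      .toLowerCase();

    if (text && stopwords.indexOf(text) === -1) {
      counts.set(text, (counts.get(text) || 0) + 1);
    }
    return counts;
  }

  (node.children || []).forEach(child => countKeywords(child, stopwords, counts));
  return counts;
}

// Hypothetical helper: returns the `maximum` most frequent words,
// breaking ties alphabetically.
function topKeywords(tree, stopwords, maximum) {
  const counts = countKeywords(tree, stopwords);
  return Array.from(counts.keys())
    .sort((a, b) => counts.get(b) - counts.get(a) || a.localeCompare(b))
    .slice(0, maximum)
    .map(word => ({word, count: counts.get(word)}));
}

// Builds a WordNode for the example tree below.
const word = value => ({
  type: 'WordNode',
  children: [{type: 'TextNode', value}]
});

// Hand-built tree for the words "le chat et le chien chat".
const tree = {
  type: 'RootNode',
  children: [{
    type: 'ParagraphNode',
    children: [{
      type: 'SentenceNode',
      children: ['le', 'chat', 'et', 'le', 'chien', 'chat'].map(word)
    }]
  }]
};

const keywords = topKeywords(tree, ['le', 'et'], 2);
```

A real change to retext-keywords would presumably also need to handle phrases, which this sketch leaves out.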
So if I understand well, the best solution would be to implement the stopwords directly into retext-keywords, correct?
Or shall I mark the stopwords TextNodes as non-relevant for POS?
One thing I do not understand, though, is why retext-keywords (retext-pos) breaks because I altered the tree.
Hello,
thanks for retext, it is very elegant to use! Although I have managed to create a plugin which filters a tree, it seems the next plugin in the chain does not inherit the changes made by the previous plugin.
Here is my high level code:
retextKeywords is the retext-keywords plugin, but because it does not catch French stopwords, I ought to modify the tree to remove them. Here is the code of retextStopwords:
When I check tree, it indeed does not contain the TextNodes I wanted to remove. But these words are still taken into account by retextKeywords, which is next in the use() chain.
Any tip or hint to perform this?
Thanks a lot :-)