Sorry, I should put a note up. I’m in the process of updating retext and all its plugins. Could you try installing with @next? npm install retext@next should do the trick.
I haven’t had time to update the docs yet, but you should be able to run the example as follows:
/* Require dependencies. */
var Retext = require('retext');
var emoji = require('retext-emoji');
var smartypants = require('retext-smartypants');

/* Create an instance using retext-emoji and -smartypants. */
var retext = new Retext()
    .use(emoji, {
        'convert' : 'encode'
    })
    .use(smartypants);

/* Read a document. */
retext.process(
    'The three wise monkeys [. . .] sometimes called the ' +
    'three mystic apes--are a pictorial maxim. Together ' +
    'they embody the proverbial principle to ("see no evil, ' +
    'hear no evil, speak no evil"). The three monkeys are ' +
    'Mizaru (:see_no_evil:), covering his eyes, who sees no ' +
    'evil; Kikazaru (:hear_no_evil:), covering his ears, ' +
    'who hears no evil; and Iwazaru (:speak_no_evil:), ' +
    'covering his mouth, who speaks no evil.',
    function (err, file, doc) {
        /* Handle errors. */
        if (err) {
            throw err;
        }

        /* Log the text content of the tree (the transformed input). */
        console.log(doc);
        /**
         * This logs the following:
         * The three wise monkeys […] sometimes called the three
         * mystic apes—are a pictorial maxim. Together they
         * embody the proverbial principle to (“see no evil,
         * hear no evil, speak no evil”). The three monkeys are
         * Mizaru (🙈), covering his eyes, who sees no evil;
         * Kikazaru (🙉), covering his ears, who hears no evil;
         * and Iwazaru (🙊), covering his mouth, who speaks no evil.
         */
    }
);
Oh okay. I'm just about to learn nodejs in order to use this amazing library :-)
Your example works! Thank you!
Wow, I’m honoured :) Great to hear it’s working. I’ll leave this issue open until I update the readme.
Do you also have any plans for implementing tf–idf (https://en.wikipedia.org/wiki/Tf–idf)?
I’ve never had the need, but I don’t doubt someone could implement it. I’m not that familiar with it myself, but I could probably help if you wanted to!
I will take a look at it when I become better at programming :-)
I guess it's something like
// build corpus
var document1 = "This text is about node.";
var document2 = "This text is about ruby.";
var document3 = "This text is about nothing.";
var corpus = [document1, document2, document3];
// some long text
var document = "This text is about node and ruby.";
// stopwords to filter out
var stopwords = ['it', 'is', 'am'];
// split a text into lowercase terms (ignore case)
function tokenize(text) {
    return text.toLowerCase().split(/\W+/).filter(function(token) {
        // filter out short words and stopwords
        return token.length >= 2 && stopwords.indexOf(token) === -1;
    });
}
var terms = tokenize(document);
// loop through remaining terms
for (var i = 0; i < terms.length; i++) {
    var term = terms[i];
    // compute tf-idf for term (named score so it does not overwrite the function)
    var score = tfidf(term, document, corpus);
    // print
    console.log(term + ' (tfidf: ' + score + ')');
}
// function to compute tf-idf
function tfidf(term, document, corpus) {
    // compute term frequency (occurrences in text)
    var num_occurrences_in_document = tokenize(document).filter(function(token) {
        return token === term;
    }).length;
    var tf = num_occurrences_in_document;
    // compute inverse document frequency as idf=log(N/df_t)
    var num_documents_in_corpus = corpus.length;
    var num_documents_containing_term = corpus.filter(function(text) {
        return tokenize(text).indexOf(term) !== -1;
    }).length;
    // assumption: a term found in no corpus document scores 0 (avoids dividing by zero)
    if (num_documents_containing_term === 0) {
        return 0;
    }
    var idf = Math.log(num_documents_in_corpus / num_documents_containing_term);
    // return tf-idf
    return tf * idf;
}
but I guess many of the needed functions are already implemented in retext.
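As a rough check on the idf=log(N/df_t) part, here is a minimal self-contained sketch over the same three example sentences. The helper names (tokenize, inverseDocumentFrequency) are made up for illustration and are not part of retext, it splits on a plain regex instead of using retext's word detection, and returning 0 for a term that no document contains is an assumption to avoid dividing by zero.

```javascript
// Hypothetical helpers for illustration only; not part of retext.
function tokenize(text) {
    // Lowercase, split on non-word characters, drop empty strings.
    return text.toLowerCase().split(/\W+/).filter(Boolean);
}

// idf = log(N / df_t). Returns 0 when no document contains the term
// (an assumption made here to avoid dividing by zero).
function inverseDocumentFrequency(term, corpus) {
    var containing = corpus.filter(function (text) {
        return tokenize(text).indexOf(term) !== -1;
    }).length;
    return containing === 0 ? 0 : Math.log(corpus.length / containing);
}

var corpus = [
    'This text is about node.',
    'This text is about ruby.',
    'This text is about nothing.'
];

// "text" is in all 3 documents: log(3 / 3) = 0
console.log(inverseDocumentFrequency('text', corpus));
// "node" is in 1 of 3 documents: log(3 / 1), about 1.0986
console.log(inverseDocumentFrequency('node', corpus));
```

Words that occur in every document get an idf of 0, so for the sample text only "node" and "ruby" would end up with a non-zero tf-idf.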
Yup, retext has really good support for finding “words” :)
Closed by 08a095d9748e6d3b06aad82f32752ca6d39b3aef.
If I use the example at https://github.com/wooorm/retext#usage, I get the error
I have installed retext, retext-emoji and retext-smartypants, and I'm using node v4.0.0.