Closed GoogleCodeExporter closed 9 years ago
Hmm, from the code you sent, I suspect what you are looking for is a
bag-of-words (BoW) representation. Is this the case? The Codification filter is
better suited to processing data tables, and you seem to want to perform text
classification.
The current version of the framework has no class for extracting a BoW.
However, such a class was already in the works for the next release, so I
happen to have a partial but working implementation here. I am sending it
attached, together with an example. Please see if it works for you.
A final version will be available in the next version of the framework.
Original comment by cesarso...@gmail.com
on 25 Oct 2012 at 4:39
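For readers without the attachment, the core idea of a bag-of-words codebook can be sketched in a few lines. This is a minimal illustration only, not the attached implementation or the Accord.NET API: the `CodebookSketch` class and its `Build` method are hypothetical names, and the input is assumed to be already tokenized into words.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of Accord.NET.
public static class CodebookSketch
{
    // Assigns each distinct word an integer code, in order of first appearance.
    public static Dictionary<string, int> Build(IEnumerable<string[]> documents)
    {
        var stringToCode = new Dictionary<string, int>();

        foreach (string[] words in documents)
        {
            foreach (string word in words)
            {
                // A new word receives the next free index.
                if (!stringToCode.ContainsKey(word))
                    stringToCode.Add(word, stringToCode.Count);
            }
        }

        return stringToCode;
    }
}
```

Under these assumptions, building a codebook from the two tokenized texts `{ "the", "cat", "sat" }` and `{ "the", "dog" }` yields four entries, with "the" mapped to 0 and "dog" mapped to 3. Each text can then be represented as a vector of per-word counts indexed by these codes, which is the form a text classifier consumes.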
Awesome. Thanks. A Bag o' Words is exactly what I was trying to do. This worked
great.
Original comment by pburr...@gmail.com
on 26 Oct 2012 at 6:41
In the BagOfWords class, I added some error handling in GetFeatureVector for
words that were not found in the trained data:
    /// <summary>
    ///   Gets the codeword representation of a given text.
    /// </summary>
    ///
    /// <param name="text">The text to be processed, as an array of words.</param>
    ///
    /// <returns>An integer vector whose length equals the number of words
    ///   in the codebook.</returns>
    ///
    public int[] GetFeatureVector(params string[] text)
    {
        int[] features = new int[NumberOfWords];

        // Count the occurrences of each word that is in the codebook
        for (int i = 0; i < text.Length; i++)
        {
            try
            {
                int j = stringToCode[text[i]];
                features[j]++;

                // Cap the count at the configured maximum
                if (features[j] > MaximumOccurance)
                    features[j] = MaximumOccurance;
            }
            catch (KeyNotFoundException)
            {
                // Eat the KeyNotFoundException. This word simply isn't in the training data.
            }
        }

        return features;
    }
Original comment by pburr...@gmail.com
on 26 Oct 2012 at 7:59
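The same behavior can be had without exceptions by using `Dictionary.TryGetValue`, which avoids the cost of throwing and catching for every unknown word. The sketch below is an assumption-laden stand-in, not the attached class: `BagOfWordsSketch` and its constructor are hypothetical, while `stringToCode`, `NumberOfWords`, and `MaximumOccurance` mirror the names used in the snippet above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the attached BagOfWords class.
public class BagOfWordsSketch
{
    private readonly Dictionary<string, int> stringToCode;

    // Upper bound on the count recorded for any single word.
    public int MaximumOccurance { get; set; }

    public int NumberOfWords
    {
        get { return stringToCode.Count; }
    }

    public BagOfWordsSketch(Dictionary<string, int> codebook)
    {
        stringToCode = codebook;
        MaximumOccurance = 1;
    }

    public int[] GetFeatureVector(params string[] text)
    {
        int[] features = new int[NumberOfWords];

        foreach (string word in text)
        {
            int j;

            // Words missing from the codebook are skipped silently.
            if (stringToCode.TryGetValue(word, out j))
                features[j] = Math.Min(features[j] + 1, MaximumOccurance);
        }

        return features;
    }
}
```

With a codebook of `cat -> 0, dog -> 1` and `MaximumOccurance = 2`, the input `"cat", "cat", "cat", "bird", "dog"` gives the vector `{ 2, 1 }`: the third "cat" is capped and "bird" is ignored.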
Bag-of-Words has been incorporated into Accord.NET 2.8.
Original comment by cesarso...@gmail.com
on 6 Nov 2012 at 5:24
Original issue reported on code.google.com by
pburr...@gmail.com
on 25 Oct 2012 at 2:02