1. There is currently no mechanism to reduce the size of a processed wiki to
fit onto smaller cards.
2. Wiki dumps that will fit onto smaller cards are out of date.
3. The only documents currently filtered out are unsupported types (e.g.
templates, metadata).
There should be a simple and elegant way for the indexer to trim the size of
large dumps. One easy approach would be to filter out articles with a low word
count or low character count, as these are often "stubs" that contain little
useful information. There could also be an extra option to include or exclude
redirects, since those usually have a low character count as well.
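
A rough sketch of what such a filter might look like, written in Python purely
for illustration. The Article type, the function names, and the default
thresholds are all hypothetical and are not taken from the existing indexer; the
only outside fact relied on is that MediaWiki redirect pages begin with
"#REDIRECT".

```python
import re
from dataclasses import dataclass

# MediaWiki redirect pages start with "#REDIRECT" (case-insensitive).
REDIRECT_RE = re.compile(r"^\s*#REDIRECT\b", re.IGNORECASE)


@dataclass
class Article:
    """Hypothetical stand-in for whatever record the indexer processes."""
    title: str
    text: str


def is_redirect(article):
    """Return True if the article body is a redirect to another page."""
    return bool(REDIRECT_RE.match(article.text))


def keep_article(article, min_words=50, min_chars=0, keep_redirects=True):
    """Decide whether an article should be kept in the trimmed dump.

    Redirects are handled by their own option, because they would
    otherwise always fall below any length threshold.
    """
    if is_redirect(article):
        return keep_redirects
    if len(article.text.split()) < min_words:
        return False
    return len(article.text) >= min_chars


def filter_articles(articles, **options):
    """Lazily yield only the articles that pass keep_article."""
    return (a for a in articles if keep_article(a, **options))
```

For example, filter_articles(articles, min_words=100, keep_redirects=False)
would drop both short stubs and redirects. Treating redirects separately from
the length check is a design choice made for this sketch, so that excluding
stubs does not silently remove every redirect as well.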
Original issue reported on code.google.com by charles....@gmail.com on 10 Aug 2010 at 2:21