The HTML scraped from Wiktionary carries no metadata that identifies definitions, and we can't derive them heuristically because the pages span many different languages. It would therefore be better to gather all paragraphs agnostically, however short, up to a certain number of characters. This needs testing, because it may end up collecting a lot of irrelevant content.
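A minimal sketch of the language-agnostic approach described above, using only Python's standard-library `html.parser`. The names `ParagraphCollector`, `gather_paragraphs` and `max_chars` are hypothetical, and the character limit is assumed here to apply per paragraph rather than to the total; the default of 300 is a placeholder to tune during testing.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text of every <p> element without inspecting its language."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._chunks = []

    def _flush(self):
        # Join the text pieces and collapse runs of whitespace.
        text = " ".join("".join(self._chunks).split())
        if text:
            self.paragraphs.append(text)
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()  # tolerate a previous <p> that was never closed
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()
            self._in_p = False

    def handle_data(self, data):
        # Text inside inline children (<a>, <b>, ...) is also delivered here.
        if self._in_p:
            self._chunks.append(data)


def gather_paragraphs(html, max_chars=300):
    """Return every non-empty paragraph of at most max_chars characters.

    Assumption: the limit is per paragraph; short paragraphs are kept.
    """
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    parser._flush()  # pick up a trailing unclosed <p>, if any
    return [p for p in parser.paragraphs if len(p) <= max_chars]
```

Because nothing here filters by content, the result will likely include navigation text, etymology notes and other noise, which is exactly what the testing needs to measure.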