Open bitfinity opened 10 years ago
Hi,
A good tutorial is definitely missing. I have a complete example (an IPython notebook) in the works, but I haven't finished it yet.
In the meantime, I'd love to hear more feedback about the existing tutorial: http://webstruct.readthedocs.org/en/latest/tutorial.html. It is a bit outdated (it is now easier to use CRFsuite instead of Wapiti), but it lists all the required steps in order. You load the annotated trees, convert them to HTML tokens, extract features from the tokens, feed the features into a sequence labelling toolkit, train the model, and then use the model to extract entities from a webpage. There is a chapter in the tutorial for each step. As it turns out, though, this is not clear at all.
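To make the order of those steps concrete, here is a minimal standard-library sketch. It does not use webstruct, and it is not how webstruct implements any of this. The `data-entity` attribute is a hypothetical stand-in for the markup an annotation tool writes, and `LabeledTokenizer`, `token_features` and `group_entities` are illustrative names. No model is trained here: the hand-annotated labels stand in for the predictions a CRF toolkit would return.

```python
from html.parser import HTMLParser


class LabeledTokenizer(HTMLParser):
    """Turn annotated HTML into parallel lists of tokens and IOB labels.

    Assumes entities are marked as <span data-entity="TYPE">...</span>;
    this attribute name is made up for the example.
    """

    def __init__(self):
        super().__init__()
        self.tokens = []     # token strings in document order
        self.labels = []     # one IOB label per token
        self._stack = []     # entity type (or None) for each open <span>
        self._begin = False  # True until an entity's first token is seen

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            entity = dict(attrs).get("data-entity")
            self._stack.append(entity)
            if entity:
                self._begin = True

    def handle_endtag(self, tag):
        if tag == "span" and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # The innermost annotated <span> decides the entity type.
        entity = next((e for e in reversed(self._stack) if e), None)
        for token in data.split():
            if entity is None:
                label = "O"
            elif self._begin:
                label, self._begin = "B-" + entity, False
            else:
                label = "I-" + entity
            self.tokens.append(token)
            self.labels.append(label)


def token_features(tokens, i):
    """Build a feature dict for token i from the token and its neighbours."""
    return {
        "lower": tokens[i].lower(),
        "is_title": tokens[i].istitle(),
        "is_digit": tokens[i].isdigit(),
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def group_entities(tokens, labels):
    """Merge IOB-labelled tokens into (text, type) pairs."""
    entities, current, current_type = [], [], None
    for token, label in zip(tokens, labels):
        if label.startswith("I-") and current_type == label[2:]:
            current.append(token)
            continue
        if current:
            entities.append((" ".join(current), current_type))
        if label == "O":
            current, current_type = [], None
        else:
            current, current_type = [token], label[2:]
    if current:
        entities.append((" ".join(current), current_type))
    return entities


html = ('<p>Contact <span data-entity="ORG">Acme Corp</span> in '
        '<span data-entity="CITY">Boston</span></p>')

tokenizer = LabeledTokenizer()
tokenizer.feed(html)
tokenizer.close()
tokens, labels = tokenizer.tokens, tokenizer.labels

# One feature dict per token: this is the input a sequence labeller trains on.
features = [token_features(tokens, i) for i in range(len(tokens))]

# With a trained model, `labels` would be its predictions for a new page.
entities = group_entities(tokens, labels)
```

In a real setup the `features` and `labels` sequences from many annotated pages would be passed to the sequence labelling toolkit for training, and `group_entities` would run on the labels the trained model predicts for an unseen page.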
Could you please share your experience with this tutorial? What is not clear? How can I improve it? Don't try too hard - if something is unclear, please ask here; that will help make the tutorial better.
There are a couple of "shortcuts" available:
Your tool looks like what I'm looking for, but the documentation is so limited that I can't use it. Just one screencast or example would do the trick. All I want to know is how to train a model to use for NER. You suggest using WebAnnotator, and you provide code to load trees out of the files saved from WebAnnotator, but you stop there. Why not follow through with a complete example that shows how to extract the content based on that model? Thanks, -jim