Open AKAMEDIASYSTEM opened 9 years ago
Hi AKA! I've visited your site once or twice before. Very cool work!
As for the bug you bring up, it will most definitely be addressed in the next major fix. I'm talking a new algo, better features, and a more intuitive explanation of what goes on!
In any case, I think I'm making a false assumption that the xpath selection excludes the js later on. If the js script (the one you've shown) is a child/descendant of the final "content" node, it gets thrown in with the rest of the extracted text :|
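To make that failure mode concrete, here is a minimal sketch of one possible workaround. It is not eatiht's XPath-based approach: it uses only Python's standard-library `html.parser`, and the class name `TextWithoutScripts`, the helper `extract_text`, and the sample markup are all made up for illustration. It skips any text found inside `<script>` or `<style>`, even when those tags sit inside the "content" node:

```python
from html.parser import HTMLParser


class TextWithoutScripts(HTMLParser):
    """Hypothetical helper: collect text, skipping <script>/<style> bodies."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep only non-empty text that is not inside a skipped tag.
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the visible text of `html` with script/style content dropped."""
    parser = TextWithoutScripts()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Made-up sample: a script nested inside the "content" node.
sample = (
    '<div id="content"><p>Readable article text.</p>'
    "<script>var tracker = 1;</script>"
    "<p>More article text.</p></div>"
)
print(extract_text(sample))
```

Running the extracted output through a filter like this before handing it to a topic modeler would only help if the raw HTML is still available; it cannot clean text after the tags have already been stripped.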
Thanks for bringing this up.
Really enjoying eatiht, it's almost perfect for me.
I'm using it to process urls and send the text to a topic modeler. The output is admirably clean except for inline code (which seems like it might be easy to detect?).
Here's a sample output that includes well-extracted text and then code:
...I think I could get rid of the scripting/coding parts with some hacking, but wanted to bring this issue up here in case it was helpful to know, or in case I'm missing an obvious solution ;-)
Thanks for any help you can provide, and thanks more for making this awesome repo!
AKA