Open GoogleCodeExporter opened 9 years ago
What worked for me was patching several crawler4j classes to add handling for
<script> elements. I also had to patch Tika's HtmlHandler, which silently
ignores any <script> tag inside the HTML <head> :-(. The patched classes are
in the attached files; search for the keyword PATCH to find the changes.
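
For readers without access to the attachments, here is a rough sketch of the general idea of collecting <script> URLs in a SAX content handler. It is not the attached patch: the class name ScriptLinkHandler and its methods are hypothetical, it uses only the JDK's SAX API rather than crawler4j or Tika classes, and it can only see <script> tags that the upstream HTML parser actually forwards to the handler.

```java
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration; NOT the patched crawler4j class.
public class ScriptLinkHandler extends DefaultHandler {

    // URLs found in the src attribute of <script> elements, in document order.
    private final List<String> scriptUrls = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // Parsers that are not namespace-aware may leave localName empty, so fall back to qName.
        String name = (localName == null || localName.isEmpty()) ? qName : localName;
        if ("script".equalsIgnoreCase(name)) {
            String src = attributes.getValue("src");
            // Inline scripts have no src attribute and are skipped.
            if (src != null && !src.trim().isEmpty()) {
                scriptUrls.add(src.trim());
            }
        }
    }

    public List<String> getScriptUrls() {
        return scriptUrls;
    }

    public static void main(String[] args) {
        // Simulate the SAX event a parser would emit for <script src="/js/app.js">.
        ScriptLinkHandler handler = new ScriptLinkHandler();
        AttributesImpl attrs = new AttributesImpl();
        attrs.addAttribute("", "src", "src", "CDATA", "/js/app.js");
        handler.startElement("", "script", "script", attrs);
        System.out.println(handler.getScriptUrls());
    }
}
```

The collected values are raw src strings; resolving them against the page URL and scheduling them for crawling would still have to happen in the crawler itself.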
Original comment by m4rcow...@gmail.com on 21 Feb 2013 at 8:56
Attachments:
Original issue reported on code.google.com by alirezan...@gmail.com on 19 Jan 2013 at 11:50