Closed by rkechols.
When I use my command line utilities to analyze the text from that page, everything completes fine. Do you have a good way to determine what part of the text is causing the hangup?
Does the timeout make the whole search fail, or do we just ignore that page in the results?
It happens repeatedly across almost the whole page.
If the search is run with breakpoints in CgConv.java, we can see which string value is causing the problem.
The document is still analyzed, but the sentences that cause cg-conv to hang do not have the full analysis.
Can you send me a file containing exactly the text that is passed to the analyzer (i.e., exactly what is extracted from the HTML)? I want to see whether there are unusual characters or something similar.
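One way to look for unusual characters in the extracted text is to list every code point whose Unicode general category is control, format, private-use, surrogate, or unassigned (for example a zero-width space or a stray byte-order mark). This is a minimal sketch using only `java.lang.Character`; `SuspiciousChars` is a hypothetical helper, not part of the project, and it does not establish that such characters are what makes cg-conv hang.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical debugging helper, not part of the project.
public class SuspiciousChars {
    /** Returns one line per suspicious code point: UTF-16 offset, U+XXXX, and Unicode name. */
    public static List<String> find(String text) {
        List<String> hits = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (isSuspicious(cp)) {
                String name = Character.getName(cp); // null for unassigned code points
                hits.add(String.format(Locale.ROOT, "offset %d: U+%04X %s",
                        i, cp, name == null ? "(unnamed)" : name));
            }
            i += Character.charCount(cp); // step over surrogate pairs as one code point
        }
        return hits;
    }

    /** Flags control (except newline, CR, tab), format, private-use, surrogate, and unassigned code points. */
    static boolean isSuspicious(int cp) {
        if (cp == '\n' || cp == '\r' || cp == '\t') {
            return false;
        }
        switch (Character.getType(cp)) {
            case Character.CONTROL:
            case Character.FORMAT:
            case Character.PRIVATE_USE:
            case Character.SURROGATE:
            case Character.UNASSIGNED:
                return true;
            default:
                return false;
        }
    }
}
```

Running `find` over the saved extracted text and printing each returned line gives the offsets to inspect; an empty result would point away from unusual characters as the cause.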
We've discovered that on the same problematic input, cg-conv from VISL CG-3 Disambiguator version 1.3.1.13891 hangs on Windows but not on Linux.
Making a note in the primary README:
When the `cg-conv` utility is run from `src/main/java/com/flair/server/utilities/CgConv.java`, certain inputs cause it to hang and then time out as programmed. One such input for Russian is the content of this site, which can be found by searching `говорить` ("to speak") in Russian with curated domains; it is the 3rd result (as of June 18, 2020).
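For reference, running an external tool with a timeout from Java generally follows the pattern below. This is a minimal sketch assuming only the standard `ProcessBuilder`/`Process` API; `TimedProcess`, `run`, and `Result` are hypothetical names, and this is not the actual code in `CgConv.java`. It drains the child's output on one thread and writes its input on another, so neither side blocks on a full pipe while the timeout is pending, and it forcibly destroys the process if it has not exited in time.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not the project's CgConv implementation.
public class TimedProcess {
    /** Captured output, exit code (-1 if the process was killed), and whether the timeout fired. */
    public static final class Result {
        public final String output;
        public final int exitCode;
        public final boolean timedOut;

        Result(String output, int exitCode, boolean timedOut) {
            this.output = output;
            this.exitCode = exitCode;
            this.timedOut = timedOut;
        }
    }

    public static Result run(List<String> command, String input, long timeoutMillis)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr into stdout so one thread drains both
        Process process = pb.start();

        // Drain output concurrently so the child never blocks on a full pipe buffer.
        ByteArrayOutputStream captured = new ByteArrayOutputStream();
        Thread reader = new Thread(() -> {
            try (InputStream in = process.getInputStream()) {
                in.transferTo(captured);
            } catch (IOException ignored) {
                // The stream is closed if the process is destroyed.
            }
        });

        // Feed input on its own thread, then close stdin so the child sees end-of-file.
        Thread writer = new Thread(() -> {
            try (OutputStream out = process.getOutputStream()) {
                out.write(input.getBytes(StandardCharsets.UTF_8));
            } catch (IOException ignored) {
                // The child may exit (or be killed) before reading all input.
            }
        });
        reader.start();
        writer.start();

        boolean finished = process.waitFor(timeoutMillis, TimeUnit.MILLISECONDS);
        if (!finished) {
            process.destroyForcibly(); // give up on a hung child
            process.waitFor();
        }
        writer.join();
        reader.join();

        String output = captured.toString(StandardCharsets.UTF_8);
        return new Result(output, finished ? process.exitValue() : -1, !finished);
    }
}
```

A caller would pass the same command line and input that `CgConv.java` already uses and check `timedOut` to decide whether the analysis for that input is incomplete.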