reynoldsnlp / flair

Fork of the FLAIR project at Tuebingen University

Arabic results are coming back VERY slowly #19

Open reynoldsnlp opened 5 years ago

reynoldsnlp commented 5 years ago

I searched for الحور الرجراج, requesting 40 results, and the results were processed very slowly. The relevant entries in catalina.out start at about 25/04/2019 16:19:41.

reynoldsnlp commented 5 years ago

Running mvn clean install and restarting Tomcat fixed it. I still don't know the source of the problem.

We still had lots of free memory.
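
For anyone who wants to confirm the memory picture from inside the JVM rather than from the OS, here is a minimal sketch using the standard Runtime API. The class name HeapSnapshot and the log format are made up for illustration and are not part of the FLAIR code.

```java
public class HeapSnapshot {
    /** Returns a one-line summary of current JVM heap usage in MiB. */
    public static String describe() {
        Runtime rt = Runtime.getRuntime();
        long mib = 1024L * 1024L;
        // totalMemory() is the heap currently committed to the JVM;
        // freeMemory() is the unused part of that committed heap.
        long used = (rt.totalMemory() - rt.freeMemory()) / mib;
        long committed = rt.totalMemory() / mib;
        // maxMemory() is the upper bound the heap may grow to (e.g. set by -Xmx).
        long max = rt.maxMemory() / mib;
        return String.format("heap used=%d MiB, committed=%d MiB, max=%d MiB",
                used, committed, max);
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Logging this line at the start and end of each search request would make it easier to match slow periods in catalina.out against heap pressure.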

mjbriggs commented 5 years ago

I suspect the problem is search results that are PDFs. The Tika processor doesn't work with them at the moment, so a bunch of valid search results are tossed because we can't process them. This causes a significant slowdown, since we still load each page and try to process it before discarding it.

mjbriggs commented 5 years ago

We may have introduced some memory leaks as well.

mjbriggs commented 5 years ago

I have excluded PDFs from the acceptable search results. I have not seen this issue come up in a while, so I believe that solved it, but I have not closed the issue because I did not know what "very slowly" meant. Additionally, it is difficult to tell from the front end whether the server is just taking a long time or has run out of heap space.
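
For reference, the kind of filter described above could look roughly like the sketch below. This is an illustration, not the actual change: the class PdfResultFilter and its method names are hypothetical, and it assumes search results are available as non-null URL strings before any page is downloaded.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PdfResultFilter {
    /** Returns true when the URL's path ends in ".pdf" (case-insensitive). */
    public static boolean looksLikePdf(String url) {
        try {
            // getPath() excludes the query string, so "a.pdf?x=1" is still caught.
            String path = URI.create(url).getPath();
            return path != null && path.toLowerCase(Locale.ROOT).endsWith(".pdf");
        } catch (IllegalArgumentException e) {
            // Malformed URL: fall back to a plain suffix check on the raw string.
            return url.toLowerCase(Locale.ROOT).endsWith(".pdf");
        }
    }

    /** Returns a new list containing only the URLs that do not look like PDFs. */
    public static List<String> dropPdfs(List<String> urls) {
        List<String> kept = new ArrayList<>();
        for (String url : urls) {
            if (!looksLikePdf(url)) {
                kept.add(url);
            }
        }
        return kept;
    }
}
```

A suffix check like this only catches URLs that end in .pdf. A PDF served from a URL without that extension would still get through, and catching those would require looking at the Content-Type of the response.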