I seem to be getting closer: I installed the pdftohtml package on Ubuntu, and now the engine no longer segfaults and even search works. This package is not listed as a dependency in the README. I'm closing this issue for now.
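For anyone else hitting this, the install was just the standard package (assuming a Debian/Ubuntu system; on newer releases the pdftohtml binary is provided by poppler-utils instead of a standalone package):

```sh
sudo apt-get update
sudo apt-get install pdftohtml   # on newer Ubuntu: sudo apt-get install poppler-utils
```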
When I try to add a URL at /admin/addurl, it stops with a core dump. When I add the same URL on the basic page, it just turns the input area orange (does that mean something?), but crawling does not seem to start. The status page shows "Crawl Status Msg: | Job is initializing."
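In case a backtrace would help with debugging, this is how I'd capture one from the core dump (a sketch, assuming the Gigablast binary is called gb and is run from its working directory; the binary name and core file name are my guesses):

```sh
# allow core files to be written in this shell
ulimit -c unlimited

# reproduce the crash, then load the resulting core into gdb
gdb ./gb core        # core file name may differ, e.g. core.<pid>
(gdb) bt             # print the backtrace of the crashing thread
```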
On the Spider Queue page I see "Currently Spidering on This Host (0 spiders)", which does not seem right: I would expect that adding a URL starts a spider to crawl it. It also says "URLs Ready to Spider for collection main (0 ips in doleiptable)". Maybe that is the problem?
Hm, actually it also shows the following, which suggests something is happening:
IPs Waiting for Selection Scan for collection main (current time = 1517389356225) (totalcount=1) (waittablecount=1) (spiderdb scanning ip 110.49.110.124)
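(That current time looks like a Unix timestamp in milliseconds; dropping the last three digits and converting confirms the scan is recent, i.e. from when I posted this:)

```sh
date -u -d @1517389356   # Wed Jan 31 09:02:36 UTC 2018
```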
Sorry if my questions seem trivial. I'm totally new to this search engine, and it seems to have a steep learning curve; I guess that is because of how powerful it is.
Update: I had to reinstall everything, and this time I used the code from the stable branch instead of master. The good news is that the spider now starts when I add the URL on the basic page, but I get a segfault later on. Trying to add the URL via /admin/addurl throws a segfault too.
I changed the merge folder to a non-tmp folder as suggested. When I entered the cbnc.com URL on the basic page, the spiders started, but after a short while I got the segfault.
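If it helps, the segfault can also be caught live by starting the engine under gdb instead of analyzing the core afterwards (again assuming the binary is gb; a sketch, not tested against this build):

```sh
gdb --args ./gb
(gdb) run            # add the URL via the web UI; gdb stops on SIGSEGV
(gdb) bt full        # full backtrace with local variables
```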
So it seems the problem exists in both versions, or maybe I'm doing something wrong.
Either way, I appreciate your feedback and help. Thank you.