privacore / open-source-search-engine

No longer maintained. Please read our shutdown message.
https://privacore.github.io/

Installing the script #33

Closed siddarth9 closed 7 years ago

siddarth9 commented 7 years ago

I am not able to install the search engine on my web server at ehost.com. Can anyone tell me how to install and run the search engine?

martinvahi commented 7 years ago

I haven't tried the privacore fork, but I have given up on the Gigablast brand of search engines, because for me YaCy seems to be much more stable, at least the one version that I cloned from the YaCy development repository.

If You are still interested in trying out the Gigablast brand of search engines, have a relatively small number of pages to index, and can tolerate the fact that the search engine crashes about once per week, then You may try out my preconfigured Gigablast virtual appliance.

If You wish, You may also read my blog post, where I describe my conclusions about my experimentation with search engines.

br-privacore commented 7 years ago

It does require some technical knowledge to get Gigablast up and running. We do not provide support though and have to refer you to the original developer.

Martin, we likely share many frustrations, but we have carried on and our fork is much more stable. We do run it in production on 30 servers, but have seen many bugs - and still discover new ones. We have spent countless hours fixing and improving the code and do have a good grip on it these days. Many parts of the original code have been replaced by rewritten parts in our fork.

martinvahi commented 7 years ago

Thank You for Your answer.

I guess that for me the key phrase within Your answer is "many parts of the original code have been replaced", because when I looked at the upstream version of Gigablast, I made the following observations:

Observation 1

The upstream Gigablast project was started a long, long time ago, back in the day when computers had only one CPU core, building multi-threaded software was a niche thing, and there were no proper open source database engines available. As a result, the original author of Gigablast, Matt Wells, had to come up with his own custom database implementation and threading support. Obviously, as he also had other things to work on, the spider and other parts of the search engine, he just did not have the capacity to put as much effort into those parts as dedicated database engine developers could afford. I believe that nowadays he would probably not implement his own database engine from scratch and would do many things differently than he did in his Gigablast implementation.

That is to say, for historical reasons the upstream Gigablast code really requires a serious, substantial overhaul, despite the fact that the original author of Gigablast did a very good job.

I do not know how You have done it in the current fork, I haven't studied Your code, but one of the first things that I might try to swap out is the mechanism by which search results are saved. I would create a database access abstraction layer, so that the database engine can be swapped out at will, and then "port" RethinkDB to that layer. Maybe the upstream implementation already has that abstraction layer, I haven't studied the code. Nonetheless, compared to just compiling and running YaCy, hacking on Gigablast is a hell of a huge endeavor.
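A minimal sketch of what such an abstraction layer could look like in C++ is given here. All names in it (`KeyValueStore`, `InMemoryStore`, `DocumentIndex`) are hypothetical and are not taken from the Gigablast or privacore code. The in-memory backend only stands in for a real engine such as RethinkDB.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical storage interface; not part of the Gigablast code base.
class KeyValueStore {
public:
    virtual ~KeyValueStore() = default;
    virtual void put(const std::string& key, const std::string& value) = 0;
    // Returns false when the key is absent; otherwise fills valueOut.
    virtual bool get(const std::string& key, std::string& valueOut) const = 0;
};

// Trivial in-memory backend, standing in for a real database engine.
class InMemoryStore : public KeyValueStore {
public:
    void put(const std::string& key, const std::string& value) override {
        data_[key] = value;
    }
    bool get(const std::string& key, std::string& valueOut) const override {
        auto it = data_.find(key);
        if (it == data_.end()) return false;
        valueOut = it->second;
        return true;
    }
private:
    std::map<std::string, std::string> data_;
};

// The search engine side depends only on the interface, so the backend
// can be replaced without touching this class.
class DocumentIndex {
public:
    explicit DocumentIndex(std::unique_ptr<KeyValueStore> store)
        : store_(std::move(store)) {}
    void save(const std::string& url, const std::string& text) {
        store_->put(url, text);
    }
    bool load(const std::string& url, std::string& textOut) const {
        return store_->get(url, textOut);
    }
private:
    std::unique_ptr<KeyValueStore> store_;
};
```

Swapping the engine would then mean writing one more class that implements `KeyValueStore` and passing it to `DocumentIndex`, while the rest of the code stays unchanged.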

Secondly, the conversion of documents (PDF, RTF, etc.) is something that might be modularized, if it hasn't been modularized already. The upstream Gigablast version actually seems to handle PDFs really nicely, but given the history of the Gigablast project, I suspect that the PDF to text conversion might also be some legacy code.

Thirdly, the "ranking" side of any modern search engine is quite difficult. I believe that different people require different ranking, and that's a huge Artificial Intelligence and Linguistics related task in its own right, even without any of the rest of the upgrade work.
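One way to picture that modularization is a small registry that maps a MIME type to a converter function, so that each format lives in its own replaceable module. This is only a sketch under assumptions: the `Converter` signature and the `ConverterRegistry` class are invented for illustration and do not describe how Gigablast actually converts documents.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical converter signature: raw document bytes in, plain text out.
using Converter = std::function<std::string(const std::string&)>;

// Hypothetical registry; each document format registers its own converter.
class ConverterRegistry {
public:
    void add(const std::string& mimeType, Converter converter) {
        converters_[mimeType] = std::move(converter);
    }
    // Returns false when no converter is registered for the MIME type.
    bool convert(const std::string& mimeType, const std::string& raw,
                 std::string& textOut) const {
        auto it = converters_.find(mimeType);
        if (it == converters_.end()) return false;
        textOut = it->second(raw);
        return true;
    }
private:
    std::map<std::string, Converter> converters_;
};
```

A PDF module would then call `add("application/pdf", ...)` with whatever conversion routine it wraps, and the legacy converter could be replaced without changes to the spider.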

Observation 2

Even if the upstream implementation were left as it is, its plain C++ errors should be fixed. I believe that in 2017 the original/upstream author of Gigablast would write very different C++, but it's a very old project.

Nowadays there are open source formal verification tools for C++. I know that C is not the same as C++, I've done speed optimized C++ for years as a "day job", but I also believe that maybe some C++ class methods could be tested by using plain C verification tools combined with "code generation" that copies the code of the C++ method from the C based test code to the C++ source. Some tool candidates: Frama-C, mbeddr, STABS, CPAchecker, SMACK, CBMC.

Maybe "secretly" one might also try out Microsoft VCC, which has a "free-for-non-commercial-activity" license.

Partial adoption of the rules that the aviation industry uses for creating avionics software, plane control software, might also be useful. Escher Technologies offers a fine set of formal verification related tutorials to promote their closed source products, a bit like Wolfram provides an excellent mathematics encyclopedia to promote their own closed source products.
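As a rough illustration of that idea, the body of a method can be written in the common subset of C and C++, so the same text compiles inside a C++ class and can be checked by a C verifier. The sketch below assumes CBMC: the `__CPROVER__` macro, `__CPROVER_assume` and the `nondet_` naming convention are CBMC features as I understand them, while `clamp_index` and `harness` are hypothetical names made up for this example.

```cpp
#include <assert.h>

/* Hypothetical helper written in the common subset of C and C++.
   Clamps index into the range [0, length - 1]; returns 0 if length <= 0. */
static int clamp_index(int index, int length) {
    if (length <= 0) return 0;
    if (index < 0) return 0;
    if (index >= length) return length - 1;
    return index;
}

#ifdef __CPROVER__
/* Verification harness, only compiled when the file is fed to CBMC.
   Assumption: CBMC treats undefined nondet_* functions as returning
   arbitrary values, so the assertion is checked for all inputs. */
int nondet_int(void);

void harness(void) {
    int index = nondet_int();
    int length = nondet_int();
    __CPROVER_assume(length > 0);
    int result = clamp_index(index, length);
    assert(result >= 0 && result < length);
}
#endif
```

With an ordinary compiler the harness is skipped and only `clamp_index` is built; a verifier run would use `harness` as its entry point.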

Observation 3

The upstream author seems to have just dumped the source, as he uses it himself, to GitHub and spends his time on developing Gigablast related commercial services, without investing time in upgrading the Gigablast core.

Obviously I'm thankful for the code and the very liberal license, but since he does not seem to even reject pull requests, I suspect that his plan is really to see what happens with the Gigablast code in the wild and then pick whatever he himself likes the most. So, I believe that the privacore branch of Gigablast is probably the de facto Gigablast trunk.