YaCy Grid Component: Master Connect Program (MCP)

The YaCy Grid is the second-generation implementation of YaCy, a peer-to-peer search engine. A YaCy Grid installation consists of a set of microservices which communicate with each other using a common infrastructure for data persistence. The required storage functions of the YaCy Grid are asset storage, message distribution and database services.

All YaCy components are microservices which can be deployed, e.g., using Docker or other application hosting methods that many providers offer as cloud services. A YaCy Grid scales with the number of microservices that connect to a common broker. That broker is the MCP, this application.

The MCP provides HTTP servlets to access the services mentioned above: assets, messages, databases. If a YaCy Grid microservice wants to use this infrastructure, it connects to the MCP. The API to access the MCP servlets is integrated into the MCP as well, so every YaCy Grid service must integrate the whole MCP as infrastructure to get access to the API. This also lets the MCP act as a router to the infrastructure: once a YaCy Grid service has used the MCP, it learns from the MCP how to connect to the infrastructure itself. For example:

The asset storage, message distribution and database services provided by the MCP are handled in an abstract way, that means there is no need to know which system actually performs the operation. However, the following external services are supposed to be supported by the MCP:

For a full enterprise-integration installation, the operator has to start and maintain an FTP server, a RabbitMQ server and an Elasticsearch instance. However, all of them can be omitted, in which case the MCP provides these services with built-in functions: MapDB is used for database operations and message storage, and asset files are stored in a local data directory.

Port numbers in the YaCy Grid:

Every YaCy Grid service has a default port. However, every service can be started several times: a service detects that another service is already running on its port and chooses an alternative port. This means there are no port number collisions any more, just more microservices for one grid.

The default port number of the MCP is 8100.
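
The port fallback described above can be illustrated with a small Python sketch. This is not the MCP's actual implementation (the MCP is not written in Python, and its detection logic is not described here); the function name `find_free_port` and the "try the next higher port" strategy are assumptions for illustration only.

```python
import socket


def find_free_port(default_port: int, max_tries: int = 100, host: str = "127.0.0.1") -> int:
    """Return the first port >= default_port that can be bound on host.

    Hypothetical helper: it only illustrates the idea of falling back to
    an alternative port when the default port is already in use.
    """
    last_port = min(default_port + max_tries, 65536)
    for port in range(default_port, last_port):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                # The port is taken (or not permitted); try the next one.
                continue
            # The probe socket is closed on return, so the caller can bind the port.
            return port
    raise RuntimeError(f"no free port found in {default_port}..{last_port - 1}")
```

Calling `find_free_port(8100)` returns 8100 when nothing listens there and a higher port otherwise. A probe like this is inherently racy: another process may take the port between the probe and the real bind, so a real service should also handle a failed bind at startup.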

Other port numbers will be:

Communication

Please join our forums at https://searchlab.eu

How do I install the yacy_grid_mcp: Download, Build, Run

At this time, yacy_grid_mcp is not provided in compiled form, but you can easily build it yourself. It's not difficult and done in one minute! The source code is hosted at https://github.com/yacy/yacy_grid_mcp; you can download it and run yacy_grid_mcp with:

> git clone https://github.com/yacy/yacy_grid_mcp.git
> cd yacy_grid_mcp
> gradle run

Requirements for development

Maven and Gradle should be installed. To refresh the Gradle settings in Eclipse, right-click on the project -> Configure -> Add Gradle Nature.

How to install the infrastructure services (message server, ftp server)?

Install and start the Apache FTP server:

ftpserver.user.yacygrid.userpassword=<here is the md5sum>
ftpserver.user.yacygrid.homedirectory=./res/home
ftpserver.user.yacygrid.enableflag=true
ftpserver.user.yacygrid.writepermission=true
ftpserver.user.yacygrid.maxloginnumber=2000
ftpserver.user.yacygrid.maxloginperip=2000
ftpserver.user.yacygrid.idletime=30000
ftpserver.user.yacygrid.uploadrate=0
ftpserver.user.yacygrid.downloadrate=0

This will run the FTP server at port 2121. To test the connection, use a standard FTP client and start it with

> ftp -P 2121 anonymous@127.0.0.1
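
The `userpassword` entry in the user properties above expects an md5sum of the password. The following Python sketch computes one. It assumes the FTP server is configured to compare plain, unsalted MD5 hex digests, which is what the `<here is the md5sum>` hint suggests; check the password encryptor of your FTP server installation before relying on it. The helper names are hypothetical.

```python
import hashlib


def md5_hex(password: str) -> str:
    """Return the unsalted MD5 hex digest of a password (UTF-8 encoded)."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def userpassword_line(password: str, user: str = "yacygrid") -> str:
    """Build the properties line for the given FTP user (assumed format)."""
    return f"ftpserver.user.{user}.userpassword={md5_hex(password)}"
```

For example, `md5_hex("admin")` yields `21232f297a57a5a743894a0e4a801fc3`. MD5 is not a secure password hash; it appears here only because the configuration asks for it.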

Install and start RabbitMQ:

[{rabbit, [{loopback_users,[]}]}].

Configuration of the MCP

The MCP will create a subdirectory data/mcp-8100. There, within a conf sub-path, you can place a file config.properties which has the same structure as the file in <application-home>/conf/config.properties. Just copy that file to data/mcp-8100/conf/config.properties and replace the default values with your own.

You should set the FTP address and the broker address with a URL-encoded user name and password, like:

grid.ftp.address = <user>:<pw>@<ftp-host>:2121
grid.broker.address = <user>:<pw>@<broker-host>:5672
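
User names or passwords that contain characters such as `@`, `:` or `/` would break the `<user>:<pw>@<host>:<port>` form unless they are URL-encoded. A minimal Python sketch of that encoding follows; the function name `grid_address` is hypothetical, and the sketch only assumes that the MCP decodes standard percent-encoding.

```python
from urllib.parse import quote


def grid_address(user: str, password: str, host: str, port: int) -> str:
    """Build a '<user>:<pw>@<host>:<port>' value with percent-encoded credentials."""
    # safe="" makes quote() also encode "/" in addition to ":" and "@".
    return f"{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"
```

For example, `grid_address("yacygrid", "p@ss:w/rd", "127.0.0.1", 2121)` gives `yacygrid:p%40ss%3Aw%2Frd@127.0.0.1:2121`, which can be used as the value of `grid.ftp.address`.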

How to use the API

To test the api, try the following example:

Writing messages

Call:

curl "http://127.0.0.1:8100/yacy/grid/mcp/messages/send.json?serviceName=testService&queueName=testQueue&message=hello_world"

This will send a message "hello_world" to the queue 'testQueue' of service 'testService'. You can ask for the number of entries in the queue with

curl "http://127.0.0.1:8100/yacy/grid/mcp/messages/available.json?serviceName=testService&queueName=testQueue"

To get an entry from such a queue, call:

curl "http://127.0.0.1:8100/yacy/grid/mcp/messages/receive.json?serviceName=testService&queueName=testQueue"
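
The three message calls above differ only in the servlet name and the `message` parameter, so they can be built by one helper. The Python sketch below uses only the paths and parameter names shown in the curl examples; the function names are hypothetical, and since the response format is not described here, the body is returned as raw text.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

MCP_BASE = "http://127.0.0.1:8100/yacy/grid/mcp"


def message_url(action: str, service: str, queue: str,
                message: Optional[str] = None, base: str = MCP_BASE) -> str:
    """Build the URL for the 'send', 'available' or 'receive' message servlet."""
    params = {"serviceName": service, "queueName": queue}
    if message is not None:
        params["message"] = message
    return f"{base}/messages/{action}.json?{urlencode(params)}"


def call(url: str, timeout: float = 5.0) -> str:
    """Fetch url and return the raw response body (needs a running MCP)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With an MCP running, `call(message_url("send", "testService", "testQueue", "hello_world"))` corresponds to the first curl command, and `message_url("receive", "testService", "testQueue")` to the last one.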

If you did not run the RabbitMQ message server, the messages are written to

data/mcp-8100/messages/testService

If you started a RabbitMQ message server, you can monitor the writing of the messages at the web page http://127.0.0.1:15672

Writing assets

Call:

curl "http://127.0.0.1:8100/yacy/grid/mcp/assets/store.json?path=/xx/test.txt&asset=hello_world"

This will write an asset to the path /xx/test.txt with the content "hello_world".

curl "http://127.0.0.1:8100/yacy/grid/mcp/assets/load?path=/xx/test.txt"

will load the asset again.

If you started an FTP server, the file(s) will be written relative to the FTP home directory. If you did not start an FTP server, you can find the file in data/mcp-8100/assets/xx/test.txt
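
The asset calls and the local fallback location can be sketched in Python as follows. The servlet paths and parameter names are taken from the curl examples above; the helper names are hypothetical, and the mapping in `local_asset_path` is only inferred from the single example path given above.

```python
import posixpath
from urllib.parse import urlencode

MCP_BASE = "http://127.0.0.1:8100/yacy/grid/mcp"


def asset_store_url(path: str, content: str, base: str = MCP_BASE) -> str:
    """Build the URL that stores content as an asset under path."""
    # safe="/" keeps slashes in the path readable, as in the curl example.
    query = urlencode({"path": path, "asset": content}, safe="/")
    return f"{base}/assets/store.json?{query}"


def asset_load_url(path: str, base: str = MCP_BASE) -> str:
    """Build the URL that loads the asset stored under path."""
    return f"{base}/assets/load?{urlencode({'path': path}, safe='/')}"


def local_asset_path(path: str, port: int = 8100) -> str:
    """Where the asset is expected on disk when no FTP server is used."""
    return posixpath.join("data", f"mcp-{port}", "assets", path.lstrip("/"))
```

Here `local_asset_path("/xx/test.txt")` returns `data/mcp-8100/assets/xx/test.txt`, the location named above. Passing the asset content as a query parameter only suits small test values like this one.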

Using a second MCP to use the primary MCP

The MCP organises the connection to the remote RabbitMQ server and the FTP server, but if another Grid service is started, the MCP tells that service to handle the connection to the RabbitMQ and FTP servers itself. A second MCP can act as such an external service: just start another MCP and it will run at port 8101. Repeat the curl commands from the examples above, but now use port 8101 to access the MCP. You will see that the second MCP learns from the first MCP to handle the connection by itself.

How do I install yacy_grid_mcp with Docker?

To install yacy_grid_mcp with Docker please refer to the yacy Docker installation readme.

How do I deploy yacy_grid_mcp on Cloud Providers?

To install yacy_grid_mcp on Cloud Providers please refer to the yacy_grid_mcp Cloud installation readme.

Useful debugging URLs

RabbitMQ

Elasticsearch

MCP

Crawler

Loader

Parser

Contribute

This is a community project and your contribution is welcome!

  1. Check for open issues or open a fresh one to start a discussion around a feature idea or a bug.
  2. Fork the repository on GitHub to start making your changes (branch off of the master branch).
  3. Write a test that shows the bug was fixed or the feature works as expected.
  4. Send a pull request and bug us on Gitter until it gets merged and published. :)

What is the software license?

LGPL 2.1 (C) by Michael Peter Christen

Have fun! @0rb1t3r