ivan-zapreev / Distributed-Translation-Infrastructure

The distributed statistical machine translation infrastructure, consisting of load-balancing, text pre/post-processing, and translation services. Written in C++11, it utilises multicore CPUs through multi-threading and allows for secure SSL/TLS communications.
GNU General Public License v2.0

JSON msg parsing error #21

Closed NDewdney closed 4 years ago

NDewdney commented 4 years ago

Building and running the demo on an Amazon AWS Linux VM (AMI). Running bpbd-client to send a test file to the demo system, I get:

USAGE: The requested debug level is: 'INFO3', the maximum build level is 'INFO3' the set level is 'INFO3'
USAGE: Loading the server configuration option from: ../demo/tls/configs/client-no-tls.cfg
INFO: Translation client parameters: { source file = /home/ec2-user/test.zh, source language = German, target file = /home/ec2-user/test.en, target language = english, pre-processor server = WebSocket client parameters: {server_uri = ws://localhost:9004, is_tls_client = false}, translation server = WebSocket client parameters: {server_uri = ws://localhost:9007, is_tls_client = false}, post-processor server = WebSocket client parameters: {server_uri = ws://localhost:9004, is_tls_client = false}, min sentences per request = 100, max sentences per request = 200, request priority = 0, translation info = OFF }
INFO3: Sanity checks are: OFF !
USAGE: Using the file reader!
USAGE: Reading source text from '/home/ec2-user/test.zh'
USAGE: Starting the pre-processor process ...
INFO1: Attempting to connect to the server!
INFO: Starting creating and sending out pre-processor jobs!
USAGE: Waiting for the pre-processor process to finish ...
INFO1: The processor job request chunk 1/1 is sent.
INFO: Sent out 1 pre-processor jobs, waiting for results.
ERROR <incoming_msg.hpp::de_serialize(...):79>: An exception when parsing JSON string: incoming_msg.hpp::de_serialize(...):77: JSON parse error: 3
ERROR <incoming_msg.hpp::de_serialize(...):80>: Received JSON message data: _websocket_server.hpp::pre_processrequest(...):360: This functionality is not supported!
ERROR <websocket_client_base.hpp::on_message(...):273>: incoming_msg.hpp::de_serialize(...):81: incoming_msg.hpp::de_serialize(...):77: JSON parse error: 3

Compilation executed with no issues detected. The problem seems to affect only bpbd-processor (the balancer and server run OK).

I've not had this problem on previous builds, so I am guessing there is some dependency change.

ivan-zapreev commented 4 years ago

Well, there have been no changes in the code and demo configurations for a very long time now.

From the provided information it looks like the pre-processor request is being sent to a non-pre-processor server. So it is likely you've started the wrong server at the address provided to the client as the pre-processor server address.

Note that a pre/post-processor instance currently cannot be placed behind a load balancer. This is technically possible but has not been implemented.

NDewdney commented 4 years ago

Hi Ivan,

Indeed it seems odd - but I am only running the demo - no changes in configs! This is the first time I have ever had issues with the code - I've been running it successfully for years now. The only thing I could do with is balancers discovering new servers (or new servers registering with balancers). Currently we have to do balancer re-starts.

Going to try a fresh build on a different Linux distro. Will let you know.

Trust you are keeping well, Ivan.

Nigel


ivan-zapreev commented 4 years ago

Hello Nigel,

I am doing pretty well, thank you! I miss the times when I could work on the distributed translation infrastructure at the University of Amsterdam.

How are things with you?

Regarding the issue: could it be that you accidentally already had a translation server or balancer running on the same machine on that port? That would lead to this clash, as the processor would then not start on the preconfigured port...

Regarding the pre-configured servers for the load balancer: it is a very interesting idea and a handy feature. Could you describe it in more detail? The initial configuration is read from the config file, so if a new server gets added, do you then want it to be saved back into the config file or just kept at runtime? Also, would you be OK with it being added from the server's command line with some sort of an add/remove command?

Kind regards,

Dr. Ivan S. Zapreev


ivan-zapreev commented 4 years ago

Just saw that the balancer-related improvement suggestion is filed as issue #8 ... I completely forgot I was planning to look into that two years ago ... sorry for that! Actually, I have a 4-day weekend ahead, so I will try to look into this issue again and see if I can find time to make something of it by next week.

NDewdney commented 4 years ago

Hi Ivan,

Well, I've just done a fresh install and build from scratch. Used an Amazon Linux 2 AMI. Installed openssl-devel, "Development Tools" (to pick up the gcc suite), and cmake. Cloned Distributed-Translation-Infrastructure and built the code as per the instructions. No errors; everything fine as usual.

Went into demo/notls and started the system with start.sh. All six screen sessions start and run.

I created a short test file and submitted it using bpbd-client thus:

build/bpbd-client -d info1 -c demo/tls/configs/client-no-tls.cfg -i Chinese -I ~/test.zh -o English -O ~/test.en

As far as I can see the client config is consistent with the server configs.

The output I get is:

USAGE: The requested debug level is: 'INFO1', the maximum build level is 'INFO3' the set level is 'INFO1'

USAGE: Loading the server configuration option from: demo/tls/configs/client-no-tls.cfg

INFO: Translation client parameters: { source file = /home/ec2-user/test.zh, source language = Chinese, target file = /home/ec2-user/test.en, target language = English, pre-processor server = WebSocket client parameters: {server_uri = ws://localhost:9004, is_tls_client = false}, translation server = WebSocket client parameters: {server_uri = ws://localhost:9002, is_tls_client = false}, post-processor server = WebSocket client parameters: {server_uri = ws://localhost:9004, is_tls_client = false}, min sentences per request = 100, max sentences per request = 200, request priority = 0, translation info = OFF }

USAGE: Using the file reader!

USAGE: Reading source text from '/home/ec2-user/test.zh'

USAGE: Starting the pre-processor process ...

INFO1: Attempting to connect to the server!

INFO: Starting creating and sending out pre-processor jobs!

USAGE: Waiting for the pre-processor process to finish ...

INFO1: The processor job request chunk 1/1 is sent.

INFO: Sent out 1 pre-processor jobs, waiting for results.

ERROR <incoming_msg.hpp::de_serialize(...):79>: An exception when parsing JSON string: incoming_msg.hpp:77: JSON parse error: 3

ERROR <incoming_msg.hpp::de_serialize(...):80>: Received JSON message data: _websocket_server.hpp::pre_processrequest(...):360: This functionality is not supported!

ERROR <websocket_client_base.hpp::on_message(...):273>: incoming_msg.hpp:81: incoming_msg.hpp:77: JSON parse error: 3

and the client hangs (presumably waiting for completion)

So no binary clashes (clean VM) - hence my thinking there is some change in a dependency somewhere. GCC/G++ version used is 7.3.1, CMake version 2.8.12.2, OpenSSL version 1.0.2k-fips.

Linux VM base image: amzn2-ami-hvm-2.0.20200406.0-x86_64-gp2 (ami-01a6e31ac994bbc09)

As far as the 'hot' configuration changes idea goes, I was thinking that one could go one of two ways (or even both). On starting a server, could it notify/register with a specified server that it is now ready? I'm guessing the JSON-based msg protocol could be simply extended for this purpose?... The point(s) to send the notification to could be specified on the start-up command line or in the associated config file. Balancers could then propagate the msg to other balancers (much like the DNS process, I guess). The other way is for balancers to query some form of register at regular intervals. The downside of the second is the introduction of another bit of 'infrastructure'. However, the balancer doing regular polling is possibly worth introducing. One thing we've noticed is that if a server goes down, balancers will still report they can access that server. A regular 'health-check' would be useful so that we get an indication if a translation service is unavailable, so as not to try that service, and to do an auto reboot (which, coupled with hot configuration, would give us the potential for a degree of system self-maintenance).
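
To make the registration idea a bit more concrete, here is a rough sketch of the kind of message a freshly started server might send to the balancer(s) it has been told about. This is purely illustrative - the message type and all field names are made up and are not part of the existing protocol:

    // Purely illustrative sketch - NOT part of the existing protocol.
    // A freshly started translation server could send something like this to
    // each balancer listed in its config file or on its command line; the
    // balancer would then add the server to its runtime pool on receipt.
    #include <cstdint>
    #include <sstream>
    #include <string>

    // All field names below (prot_ver, msg_type, server_uri, load_weight)
    // are hypothetical.
    inline std::string make_register_msg(const std::string & server_uri,
                                         uint32_t load_weight) {
        std::ostringstream out;
        out << "{"
            << "\"prot_ver\": 1, "
            << "\"msg_type\": \"server_register\", "
            << "\"server_uri\": \"" << server_uri << "\", "
            << "\"load_weight\": " << load_weight
            << "}";
        return out.str();
    }

    // A periodic health-check could reuse the same shape with a different
    // (again hypothetical) msg_type, e.g. "server_alive", so that a balancer
    // can drop servers that stop reporting and stop routing jobs to them.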

As for writing back out to config files - I guess the 'register' option would give the current picture. I think modifying the initial configs is probably a bad idea, but an output of the current (running) config would potentially be useful. Again I can see two useful modes: one, a run-time query through extending the JSON msg protocol, and/or two, a runtime server command to dump the config to a file. I guess the former is the most flexible, as it divorces what is done with the configuration from the system running it.

Does that make sense? In part I guess it is a question of what is least effort / most easily achievable! It's a "would be nice to have" rather than a show-stopper.

Best regards, Nigel


NDewdney commented 4 years ago

Hi Ivan,

Red herring - my error, but... the client cfg file demo/tls/configs/client-no-tls.cfg is at fault. The pre- and post-processor server specs are set to port 9004, while they are set to 9005 for the demo set-up. (Should have spotted that earlier - sorry.) Having fixed that, using the demo set-up from bpbd-client works fine. Which means my root problem is actually my set-up of a WebSocket-based client - which indeed turns out to be the case. So the original issue is sorted.

But while playing, I accidentally found that sending a JSON msg with ch_idx >= num_chs causes a segmentation fault in the processor. That probably ought to be fixed!

ivan-zapreev commented 4 years ago

Dear Nigel,

Regarding the demo set-ups: the "tls" and "non-tls" demos are two different demos, and you may not use the "non-tls client config" from the "tls" demo with the "non-tls" demo. As explained in the corresponding README.md files, the "tls" demo has the following deployment:

Whereas the deployment for the "non-tls" demo is different:

So, once again, you may not use configuration files from https://github.com/ivan-zapreev/Distributed-Translation-Infrastructure/blob/master/demo/tls/configs/ (such as https://github.com/ivan-zapreev/Distributed-Translation-Infrastructure/blob/master/demo/tls/configs/client-no-tls.cfg) in conjunction with the "non-tls" demo, as this is a completely different demo.

Kind regards,

Ivan

On Tue, May 26, 2020 at 10:09 AM Ivan Zapreev ivan.zapreev@gmail.com wrote:

Dear Nigel,

Just for clarity, do you mean to say that

https://github.com/ivan-zapreev/Distributed-Translation-Infrastructure/blob/master/demo/tls/configs/client-no-tls.cfg

should have the pre-processor port set to 9005?

Regarding: "I accidentally found that sending a JSON msg with ch_idx >= num_chs causes a segmentation fault in the processor. That probably ought to be fixed!" I can only say that if there is a third party client that sends a malformed message then indeed there are no checks build in and this can cause a crash. By the time when the tool was developed Christof did not want me to spent time on making the tool robust, but rather expect proper inputs. However, this is indeed something that can be improved. Please feel free to submit a separate issue for that.

Kind regards,

Ivan


ivan-zapreev commented 4 years ago

while playing, I accidentally found that sending a JSON msg with ch_idx >= num_chs causes a segmentation fault in the processor. That probably ought to be fixed!

Regarding this index issue: please take a look at

https://github.com/ivan-zapreev/Distributed-Translation-Infrastructure/blob/master/inc/processor/processor_job.hpp

and the processor request handling function:

    inline void add_request(proc_req_in * req) {
        unique_guard guard(m_req_tasks_lock);

        //Get the task index
        uint64_t chunk_idx = req->get_chunk_idx();

        //Assert sanity
        ASSERT_SANITY_THROW((chunk_idx >= m_exp_num_chunks),
                string("Improper chunk index: ") + to_string(chunk_idx) +
                string(", must be <= ") + to_string(m_exp_num_chunks));

        //Assert sanity
        ASSERT_SANITY_THROW((m_req_tasks[chunk_idx] != NULL),
                string("The chunk index ") + to_string(chunk_idx) +
                string(" of the job request ") + m_job_token +
                string(" from session ") + to_string(m_session_id) +
                string(" is already set!"));

        LOG_DEBUG << "Storing the job: " << req->get_job_token()
                << " chunk: " << chunk_idx << END_LOG;

        //Store the task
        m_req_tasks[chunk_idx] = req;

        //Increment the number of tasks
        ++m_act_num_chunks;
    }

As one can see, there are sanity checks for the chunk indexes to be within the limits, and more. However, these are disabled in non-SANITY mode, as such checks influence the performance. So for testing purposes I would suggest building and running the infrastructure in SANITY mode; the information on that can be found in:

https://github.com/ivan-zapreev/Distributed-Translation-Infrastructure#project-compile-time-parameters

You may also run a server built in this mode in production, but I have not estimated the performance drop for this case.
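
If one wanted a guard against such malformed third-party messages that stays active even in a non-SANITY build, the check could also be made unconditional. The following is only an illustrative sketch (the helper name and the exception type are assumptions, not code from the repository); in the real code the equivalent check would sit in add_request(...) next to the existing ASSERT_SANITY_THROW calls:

    // Illustrative sketch only - not code from the repository. It shows the
    // kind of unconditional bounds check that could protect the processor from
    // a malformed message with ch_idx >= num_chs, independent of the SANITY
    // build mode. The function name and exception type are assumptions.
    #include <cstdint>
    #include <stdexcept>
    #include <string>

    inline void check_chunk_idx(uint64_t chunk_idx, uint64_t exp_num_chunks) {
        if (chunk_idx >= exp_num_chunks) {
            // Reject the request instead of indexing past the end of the
            // request-task array, which is the likely cause of the crash.
            throw std::out_of_range(
                    "Improper chunk index: " + std::to_string(chunk_idx) +
                    ", must be < " + std::to_string(exp_num_chunks));
        }
    }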