ude-soco / CourseMapper-webserver

A collaborative course annotation and analytics platform
https://coursemapper.de
MIT License

Worker and Queue System #640

Open jeanqussa opened 10 months ago

jeanqussa commented 10 months ago

Introduction

This is a new architecture for the knowledge graph part of CourseMapper. It aims to resolve some issues related to the scalability and resilience of the current architecture. In the existing architecture, the client talks to the KG server directly to make requests. This is not ideal.

In this issue I propose that we split the KG backend into two parts:
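As a rough illustration of the worker-and-queue idea in this issue's title, here is a minimal in-memory sketch. It is not the project's code: `queue.Queue` stands in for the Redis-backed queue, and `submit`, `worker`, and `run_demo` are hypothetical names chosen for the example.

```python
import queue
import threading

# Hypothetical stand-in for the Redis-backed queue (assumption, not project code).
jobs = queue.Queue()
results = {}


def worker():
    """Pull jobs off the queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        job_id, pipeline, payload = job
        # Placeholder for the expensive knowledge graph computation.
        results[job_id] = f"{pipeline} done for {payload}"
        jobs.task_done()


def submit(job_id, pipeline, payload):
    """What the web server would do instead of calling the KG server directly."""
    jobs.put((job_id, pipeline, payload))


def run_demo():
    """Start one worker, enqueue two jobs, and wait for both to finish."""
    thread = threading.Thread(target=worker)
    thread.start()
    submit("job-1", "concept-map", "material-a")
    submit("job-2", "concept-map", "material-b")
    jobs.put(None)  # sentinel: tell the worker to stop
    thread.join()
    return results
```

The point of the indirection is that the client never waits on the worker directly: the web server only enqueues, and a slow or restarting worker does not take the request path down with it.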

Progress

jeanqussa commented 9 months ago

@ralf-berger If we want to deploy the new version with new services (redis), do you have to do anything on your part or will everything happen automatically once we push to main?

ralf-berger commented 9 months ago

@jeanqussa That would require me to update the deployment configuration. Can you point me towards a changeset that introduces Redis and the corresponding config of existing services?

jeanqussa commented 9 months ago

@ralf-berger I am not sure what you mean by a changeset. I have done the following:

ralf-berger commented 9 months ago

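@jeanqussa Thanks. I prepared the following steps to change the live environments:

* Remove coursemapper-kg-web (including routing of incoming web traffic)

* Migrate webserver to webserver-web (splitting this component into multiple sub-components)

* Migrate coursemapper-kg-neo4j to webserver-neo4j (passing connection data to webserver-web)

* Add webserver-redis (passing connection data to webserver-web)

* Add coursemapper-kg-worker-concept-map (passing webserver-* connection data =/)

* Add coursemapper-kg-worker-recommendation (passing webserver-* connection data =/)

Renaming the corresponding volumes will cause them to be recreated. Is that acceptable for both the preview and the production environment?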

jeanqussa commented 9 months ago

@ralf-berger Thank you. It is acceptable that the volumes will be recreated for preview as well as production, although we would like to first deploy only on edge.coursemapper.soco.inko.cloud while we test.

ralf-berger commented 9 months ago

@jeanqussa Okay, just tell me when the changes are in master and I'll modify the environment accordingly.

jeanqussa commented 9 months ago

@ralf-berger I have merged the pull request just now. The changes are now in main.

ralf-berger commented 9 months ago

@jeanqussa https://edge.coursemapper.soco.inko.cloud/ has been updated and is ready for testing.

jeanqussa commented 9 months ago

@ralf-berger It seems that it can no longer connect to webserver. Did you remove the location /api section in the nginx configuration file?

jeanqussa commented 9 months ago

Or forgot to rename upstream.

ralf-berger commented 9 months ago

@jeanqussa webserver-web logs the following entry about some redirect regarding POST /statements. Does that mean it receives traffic or is this an outgoing connection attempt?

Log output

```
2024-03-05T09:18:00.288Z follow-redirects options {
  maxRedirects: 21,
  maxBodyLength: Infinity,
  protocol: 'http:',
  path: '/statements',
  method: 'POST',
  headers: [Object: null prototype] {
    Accept: 'application/json, text/plain, */*',
    'Content-Type': 'application/json',
    'X-Experience-API-Version': '1.0.2',
    'User-Agent': 'axios/1.6.7',
    'Content-Length': '477671',
    'Accept-Encoding': 'gzip, compress, deflate, br'
  },
  agents: {
    http: undefined,
    https: Agent {
      _events: [Object: null prototype],
      _eventsCount: 2,
      _maxListeners: undefined,
      defaultPort: 443,
      protocol: 'https:',
      options: [Object: null prototype],
      requests: [Object: null prototype] {},
      sockets: [Object: null prototype] {},
      freeSockets: [Object: null prototype] {},
      keepAliveMsecs: 1000,
      keepAlive: false,
      maxSockets: Infinity,
      maxFreeSockets: 256,
      scheduling: 'lifo',
      maxTotalSockets: Infinity,
      totalSocketCount: 0,
      maxCachedSessions: 100,
      _sessionCache: [Object],
      [Symbol(kCapture)]: false
    }
  },
  auth: undefined,
  family: undefined,
  beforeRedirect: [Function: dispatchBeforeRedirect],
  beforeRedirects: { proxy: [Function: beforeRedirect] },
  hostname: 'localhost',
  port: '',
  agent: undefined,
  nativeProtocols: {
    'http:': {
      _connectionListener: [Function: connectionListener],
      METHODS: [Array],
      STATUS_CODES: [Object],
      Agent: [Function],
      ClientRequest: [Function: ClientRequest],
      IncomingMessage: [Function: IncomingMessage],
      OutgoingMessage: [Function: OutgoingMessage],
      Server: [Function: Server],
      ServerResponse: [Function: ServerResponse],
      createServer: [Function: createServer],
      validateHeaderName: [Function: __node_internal_],
      validateHeaderValue: [Function: __node_internal_],
      get: [Function: get],
      request: [Function: request],
      setMaxIdleHTTPParsers: [Function: setMaxIdleHTTPParsers],
      maxHeaderSize: [Getter],
      globalAgent: [Getter/Setter]
    },
    'https:': {
      Agent: [Function: Agent],
      globalAgent: [Agent],
      Server: [Function: Server],
      createServer: [Function: createServer],
      get: [Function: get],
      request: [Function: request]
    }
  }
}
updateSentStatements: 0 statements are updated
```

jeanqussa commented 9 months ago

I don't know what that is, but it doesn't look like an incoming request.

jeanqussa commented 9 months ago

> @jeanqussa Thanks. I prepared the following steps to change the live environments:
>
> * Remove coursemapper-kg-web (including routing of incoming web traffic)
>
> * Migrate webserver to webserver-web (splitting this component into multiple sub-components)
>
> * Migrate coursemapper-kg-neo4j to webserver-neo4j (passing connection data to webserver-web)
>
> * Add webserver-redis (passing connection data to webserver-web)
>
> * Add coursemapper-kg-worker-concept-map (passing webserver-* connection data =/)
>
> * Add coursemapper-kg-worker-recommendation (passing webserver-* connection data =/)
>
> Renaming the corresponding volumes will cause them to be recreated, is that acceptable for both preview as well as production environment?

If you are using the proxy service, the problem might be that I haven't updated the service names to the new ones yet. Should I update the service names and push again?

ralf-berger commented 9 months ago

Found a mistake: I accidentally called a Service object coursemapper-webserver-webserver-webserver instead of coursemapper-webserver-webserver-web.

jeanqussa commented 9 months ago

It works now. Thank you.

jeanqussa commented 9 months ago

webapp is making a request to .../kg-api, which was removed. There is no mention of it in the code anymore. Did you rebuild webapp?

ralf-berger commented 9 months ago

Oh well, the build of the merge request failed. I'll restart it.

ralf-berger commented 9 months ago

Image build relies on an unreliable external service.

Sciebo
jeanqussa commented 9 months ago

I was using the wrong environment variable name. I have fixed it and am checking if it works.

It is deployed now, but it seems that webserver-web cannot connect to webserver-neo4j. Could you please check that the credentials are correct?

ralf-berger commented 9 months ago

I'm getting messages about failing Redis connection attempts to 127.0.0.1:6379, even though REDIS_HOST should be configured correctly.
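One common way to end up with this symptom (not necessarily the cause here) is a client that silently falls back to localhost when the variable it actually reads is unset or named differently. A minimal sketch, assuming a hypothetical `REDIS_PORT` variable alongside `REDIS_HOST`; the function names are made up for illustration:

```python
import os


def redis_address(env=None):
    """Return (host, port), falling back to localhost defaults when unset."""
    env = os.environ if env is None else env
    host = env.get("REDIS_HOST", "127.0.0.1")
    port = int(env.get("REDIS_PORT", "6379"))  # REDIS_PORT is an assumed name
    return host, port


def require_redis_address(env=None):
    """Stricter variant: refuse to start rather than silently use localhost."""
    env = os.environ if env is None else env
    if "REDIS_HOST" not in env:
        raise RuntimeError("REDIS_HOST is not set")
    return env["REDIS_HOST"], int(env.get("REDIS_PORT", "6379"))
```

With the lenient variant, a deployment that sets a differently named variable still starts and then tries 127.0.0.1:6379. The strict variant turns the same mistake into an immediate startup error, which is easier to spot in the logs.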

jeanqussa commented 9 months ago

I have pushed a fix.

jeanqussa commented 9 months ago

Could we somehow see the logs for coursemapper-kg-worker-concept-map?

ralf-berger commented 9 months ago

Log

```
bin/worker: Running application ...
/home/app/.venv/lib/python3.7/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0
/home/app/app/algorithms/elmo_2x4096_512_2048cnn_2xhighway_options.json
2024-03-05 18:09:53,152 - __main__ - INFO - Starting worker...
2024-03-05 18:09:53,157 - __main__ - INFO - Worker 698712 ready to accept jobs
2024-03-05 18:10:02,954 - __main__ - INFO - Received concept-map job for 07fae9ac00deec46f6e53bed33818ec1fe92903ba4c9fce0633e20d5a5d2b206...
2024-03-05 18:10:02,957 - __main__ - INFO - Processing concept-map job 07fae9ac00deec46f6e53bed33818ec1fe92903ba4c9fce0633e20d5a5d2b206...
2024-03-05 18:10:04,756 - app.services.course_materials.db.neo4_db - INFO - Check if learning material '65e75fa1e43fd19855b3e7ce' exists
2024-03-05 18:10:04,787 - app.services.course_materials.kwp_extraction.dbpedia.data_service - INFO - Found learning material '65e75fa1e43fd19855b3e7ce
2024-03-05 18:10:04,792 - app.services.course_materials.db.neo4_db - INFO - Geting the concepts of learning material '65e75fa1e43fd19855b3e7ce'
2024-03-05 18:10:04,800 - app.services.course_materials.db.neo4_db - INFO - Geting the relationships of learning material '65e75fa1e43fd19855b3e7ce'
Execution time: 1.8495628833770752
2024-03-05 18:10:07,964 - __main__ - INFO - Finished processing concept-map job 07fae9ac00deec46f6e53bed33818ec1fe92903ba4c9fce0633e20d5a5d2b206
```

jeanqussa commented 9 months ago

Is it possible to see the logs from previous runs?

jeanqussa commented 8 months ago

It looks like the coursemapper-kg-worker-concept-map is just getting killed and restarted at some point. Is it possible that there are imposed memory limits?

ralf-berger commented 8 months ago

Yes, the workers have a memory limit of 4 GiB each, just like the previous coursemapper-kg. How much memory do they require?

jeanqussa commented 8 months ago

10 GiB should be enough for coursemapper-kg-worker-concept-map, and 4 GiB for coursemapper-kg-worker-recommendation.
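To check from inside a worker container which limit is actually in effect, something like the following sketch could be logged at startup. It assumes a Linux container exposing the standard cgroup files; `parse_limit` and `container_memory_limit_gib` are hypothetical helper names, not part of the worker code.

```python
from pathlib import Path

GIB = 1024 ** 3

# Standard cgroup locations inside a Linux container (v2 first, then v1).
CGROUP_FILES = (
    Path("/sys/fs/cgroup/memory.max"),
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),
)


def parse_limit(raw):
    """Convert a cgroup memory value to GiB; 'max' (cgroup v2) means no limit."""
    raw = raw.strip()
    if raw == "max":
        return None
    # Note: cgroup v1 reports "no limit" as a very large byte count instead.
    return int(raw) / GIB


def container_memory_limit_gib():
    """Return the memory limit in GiB, or None if unlimited or not readable."""
    for path in CGROUP_FILES:
        try:
            return parse_limit(path.read_text())
        except OSError:
            continue
    return None
```

A worker that is killed and restarted without a traceback, as described above, is consistent with hitting such a limit, so printing this value next to the "Starting worker..." line would make the configured limit visible in the logs.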

ralf-berger commented 8 months ago

Updated—do you still see issues with restarts?

jeanqussa commented 8 months ago

No, the issue is now resolved. Thank you.

jeanqussa commented 8 months ago

@ralf-berger We are still having the issue with some inputs. Would it be possible to increase the limit to 12 GiB for coursemapper-kg-worker-concept-map?

ralf-berger commented 8 months ago

@jeanqussa When do you want the production environment to be updated?

jeanqussa commented 8 months ago

@ralf-berger I am not sure yet, but I believe it has to happen before the end of the month. I will let you know in the next few days.

jeanqussa commented 8 months ago

@ralf-berger The webserver service is down. Any idea why?

ralf-berger commented 8 months ago

> The webserver service is down. Any idea why?

765?

jeanqussa commented 2 months ago

@ralf-berger Can we start 2 additional services, `coursemapper-kg-worker-modify-graph` and `coursemapper-kg-worker-expand-material`? Their configuration is in `coursemapper-kg/concept-map/compose.yaml`. The memory requirement for each is 4 GiB.

rawaa123 commented 2 months ago

@ralf-berger Sorry for the repetition. As @jeanqussa mentioned above, can we start 2 additional services, `coursemapper-kg-worker-modify-graph` and `coursemapper-kg-worker-expand-material`? Their configuration is in `coursemapper-kg/concept-map/compose.yaml`. The memory requirement for each is 4 GiB. Your help is highly appreciated.

ralf-berger commented 2 months ago

> @ralf-berger Sorry for the repetition, as @jeanqussa mentioned above, can we start 2 additional services, `coursemapper-kg-worker-modify-graph` and `coursemapper-kg-worker-expand-material`? Their configuration is in `coursemapper-kg/concept-map/compose.yaml`. Memory requirement for each is 4 GiB. Your help is highly appreciated

It seems something got mixed up there: the directory hierarchy doesn't match the proposed names, and the Compose configuration you mentioned builds exactly the same container image three times under different names. ./coursemapper-kg/README.md doesn't explain anything. Could you describe the purpose of the individual services and how they're related?

jeanqussa commented 2 months ago

> Seems like something got messed there, the directory hierarchy doesn't match the proposed names and the Compose configuration you mentioned builds exactly the same container image three times using different names. ./coursemapper-kg/README.md doesn't explain anything. Could you try to describe the purpose of the individual services and how they're related?

It is the same image, but each of the three services uses different environment variables. Since we are short on time, a quick solution would be to keep the services as they are in deployment, but change the PIPELINES environment variable for kg-concept-map as follows:

PIPELINES=concept-map,modify-graph,expand-material
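
For readers unfamiliar with the pattern, this is roughly how a single image can serve several roles depending on a comma-separated variable. It is a sketch under assumptions, not the worker's actual code: `HANDLERS`, `enabled_pipelines`, and `handle` are hypothetical names, and the lambdas stand in for the real pipeline implementations.

```python
# Hypothetical handlers; the real workers run the actual pipeline code.
HANDLERS = {
    "concept-map": lambda job: f"concept-map:{job}",
    "modify-graph": lambda job: f"modify-graph:{job}",
    "expand-material": lambda job: f"expand-material:{job}",
}


def enabled_pipelines(value):
    """Parse a PIPELINES-style value into a list of known pipeline names."""
    names = [name.strip() for name in value.split(",") if name.strip()]
    unknown = [name for name in names if name not in HANDLERS]
    if unknown:
        raise ValueError(f"unknown pipelines: {', '.join(unknown)}")
    return names


def handle(pipeline, job, enabled):
    """Run a job only if this worker was configured to serve its pipeline."""
    if pipeline not in enabled:
        return None  # leave the job for another worker
    return HANDLERS[pipeline](job)
```

For example, `enabled_pipelines("concept-map,modify-graph,expand-material")` yields all three names, so one worker process accepts all three job types. The trade-off is that the three pipelines then share that worker's memory limit.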
rawaa123 commented 2 months ago

Hello @ralf-berger, I would like to follow up on the status of this request, as it's quite urgent. Additionally, I am currently encountering some issues on the Edge deployment that I cannot reproduce locally on main. I have opened 2 PRs, but unfortunately the job status is showing as failed. Your prompt assistance would be greatly appreciated. Thank you.

ralf-berger commented 2 months ago

PIPELINES has been changed according to the comment above.