Closed ChloeOCB closed 3 years ago
Hi @ChloeOCB! Are all your deployments up and running? You can view them in the K8s dashboard.
Hi @regisb,
Deployments are all up
$ kubectl get deployment -n openedx
NAME READY UP-TO-DATE AVAILABLE AGE
cms 1/1 1 1 90m
cms-worker 1/1 1 1 90m
elasticsearch 1/1 1 1 90m
forum 1/1 1 1 90m
lms 1/1 1 1 90m
lms-worker 1/1 1 1 90m
memcached 1/1 1 1 90m
minio 1/1 1 1 90m
mongodb 1/1 1 1 90m
mysql 1/1 1 1 90m
nginx 1/1 1 1 90m
rabbitmq 1/1 1 1 90m
smtp 1/1 1 1 90m
Pods are up and running, although several have restarted or keep restarting
$ kubectl get pods -n openedx
NAME READY STATUS RESTARTS AGE
cms-bd7ccb69b-lftz7 1/1 Running 0 79m
cms-worker-559656496c-q4rbd 1/1 Running 4 79m
elasticsearch-799bbc7f4d-k9snv 1/1 Running 0 79m
forum-64d4598cbf-2gt7k 1/1 Running 7 79m
lms-54bc469496-hsszm 1/1 Running 3 79m
lms-worker-5dc86f6f46-2cngg 1/1 Running 4 79m
memcached-7d68b8875-fw98z 1/1 Running 0 79m
minio-bf486dd9d-rc4n5 1/1 Running 0 79m
mongodb-7df787764-jtskf 1/1 Running 0 79m
mysql-657b8df849-m2dlg 1/1 Running 0 79m
nginx-57d56f4fdb-j5lp4 1/1 Running 0 79m
rabbitmq-76f9f4844b-z86km 1/1 Running 0 79m
smtp-96dbf5995-dpplj 1/1 Running 0 79m
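The restart counts in a listing like the one above can be picked out with a few lines of Python. This is only a sketch: it assumes the five-column layout shown here (NAME, READY, STATUS, RESTARTS, AGE), and the function name `pods_with_restarts` and the `SAMPLE` constant are made up for illustration.

```python
def pods_with_restarts(kubectl_output):
    """Return (pod_name, restart_count) pairs for pods that restarted at least once."""
    restarted = []
    lines = kubectl_output.strip().splitlines()
    for line in lines[1:]:  # skip the NAME/READY/STATUS/RESTARTS/AGE header
        fields = line.split()
        if len(fields) < 5:
            continue  # ignore blank or malformed lines
        name, restarts = fields[0], int(fields[3])
        if restarts > 0:
            restarted.append((name, restarts))
    return restarted


# A few rows copied from the listing above.
SAMPLE = """\
NAME READY STATUS RESTARTS AGE
cms-bd7ccb69b-lftz7 1/1 Running 0 79m
cms-worker-559656496c-q4rbd 1/1 Running 4 79m
forum-64d4598cbf-2gt7k 1/1 Running 7 79m
lms-54bc469496-hsszm 1/1 Running 3 79m
mysql-657b8df849-m2dlg 1/1 Running 0 79m
"""

for name, count in pods_with_restarts(SAMPLE):
    print(name, count)
```

The pods it reports (here the workers, forum and lms) are the ones whose logs are worth reading first.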
But not all services are available.
I can access the MinIO interface but not the Open edX interface.
The most meaningful logs are from the cms and lms pods:
$ kubectl logs cms-bd7ccb69b-lftz7 -n openedx
2020-02-27 10:40:12,818 ERROR 12 [root] signals.py:21 - Uncaught exception from None
Traceback (most recent call last):
File "/openedx/venv/local/lib/python2.7/site-packages/django/core/handlers/exception.py", line 41, in inner
response = get_response(request)
File "/openedx/venv/local/lib/python2.7/site-packages/django/core/handlers/base.py", line 244, in _legacy_get_response
response = middleware_method(request)
File "/openedx/venv/local/lib/python2.7/site-packages/edx_django_utils/monitoring/middleware.py", line 119, in process_request
if self._is_enabled():
File "/openedx/venv/local/lib/python2.7/site-packages/edx_django_utils/monitoring/middleware.py", line 208, in _is_enabled
return waffle.switch_is_active(u'edx_django_utils.monitoring.enable_memory_middleware')
File "/openedx/venv/local/lib/python2.7/site-packages/waffle/__init__.py", line 23, in switch_is_active
switch = Switch.get(switch_name)
File "/openedx/venv/local/lib/python2.7/site-packages/waffle/models.py", line 50, in get
obj = cls.objects.get(name=name)
File "/openedx/venv/local/lib/python2.7/site-packages/django/db/models/manager.py", line 85, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/openedx/venv/local/lib/python2.7/site-packages/django/db/models/query.py", line 374, in get
num = len(clone)
File "/openedx/venv/local/lib/python2.7/site-packages/django/db/models/query.py", line 232, in __len__
self._fetch_all()
File "/openedx/venv/local/lib/python2.7/site-packages/django/db/models/query.py", line 1121, in _fetch_all
self._result_cache = list(self._iterable_class(self))
File "/openedx/venv/local/lib/python2.7/site-packages/django/db/models/query.py", line 53, in __iter__
results = compiler.execute_sql(chunked_fetch=self.chunked_fetch)
File "/openedx/venv/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 899, in execute_sql
raise original_exception
ProgrammingError: (1146, "Table 'openedx.waffle_switch' doesn't exist")
2020-02-27 10:40:12,819 ERROR 12 [django.request] exception.py:135 - Internal Server Error: /
$ kubectl logs lms-54bc469496-hsszm -n openedx
WARNING:py.warnings:/openedx/edx-platform/lms/djangoapps/courseware/__init__.py:5: DeprecationWarning: Importing 'lms.djangoapps.courseware' as 'courseware' is no longer supported
warnings.warn("Importing 'lms.djangoapps.courseware' as 'courseware' is no longer supported", DeprecationWarning)
WARNING:py.warnings:/openedx/edx-platform/lms/djangoapps/courseware/__init__.py:5: DeprecationWarning: Importing 'lms.djangoapps.courseware' as 'courseware' is no longer supported
warnings.warn("Importing 'lms.djangoapps.courseware' as 'courseware' is no longer supported", DeprecationWarning)
2020-02-27 09:44:34,405 WARNING 10 [enterprise.utils] utils.py:50 - Could not import Registry from third_party_auth.provider
2020-02-27 09:44:34,405 WARNING 10 [enterprise.utils] utils.py:51 - cannot import name _LTI_BACKENDS
2020-02-27 09:44:34,429 WARNING 12 [enterprise.utils] utils.py:50 - Could not import Registry from third_party_auth.provider
2020-02-27 09:44:34,430 WARNING 12 [enterprise.utils] utils.py:51 - cannot import name _LTI_BACKENDS
2020-02-27 10:07:52,264 ERROR 12 [django.security.DisallowedHost] exception.py:80 - Invalid HTTP_HOST header: 'XX.XX.XXX.XXX:32153'. You may need to add u'XX.XX.XXX.XXX' to ALLOWED_HOSTS.
10.233.100.0 - - [27/Feb/2020:10:07:52 +0000] "GET / HTTP/1.1" 400 26 "android-app://com.google.android.googlequicksearchbox" "Mozilla/5.0 (Linux; Android 8.0.0; WAS-LX1A) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.62 Mobile Safari/537.36"
2020-02-27 10:07:52,855 ERROR 10 [django.security.DisallowedHost] exception.py:80 - Invalid HTTP_HOST header: 'XX.XX.XXX.XXX:32153'. You may need to add u'XX.XX.XXX.XXX' to ALLOWED_HOSTS.
10.233.100.0 - - [27/Feb/2020:10:07:53 +0000] "GET /favicon.ico HTTP/1.1" 400 26 "http://XX.XX.XXX.XXX:32153/" "Mozilla/5.0 (Linux; Android 8.0.0; WAS-LX1A) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.62 Mobile Safari/537.36"
As deployment failed during database creation and migration, it seems that a table was not created.
Is there a workaround?
Thanks.
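The DisallowedHost lines in the LMS log are a different symptom from the missing table: the request arrived with an IP address and node port in the Host header, and that value is not in ALLOWED_HOSTS. The sketch below is a simplified illustration of that kind of check, not Django's actual code. The function name is hypothetical, `203.0.113.10` is a placeholder for the redacted address, and `www.myopenedx.com` is assumed to be the configured LMS host.

```python
def host_is_allowed(http_host, allowed_hosts):
    """Simplified check: strip the port, then compare against the allowed list."""
    # Drop an optional ":port" suffix (IPv6 literals are ignored in this sketch).
    hostname = http_host.rsplit(":", 1)[0] if ":" in http_host else http_host
    hostname = hostname.lower()
    for pattern in allowed_hosts:
        pattern = pattern.lower()
        if pattern == "*" or pattern == hostname:
            return True
        # A leading dot matches the domain itself and any subdomain.
        if pattern.startswith(".") and (
            hostname == pattern[1:] or hostname.endswith(pattern)
        ):
            return True
    return False


# Placeholder IP standing in for the redacted XX.XX.XXX.XXX address.
print(host_is_allowed("203.0.113.10:32153", ["www.myopenedx.com"]))
print(host_is_allowed("203.0.113.10:32153", ["www.myopenedx.com", "203.0.113.10"]))
```

In other words, browsing the platform by IP address will be rejected unless that address is allowed, or unless the request uses the configured host name.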
Hi @ChloeOCB! Sorry about the late answer.
"As deployment failed during database creation and migration, it seems that a table was not created."
Indeed, this is probably what happened. I suggest you delete the volumes attached to the mysql pod and run tutor k8s init again.
Hi @regisb,
I have never run tutor k8s init, as it is not mentioned in the documentation.
Actually, I only ran tutor k8s quickstart.
Am I missing something?
I tried your suggestion, but I still get the same errors.
Are you able to reproduce this problem?
@ChloeOCB I think I found the issue: the MinIO service cannot be found, as indicated by: socket.gaierror: [Errno -2] Name or service not known. This is probably due to the fact that the MinIO host is set to "minio.www.myopenedx.com", and you most certainly do not own the "myopenedx.com" domain name.
Are you deploying to a live production environment? If so, you need to configure your DNS records to point at your Kubernetes cluster, and you need to configure your platform to actually use these domain names (during quickstart).
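A quick way to check whether a host name resolves from a given machine or pod is to attempt the same kind of lookup that raised socket.gaierror. This is a minimal sketch: the function name `can_resolve` is hypothetical, and the `resolver` parameter exists only so the lookup can be swapped out when testing.

```python
import socket


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if the hostname resolves, False on a lookup failure."""
    try:
        resolver(hostname, None)
        return True
    except socket.gaierror:
        # Same exception type as the "Name or service not known" error.
        return False
```

Calling `can_resolve("minio.www.myopenedx.com")` from inside the cluster should show whether the MinIO host name is visible there. The same call with your own domain name checks that the DNS records are in place.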
@ChloeOCB can I close this?
Hi @regisb sorry for the delay.
Unfortunately, I tried with another domain name that I own and the issue is still the same.
It is not for a live production environment, only for testing.
I cannot investigate further for now.
Do you have any other ideas about this issue?
@ChloeOCB just wanted to say I did not forget about you. I'm currently in the process of improving the k8s stack and it should address some of your issues.
This should be working now that jobs wait for services to become available.
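To illustrate what "waiting for a service to become available" means in general, here is a small polling loop in Python. It is only a sketch of the idea and not Tutor's implementation (the log further down shows Tutor calling kubectl wait on the pods instead). The function name `wait_for_port` and its parameters are hypothetical.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=1.0):
    """Poll a TCP port until it accepts connections or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while the port is unreachable.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

An initialisation job could call something like `wait_for_port("mysql", 3306, timeout=600)` before running migrations, assuming the database is reachable under the service name `mysql` on the default MySQL port. Note that an open port only shows that the server accepts connections, not that credentials or schemas are ready.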
As of Feb 2021 this problem still persists. Here are my logs:
...
================================================
Database creation and migrations
================================================
Waiting for a mysql pod to be ready...
kubectl wait --namespace openedx --selector=app.kubernetes.io/instance=openedx-laLjoz1coNV3xcJjWJDEQuAM,app.kubernetes.io/name=mysql --for=condition=ContainersReady --timeout=600s pod
error: no matching resources found
Error: Command failed with status 1: kubectl wait --namespace openedx --selector=app.kubernetes.io/instance=openedx-laLjoz1coNV3xcJjWJDEQuAM,app.kubernetes.io/name=mysql --for=condition=ContainersReady --timeout=600s pod
This looks the same as the logs from @ChloeOCB.
If I try
tutor -r . k8s stop # stop current deployments
tutor k8s quickstart # run quickstart again
It seems to work: the mysql job starts running, but then it fails because it can't connect to mysql. I tried but can't log in to mysql; the password seems to be broken somehow.
@maitrungduc1410 please post your questions on the forums: https://docs.tutor.overhang.io/troubleshooting.html Try to add as much information as possible, including logs from your mysql container (not the mysql-job container).
Bug description
Unable to deploy Open edX with Tutor on a Kubernetes cluster
How to reproduce
Running tutor twice and getting two different behaviors
First try
Second try
The first time, the deployment fails immediately, whereas the second time, without any change, the deployment goes further but stops again and throws a Python exception
Environment
OS: Ubuntu 16.04
tutor version: 3.11.4
k8s server version: 1.14.1
k8s client version: 1.14.3