Vadorequest closed this issue 5 years ago.
It looks like there is a problem with packets going from AWS to the Jet server. We are investigating this issue now.
😢
I wasted hours trying to figure out what was wrong. Could you please improve the error messages so that at least we know it's not related to our app? It would save a lot of time and let us report the issue directly to you.
Something like "Oops, something went wrong on Jet Admin http://jetadmin.io, we're investigating the issue. This error is most likely not related to your app, report additional info on Github", either in the logs or in the server response.
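As a rough sketch of that suggestion, a library could catch network errors raised while contacting its own backend and re-raise them with a message that points away from the host app. The names `call_jet_backend` and `JetBackendUnavailable` are hypothetical and not part of jet_django.

```python
import socket
import urllib.error


class JetBackendUnavailable(RuntimeError):
    """Hypothetical exception: the Jet Admin backend could not be reached."""


def call_jet_backend(request_fn, *args, **kwargs):
    """Run request_fn and translate low-level network failures into a descriptive error."""
    try:
        return request_fn(*args, **kwargs)
    except (urllib.error.URLError, socket.timeout, ConnectionError) as exc:
        # Keep the original exception chained so the real cause is still visible.
        raise JetBackendUnavailable(
            "Oops, something went wrong while contacting Jet Admin (api.jetadmin.io). "
            "This error is most likely not related to your app; "
            "please report additional info on GitHub. "
            f"Original error: {exc!r}"
        ) from exc
```

Successful calls pass through unchanged; only network-level failures are rewrapped, so the host app's own bugs still surface with their normal tracebacks.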
Also, this is a major issue because it totally broke my entire app. I couldn't run migrations anymore because of this.
Jet Admin should make sure that if something breaks or stops working correctly on your side, it at least doesn't kill the whole app. I'm planning to build a CMS with Jet Django, running in parallel with an API powered by Django itself. Those are two distinct things, and I can't have the CMS kill the API because something went wrong somewhere, especially if it's something on your side that I have no control over.
If the CMS breaks, then so be it; that's bad enough as it is. But if it kills the API in the process, it's much, much worse.
@Vadorequest JET can't kill your app. It just adds some new API endpoints and one new table. It doesn't do anything else to your application.
That's what I thought too, but the thing is, it does.
If I enable the following lines in my Django app:
# INSTALLED_APPS
'jet_django', # XXX See https://docs.jetadmin.io
# urls.py
from django.conf.urls import include, url
from jet_django.urls import jet_urls
url(r'^jet_api/', include(jet_urls)), # XXX See https://docs.jetadmin.io
Then my endpoint https://0fkluarvo2.execute-api.eu-west-1.amazonaws.com/staging/admin returns a 504 with {"message": "Endpoint request timed out"}
If I disable those same lines, my app runs correctly (without the jet_api endpoint, of course).
So it does break my app, because the connection times out and the request gets stopped.
Here are some logs:
[1551292050162] Instancing..
[1551292054798] [DEBUG] 2019-02-27T18:27:34.798Z a78e9fad-dbf9-4a54-a933-9e3c1da3c01b Starting new HTTPS connection (1): api.jetadmin.io:443
[1551292073207] 2019-02-27T18:27:53.207Z 6dfb3557-ab08-4bf1-846e-cd19977b0e08 Task timed out after 30.03 seconds
[1551292080192] 2019-02-27T18:28:00.192Z a78e9fad-dbf9-4a54-a933-9e3c1da3c01b Task timed out after 30.03 seconds
[1551292081470] Instancing..
[1551292086261] [DEBUG] 2019-02-27T18:28:06.261Z 0a69b51d-5061-4f15-94a0-6c47b0b8ed05 Starting new HTTPS connection (1): api.jetadmin.io:443
To my understanding, Django fails to start and keeps restarting because it times out when trying to reach api.jetadmin.io. Because of this, it never starts and keeps hanging. That's why the app is broken: it cannot start at all.
@Vadorequest Hm, I'm trying to investigate it. Here is the line that connects to the Jet backend and times out:
jet_django/apps.py
...
try:
register_token()
except: # if no migrations yet
pass
...
There is a try-except block that should bypass any errors, though there is a 30s delay at startup while it tries to connect. So after 30s your app should launch anyway; I don't see any errors in your log output that could lead to a crash.
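One way to keep a slow network call at startup from consuming the whole request budget is to run it in a background thread and stop waiting after a short deadline. This is a generic sketch; `run_with_deadline` is a hypothetical helper and not how jet_django's `register_token()` actually works. A timed-out call keeps running in the background; only the caller stops waiting for it.

```python
import logging
import threading

logger = logging.getLogger(__name__)


def run_with_deadline(fn, deadline_s, default=None):
    """Call fn() in a daemon thread and wait at most deadline_s seconds for it.

    Returns fn's result, or `default` if fn raised or did not finish in time.
    """
    outcome = {}

    def worker():
        try:
            outcome["value"] = fn()
        except Exception as exc:  # record the failure instead of crashing the thread
            outcome["error"] = exc

    # Daemon thread: a hung call will not keep the interpreter alive at exit.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(deadline_s)

    name = getattr(fn, "__name__", repr(fn))
    if thread.is_alive():
        logger.warning("%s did not finish within %.1fs; continuing without it", name, deadline_s)
        return default
    if "error" in outcome:
        logger.warning("%s failed: %r; continuing without it", name, outcome["error"])
        return default
    return outcome["value"]
```

In an `AppConfig.ready()` this might look like `run_with_deadline(register_token, 3.0)`, which would keep startup well under a 30-second limit even when the backend is unreachable.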
By the way, the problem with AWS servers requires us to move our servers; this issue will be resolved this Sunday.
Ahah, that's why it fails on my setup!
I use Zappa to power my Django app, and Zappa is basically a layer around AWS Lambda. The point is, I run on AWS Lambda, and a Lambda has a... 30-second hard timeout.
So, since your own default timeout is 30 seconds, it waits too long and the Lambda timeout is reached, which kills the process and starts it again.
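The failure here is simple arithmetic: if an outbound call may wait as long as the whole invocation, nothing is left for Django to finish starting. Below is a sketch of deriving a per-call timeout from the remaining budget. On AWS Lambda the context object exposes `get_remaining_time_in_millis()`; the reserve, cap and floor values are arbitrary assumptions, not recommendations from Zappa or Jet.

```python
def outbound_timeout_s(remaining_ms, reserve_s=5.0, cap_s=10.0, floor_s=0.5):
    """Pick a timeout for an outbound HTTP call that leaves reserve_s for the rest of the request.

    remaining_ms: time left in the invocation (e.g. context.get_remaining_time_in_millis()).
    Returns 0.0 when there is not enough budget left to attempt the call safely.
    """
    available = remaining_ms / 1000.0 - reserve_s
    if available < floor_s:
        return 0.0
    return min(cap_s, available)
```

With 30 seconds remaining, the call would be capped at 10 seconds instead of consuming the entire 30-second window.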
There are two ways to resolve this issue:
In any case, this behaviour should be documented in the README, because it will impact all kinds of serverless apps. If people are not aware of it, everything will work fine until it all breaks at once, unless the package is Lambda-friendly by design.
@Vadorequest I'm trying to reproduce this issue, but without success. I've set up an EC2 server in eu-west-1 (EU Ireland) and tried to fetch the api.jetadmin.io page:
wget https://api.jetadmin.io/api/
It downloaded successfully. I've also tried executing an HTTPS request to https://api.jetadmin.io/api/ from the AWS Lambda service, which also worked.
Could you please run this command on your server too? Maybe it's something with the permissions/policies in your configuration?
It's not related to permissions/policies, since it worked the day before and nothing has changed in my setup in that regard (also, I'm running my tests with full AWS administrator access).
I'll try again.
@Vadorequest Can we continue in a messenger to resolve this issue quicker? Skype, WhatsApp, Telegram? If so, please send me your contact details (by email if you don't want to post them here).
I sent you an email at denis at kildishev.ru
I just tried again, and zappa tail shows a new error message when the Lambda starts:
[1551604285683] Instancing..
[1551604291218] Application starting with ENV: staging
[1551604291718] bad magic number in 'jet_django': b'\x03\xf3\r\n': ImportError
Traceback (most recent call last):
File "/var/task/handler.py", line 580, in lambda_handler
return LambdaHandler.lambda_handler(event, context)
File "/var/task/handler.py", line 245, in lambda_handler
handler = cls()
File "/var/task/handler.py", line 151, in __init__
wsgi_app_function = get_django_wsgi(self.settings.DJANGO_SETTINGS)
File "/var/task/zappa/ext/django_zappa.py", line 20, in get_django_wsgi
return get_wsgi_application()
File "/var/task/django/core/wsgi.py", line 12, in get_wsgi_application
django.setup(set_prefix=False)
File "/var/task/django/__init__.py", line 24, in setup
apps.populate(settings.INSTALLED_APPS)
File "/var/task/django/apps/registry.py", line 89, in populate
app_config = AppConfig.create(entry)
File "/var/task/django/apps/config.py", line 90, in create
module = import_module(entry)
File "/var/lang/lib/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 994, in _gcd_import
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 674, in exec_module
File "<frozen importlib._bootstrap_external>", line 888, in get_code
File "<frozen importlib._bootstrap_external>", line 455, in _validate_bytecode_header
ImportError: bad magic number in 'jet_django': b'\x03\xf3\r\n'
zappa/ext/django_zappa.py
Error message returned by the Lambda at https://jugzryz6zi.execute-api.eu-west-1.amazonaws.com/staging/:
{
'message': 'An uncaught exception happened while servicing this request. You can investigate this with the `zappa tail` command.',
'traceback': ['Traceback (most recent call last):\\n', ' File \"/var/task/handler.py\", line 518, in handler\\n response = Response.from_app(self.wsgi_app, environ)\\n', ' File \"/var/task/werkzeug/wrappers.py\", line 939, in from_app\\n return cls(*_run_wsgi_app(app, environ, buffered))\\n', ' File \"/var/task/werkzeug/test.py\", line 923, in run_wsgi_app\\n app_rv = app(environ, start_response)\\n', \"TypeError: 'NoneType' object is not callable\\n\"]}
The ImportError: bad magic number in 'jet_django': b'\x03\xf3\r\n' issue was related to https://github.com/Miserlou/Zappa/issues/854 and was fixed by running find . -name \*.pyc -delete from the project root.
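The header b'\x03\xf3\r\n' is the magic number written by Python 2.7, so a Python 3.6 runtime refuses to load those stale files. Where find is not available (e.g. on Windows), a rough Python equivalent of that cleanup is sketched below; `delete_pyc_files` is a hypothetical helper, not part of Zappa.

```python
from pathlib import Path


def delete_pyc_files(root="."):
    """Recursively delete *.pyc files under root, like `find . -name '*.pyc' -delete`.

    Returns the number of files removed.
    """
    removed = 0
    # Materialize the list first so we don't delete while the directory walk is in progress.
    for path in list(Path(root).rglob("*.pyc")):
        if path.is_file():
            path.unlink()
            removed += 1
    return removed
```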
Alright, time for a summary of these issues:
I tried to access https://0fkluarvo2.execute-api.eu-west-1.amazonaws.com/staging/jet_api/model_descriptions/ directly from the browser and got {"detail":"You do not have permission to perform this action."}.
This is normal because the endpoint requires authentication, provided through headers, so it's not possible to just open that URL in the browser. That was not a real issue but the expected behaviour.
I tried to access https://0fkluarvo2.execute-api.eu-west-1.amazonaws.com/staging/admin and got a 504 {"message": "Endpoint request timed out"}.
This was because my Lambda runs inside a VPC and was not allowed to connect to the internet.
This was very difficult to troubleshoot, and this error should be more obvious.
Thanks to @f1nality, who helped me pinpoint the issue. I wasn't aware of this limitation, since it's the first time I've used Lambda inside a VPC (which is required to connect to AWS RDS Aurora).
Figuring out that the Lambda couldn't talk to the Jet API took us a good ~90 minutes; not so obvious indeed.
Here is the script I used to "ping" external services from inside AWS Lambda: https://gist.github.com/Vadorequest/aff8e9ce2dcc4c67637a34f567190825
Resolving the issue by allowing the Lambda to reach the internet took me several hours (almost 6h...). It's very hard, error-prone, and technically complex to understand:
You can follow these steps to create a NAT gateway between the Lambda and the internet; you'll basically create new subnets and route tables (and an Internet gateway if your VPC isn't linked to one already): https://stackoverflow.com/questions/35455281/aws-lambda-how-to-setup-a-nat-gateway-for-a-lambda-function-with-vpc-access/39082826#39082826
If you want to understand why you are doing this in the first place, take a deep look into https://edgarroman.github.io/zappa-django-guide/aws_network_primer/ which is awesomely written and beginner-friendly.
https://gist.github.com/reggi/dc5f2620b7b4f515e68e46255ac042a7 is also quite useful for understanding the subnets/route tables/NAT stuff, but I found the SO answer better: this tutorial isn't as well written (for instance, the steps aren't in the right order), and I got lost. Eventually, the SO answer helped me troubleshoot my mistakes, at the price of my Sunday afternoon 😄
If you're a bit familiar with the Serverless Framework, here is a template that could be useful, although I haven't tested it yet: https://gist.github.com/efi-mk/d6586669a472be8ea16b6cf8e9c6ba7f
I don't know how it could be done, but giving even the slightest indication that the internet isn't accessible would tremendously help developers who run into this and are unaware of the Lambda/VPC closed-network thing.
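A minimal stand-in for the kind of "ping" script linked above, using only the standard library (this is not the gist's code, and `can_reach`/`lambda_handler` here are illustrative names): it opens a TCP connection with a short timeout and reports why it failed. From a Lambda inside a VPC without a NAT gateway, checking api.jetadmin.io:443 this way would be expected to time out, which is exactly the hint that was missing.

```python
import socket


def can_reach(host, port=443, timeout_s=3.0):
    """Return (True, None) if a TCP connection to host:port succeeds within timeout_s,
    otherwise (False, reason)."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True, None
    except socket.timeout:
        # Checked before OSError: socket.timeout is an OSError subclass.
        return False, f"timed out after {timeout_s}s (no outbound internet? check NAT/VPC routing)"
    except OSError as exc:  # DNS failure, connection refused, unreachable network, ...
        return False, f"{type(exc).__name__}: {exc}"


def lambda_handler(event, context):
    """Hypothetical Lambda entry point reporting reachability of a few hosts."""
    hosts = event.get("hosts", ["api.jetadmin.io"])
    return {host: can_reach(host)[0] for host in hosts}
```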
Because the Lambda couldn't reach api.jetadmin.io and there was no custom timeout on my side, this issue crashed the whole application: the Lambda would automatically time out after 30 seconds. This unexpected behaviour was also very difficult to figure out.
ImportError: bad magic number in 'jet_django': b'\x03\xf3\r\n', which was totally unrelated. Most likely, this was because I had moved folders in my app and the .pyc files were outdated. See https://github.com/jet-admin/jet-django/issues/4#issuecomment-469009309
django.db.utils.ProgrammingError: (1146, "Table 'tfp_backoffice_test.jet_django_token' doesn't exist")
I had forgotten to run the migrations because the app wasn't activated. This was fixed by running the migrations.
Forbidden: /staging/jet_api/model_descriptions/ again.
For a few minutes, https://app.jetadmin.io/app/tfp_backoffice DDoSed my /jet_api/ endpoint. Something is wrong with the UI, which basically sent HTTP requests to my /staging/jet_api/model_descriptions/ endpoint in a loop. This broke the Jet Admin app, because the browser became unresponsive, and it also slowed down my computer due to the huge amount of logs displayed in the console in real time. (Please fix this!)
I see lots of OPTIONS requests being sent, about 2-3 per second. That's what DDoSed my API (I disabled jet_django, which is why it shows 404, but still, it shouldn't DDoS like that).
This happens when I'm at https://app.jetadmin.io/app/tfp_backoffice
And now... it's back to the behaviour I was experiencing a few days ago. The Lambda basically times out after 30 seconds:
[1551656138200] 2019-03-03T23:35:38.179Z f9c2d096-8901-4cd0-99be-ae7ae0ae5766 Task timed out after 30.02 seconds
and the server can't start, which kills the whole app. But I don't have any further details; there is no error message explaining why it times out. The only thing I know is that it only happens when the jet_django app is activated in INSTALLED_APPS.
I have also tried using a tunnel to my localhost app and setting it as the API endpoint in the https://app.jetadmin.io/app/tfp_backoffice settings, and it works correctly. So the issue I'm facing only happens when jet_django is enabled, and only in the AWS Lambda environment.
Is there a way to put Jet into verbose mode to enable better logging? It could help troubleshoot the issue.
I feel like something goes wrong when jet_api is initialized: maybe a middleware, or something that tries to authenticate, set a token, or similar. Somehow the process fails but doesn't crash, and it doesn't get killed until the Lambda itself times out and restarts the process.
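Until a dedicated verbose mode exists, Django's LOGGING setting (a `logging.config.dictConfig` dictionary) can raise verbosity and add timestamps, which makes it easier to see where the 30 seconds go. This sketch assumes the package logs under a logger named `jet_django`; that name is a guess, and the `[JET]` lines may come from a different logger. The "Starting new HTTPS connection" lines in the logs above are emitted by urllib3 at DEBUG level.

```python
import logging
import logging.config

# Sketch of a settings.py LOGGING block; "jet_django" is an assumed logger name.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "timed": {"format": "%(asctime)s %(name)s %(levelname)s %(message)s"},
    },
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "timed"},
    },
    "loggers": {
        "jet_django": {"handlers": ["console"], "level": "DEBUG", "propagate": False},
        # urllib3 logs each outbound connection attempt at DEBUG level.
        "urllib3": {"handlers": ["console"], "level": "DEBUG", "propagate": False},
    },
}

if __name__ == "__main__":
    # Django applies LOGGING itself at startup; this only checks the dict outside Django.
    logging.config.dictConfig(LOGGING)
    logging.getLogger("jet_django").debug("verbose logging enabled")
```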
The timeout issue was related to the memory allocated to the Lambda. I had allocated 128 MB, which is the minimum (and often enough), but it was not enough for JET's internal call to JetAdminModelDescription in admin/model_description.py, made from jet.register(model) in JetDjangoConfig.
See these logs, which prove the issue: look at the timestamps and you'll see the Lambda is killed because the operation takes too long (30s): https://gist.github.com/Vadorequest/4d5ae7379202c72daeb9e5089297d886
Increasing it to 512 MB fixed the issue.
Performance benchmark: (manual)
@f1nality I strongly suggest adding some logging to this part. Knowing how long the process takes would tremendously help. If it takes close to 20-25 seconds, or if the process fails, displaying a message about a lack of physical resources would be a good hint.
Anyway, this must be handled better, because it could fail at any time if it takes close to 30s: adding a few new models (or another app!) could crash the app, and it would be difficult to troubleshoot. It took me days.
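As a sketch of the suggested logging, a small context manager can time a startup phase and warn when it gets close to the Lambda limit. `timed_phase` is a hypothetical helper, and the 20-second default merely echoes the 20-25 s range mentioned above.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger(__name__)


@contextmanager
def timed_phase(label, warn_after_s=20.0):
    """Log how long the enclosed block took; warn if it exceeded warn_after_s.

    Yields a dict whose "elapsed_s" key is filled in when the block exits.
    """
    stats = {}
    start = time.perf_counter()
    try:
        yield stats
    finally:
        stats["elapsed_s"] = time.perf_counter() - start
        if stats["elapsed_s"] > warn_after_s:
            logger.warning("%s took %.1fs (threshold %.1fs); the function may be short on "
                           "memory/CPU or close to its timeout",
                           label, stats["elapsed_s"], warn_after_s)
        else:
            logger.info("%s took %.2fs", label, stats["elapsed_s"])
```

Wrapping model registration in `with timed_phase("jet model registration"):` would have shown the slowdown on 128 MB directly in the logs.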
Now my app can start properly whether jet_admin is enabled or not, and it doesn't break the Django admin anymore because there is no timeout.
But still, it doesn't work yet.
When I go to https://app.jetadmin.io/app/tfp_backoffice/project/models, I get the following in the browser console:
{"detail":"You do not have permission to perform this action."}
And the following in the server logs:
[1551723882670] [DEBUG] 2019-03-04T18:24:42.670Z 3c403b8a-5e5d-45b3-baa9-0e78d7a0a1e6 Starting new HTTPS connection (1): api.jetadmin.io:443
[1551723882976] [DEBUG] 2019-03-04T18:24:42.976Z 3c403b8a-5e5d-45b3-baa9-0e78d7a0a1e6 https://api.jetadmin.io:443 "POST /api/project_auth/ HTTP/1.1" 400 39
[1551723882978] [JET] Project Auth request error: 400 Bad Request {"non_field_errors":["Not authorized"]}
[1551723882978] [ERROR] 2019-03-04T18:24:42.978Z 3c403b8a-5e5d-45b3-baa9-0e78d7a0a1e6 [JET] Project Auth request error: 400 Bad Request {"non_field_errors":["Not authorized"]}
[1551723882979] Forbidden: /staging/jet_api/model_descriptions/
[1551723882979] [WARNING] 2019-03-04T18:24:42.979Z 3c403b8a-5e5d-45b3-baa9-0e78d7a0a1e6 Forbidden: /staging/jet_api/model_descriptions/
The issue was happening because I had dropped my online DB, so the token had been regenerated and no longer matched the token of my project on https://app.jetadmin.io
I wrote a script to update the value whenever I do that again in the future (I drop the DB frequently because of a bad DB schema; since it's a work in progress, it's easier to erase and rebuild).
With this being resolved, my Jet Admin app finally works and it's the end (or beginning? 😆 ) of my troubles with Jet Admin & Zappa on AWS Lambda configuration. :)
I have deployed my API, but I get the following:
https://0fkluarvo2.execute-api.eu-west-1.amazonaws.com/staging/jet_api/model_descriptions/
I really don't understand what is wrong, I have a minimal setup with a db.sqlite3 database and not much besides that. It was working fine then broke at some point.