Closed: djmore4 closed this issue 2 years ago.
@djmore4 Thank you for submitting the proposal.
To help us better understand what you are proposing, could you share a use case that you can't accomplish with the current settings but hope to be able to accomplish with the proposed ones?
Sure, thanks for the quick response @kencochrane! Currently the company I work for uses a user portal which is not accessible on the public web for our production deployments. We have been considering exposing these portals to the public web to help reduce configuration complexity, but that naturally comes with some concern. In order to not seriously inconvenience our users, we have historically kept our default setting for `DEFENDER_COOLOFF_TIME` low (around 2 minutes) to ensure that if someone were to somehow reach the site without an appropriate password, they will get locked out, and we won't get completely bombarded. However, we consider that this could be a security issue if we expose these sites to the public internet. Because of this, having the ability to tweak these settings individually, so that we can balance our desire for security against our desire to not inconvenience our customers, would be very beneficial. Specifically, the ability to configure a scaling policy for the lockout time (not necessarily in the same way I proposed) would help assure our customers that having these sites exposed to the public internet is not a security risk (banks ask a lot of questions and have quite a lot of demands 😃). A good example would be setting the proposed `DEFENDER_ATTEMPT_COOLOFF_TIME` to something like 120 and `DEFENDER_LOCKOUT_COOLOFF_TIME` to something like `[120, 240, 480, 960]`.
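For concreteness, that example would look something like this in a Django settings module; this is only a sketch of the proposed settings, neither of which exists in django-defender today:

```python
# settings.py -- sketch of the *proposed* configuration only.

# Forget failed login attempts after 2 minutes of inactivity.
DEFENDER_ATTEMPT_COOLOFF_TIME = 120

# Escalate the lockout duration with each repeat lockout: the first
# lockout lasts 2 minutes, the second 4, the third 8, the fourth+ 16.
DEFENDER_LOCKOUT_COOLOFF_TIME = [120, 240, 480, 960]
```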
For full transparency, I wrote this up mostly to start a conversation; I probably should have been clearer about that in the original post. My design proposal can almost certainly be improved by someone with more intimate knowledge of the project's internals.

tl;dr: We basically want a way to punish repeat offenders more harshly, to further dissuade DDoS attacks, without making our normal customers wait an hour if they get locked out because they forgot their password.
Thank you for writing that up, that makes sense. I work for a bank so I'm well aware of all the different requirements they add :)
Your proposal seems reasonable to me. I agree that we should make this as backwards compatible as possible: the new settings should be optional and, if not set, should default to the way it works today.
I personally don't have time to work on this, would you be willing to work on it and submit the PR?
Whoever works on it will need to include unit tests to make sure it works correctly and doesn't break existing functionality.

We will also need to update the documentation. That shouldn't be too bad, since you did a nice job of that in your proposal; we just need to copy it over to the docs and include some examples.
From a professional standpoint, I think I might have time in about a month if I can get this included in our scope of work, though I'm really not sure how likely that is.

Outside of a professional capacity, I'll probably start looking into the source code in case I end up being the one who makes the change. To that point, where do you think would be a good place to start looking in the code to get an idea of what I'll need to understand for this change? As I said, though, if it doesn't happen from within my company, it will probably take me a bit longer to get it done.

Actually, from what I'm seeing, the change shouldn't be too bad, so I might be able to make the changes sooner than I thought. Anyone else who wants to work on it is welcome to, but if I make any real progress I'll update here.
Question for you @kencochrane. I'm getting this error when trying to run the tests with either command mentioned in the documentation.
Alright, I've worked past that by adding the middleware it was complaining about to the test_settings.py file, but now I find there are a bunch of test failures (specifically 34 on the base fork of the master branch, and 36 on my branch off the fork).

Most of these seem completely unrelated to my changes, given that, as I said, there are 34 failures on the base fork branch, so I'm not exactly sure how to proceed here. Any advice @kencochrane?

Here are the errors I'm seeing: errors.txt
@kencochrane please let me know when you get the chance. Also, is there any interest in trying to mitigate any potential performance impact of this configuration by storing at least some of the information in the cache? If so, what kind of TTL would be acceptable in your eyes?
Sorry for the slow reply; looking at it now. I was able to reproduce what you saw locally.
@djmore4 it looks like the docs are a little out of date; you will need to run `tox -v` to run the tests. It assumes you have Redis running locally. I'm currently working on getting it set up with Docker so it will run the same way as in our CI, but I have an appointment, so I don't know if I will get it done today. I wanted to give you an update in case that was enough to get you unstuck.
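If it helps in the meantime, one quick way to satisfy that Redis assumption is a throwaway container; the `redis:5` tag below is just an example, not something our CI pins:

```shell
# Start a disposable Redis on the default port the tests expect (6379),
# run the suite, then clean up.
docker run --rm -d -p 6379:6379 --name defender-redis redis:5
tox -v
docker stop defender-redis
```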
To give you an update on where I am: it looks like when we switched over to using GitHub Actions, we switched to using tox for the unit tests. Tox expects that you have multiple Python versions installed locally; on a Mac you can do this with pyenv. When I run the unit tests with mockredis, the tests seem to fail, but if I start Redis up locally on port 6379, they pass fine. Not sure what broke mockredis.

I tried to get these all working, but I ran out of time. If someone has time to figure out the right way to get tox, pyenv, and mockredis working together, that would be great. I also tried to get it running in a Docker container to make the setup easier to reproduce, but no luck: the Python versions I had installed in the container were not found by pyenv.
Sorry I wasn't much help, but since the GitHub Actions work fine, you can always submit a PR with your changes and see the results when the actions kick off. I know it isn't ideal, but at least it is a good check moving forward.
@kencochrane Ok, thanks for the update, that's actually very helpful! What versions of Python does it expect to have installed locally? Or will it let me know? I'll try it later regardless, but if you know off-hand that would be great. Also, is there a requirement on the Redis version? I doubt it, but I just wanted to check.

I'll try to get everything sorted and see if I can get it all passing locally; if not, I'll see how the GitHub Action check works.
As far as Python versions go, I'm not 100% sure, but I assume the latest versions of 3.7, 3.8, 3.9, 3.10, and pypy3; at least, that is what we have in the tox configs.

I don't think we have a minimum Redis version. I would assume at least 5.0, but 4.0 might even work.

From what I've seen locally, it works even with 3.x Redis versions.
I'm having a bit of trouble getting tox working locally, and I'm realizing I may need to remake my changes because I made them on a fork of the repo. I'll see how that goes too.
@kencochrane there seems to be a problem with the GitHub Action as well... unless I somehow broke something there, but I don't think I did.

Ok, actually a little confused now. Maybe there really is still a bug, or maybe something went wrong with the check. I added a second commit to the PR and the pre-commit.ci check passed in seconds, but it doesn't seem to have re-run the test builds. Should I try to make a new PR?
It looks like I need to approve the checks before they run since you are a first-time contributor. Once I see the email I can go in and approve.
I think the same PR should be fine.
It looks like there is an issue with celery (most likely because newer importlib-metadata releases dropped the dict-style entry_points interface that older celery versions rely on). If we add the following to `requirements.txt`:

```
importlib-metadata<5.0
```

it should fix the issue you are seeing.
@kencochrane alright, I just pushed the change adding `importlib-metadata<5.0` to the requirements.txt file.
Alright, it seems to be working now. Looks like I have some tests to fix. Thanks for all the help @kencochrane
You're welcome, sorry for the trouble.
No worries, one last question for you, hopefully. The tests seem to be failing because `self.test_valid_login()` is not returning a 302. Any ideas on why that might be happening?
Sorry, it has been a little while since I have been in this code, but my first guess would be something related to this, since it defaults to 302: https://github.com/jazzband/django-defender/blob/master/defender/decorators.py#L6
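For context, the failing assertion is presumably of this general shape; this is a generic Django test-client sketch, not our actual test code, and the URL and credentials are illustrative only:

```python
from django.contrib.auth.models import User
from django.test import TestCase


class ValidLoginSketch(TestCase):
    """Sketch: a successful login POST normally redirects with a 302."""

    def test_valid_login(self):
        # Assumes a standard django.contrib.auth LoginView wired up at
        # /login/; the user and URL are placeholders, not project fixtures.
        User.objects.create_user(username="valid-user", password="valid-pass")
        response = self.client.post(
            "/login/",
            {"username": "valid-user", "password": "valid-pass"},
        )
        self.assertEqual(response.status_code, 302)
```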
I've tried a slightly different approach. Hopefully the tests will succeed this time if you can approve running them again @kencochrane.
@kencochrane Alright, I think they should work this time. If they don't, I'm really starting to run out of ideas... I have a couple of theories as to why they might be failing at this point, but I'm hoping I'm wrong, because it would really be a drag.
@kencochrane I'm getting a bit frustrated, to be honest with you. Is there anything that can be done to help fix the mockredis setup so I can at least try to test this locally? It's very difficult for me to pinpoint what is going wrong from just the failing assert statement in the tox logs... Maybe that means I should be writing my tests differently, but I'm not sure. Any advice would be appreciated.

Also, as I mentioned, tox seems to be problematic for me. Is it still possible to use the old testing methods? Those are at least runnable for me locally (though they aren't working, as we discussed).
Sorry for the trouble, I tried to fix it this weekend but I wasn't able to figure it out.
I'm not familiar with tox; someone else added those tests, and I haven't used it enough to figure out how best to use it locally.

If I get time this weekend, I'll see if I can figure out what is wrong with mockredis; not sure why that isn't working.
Sorry, I'm not frustrated with you or anything. I just thought I had everything nice and tidy and this is being a thorn in my side ☹️
@kencochrane I made some more changes to hopefully help me get to the bottom of the issue. Can you please allow the CI workflow to run for me?
LOL wait... they're actually working now; that wasn't the plan, haha. Ok, I think I know what the problem was. I'm going to clean things up a little bit and see if doing things slightly more generically works as well. If it doesn't, I'll probably make some adjustments to the code to allow for it.
@kencochrane Once more if you don't mind. I think we're almost done with this whole thing (at least for this PR)
@kencochrane ok, I think this should be the last one, hopefully, unless one of the new tests I added fails. Please also have a look at them if you don't mind, because I'm not entirely clear on whether it's meant to be a valid configuration to have `LOCKOUT_BY_IP_USERNAME` set while also having either `DISABLE_IP_LOCKOUT` or `DISABLE_USERNAME_LOCKOUT` set to True. If it is, I'll need to make another adjustment to the logic so I don't cause any problems there. If it's not meant to be valid, I can add some logic to config.py to throw an exception when that is configured, and add some tests; see the sketch below.
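Roughly what I have in mind for that guard, as a sketch against the module-level names in config.py rather than actual project code:

```python
# Sketch of a possible guard for defender/config.py. Assumes the usual
# module-level values parsed from the Django settings, e.g.
#   LOCKOUT_BY_IP_USERNAME = getattr(settings, "DEFENDER_LOCKOUT_BY_IP_USERNAME", False)
from django.core.exceptions import ImproperlyConfigured

if LOCKOUT_BY_IP_USERNAME and (DISABLE_IP_LOCKOUT or DISABLE_USERNAME_LOCKOUT):
    raise ImproperlyConfigured(
        "DEFENDER_LOCKOUT_BY_IP_USERNAME cannot be combined with "
        "DEFENDER_DISABLE_IP_LOCKOUT or DEFENDER_DISABLE_USERNAME_LOCKOUT."
    )
```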
And hopefully this will be the last question I'll need to ask. For some reason, the test coverage logic wasn't recognizing that I did in fact add tests covering many if not all of the lines I added to config.py, so I needed to add `# pragma: no cover` comments to those lines so it wouldn't reject the PR based on test coverage. Any recommendations there?
That's weird, not sure why that didn't work. The only thing that comes to mind is maybe something is getting cached?
I would have to look into the way we are figuring out the coverage with tox and see if there is some other config that needs to be changed. I wouldn't think so, but you never know.
Ok, well, https://github.com/jazzband/django-defender/pull/224 is ready for review and merging, pending any recommended changes.
@kencochrane I think I figured out why the coverage was saying the lines weren't covered by tests. I think there was a misunderstanding on my part about how the `@patch(...)` decorator actually works. For whatever reason, I expected that it would run through the appropriate logic to set the property, but that must not be the case, right? Otherwise I would expect the coverage tool to know what is going on, though it could just be another misunderstanding on my part.
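In case it helps anyone later, here is a self-contained sketch (all names hypothetical, not defender's real config) of why patching an attribute never exercises the code that normally computes it, which is exactly what coverage measures:

```python
from unittest import TestCase
from unittest.mock import patch


class FakeConfig:
    """Hypothetical stand-in for a config module that parses a setting."""

    LOCKOUT_COOLOFF_TIME = 300  # pretend this was parsed from settings

    @staticmethod
    def get_cooloff(lockout_count=0):
        value = FakeConfig.LOCKOUT_COOLOFF_TIME
        if isinstance(value, list):  # the scaling-policy branch
            return value[min(lockout_count, len(value) - 1)]
        return value


class PatchCoverageSketch(TestCase):
    @patch.object(FakeConfig, "LOCKOUT_COOLOFF_TIME", [120, 240])
    def test_patched_value(self):
        # patch.object() swaps the attribute in directly; any parsing or
        # validation code that would normally produce this value never
        # runs, so coverage does not mark those lines as executed.
        self.assertEqual(FakeConfig.get_cooloff(3), 240)
```

So the lines in config.py would presumably only count as covered if the tests drive whatever actually parses the setting (for example, `override_settings` plus re-running the parsing), rather than patching the final value.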
Hey @kencochrane, is there any plan to review and merge my pull request any time soon, or are we waiting on something from my end?
@djmore4 sorry for the delay, things got busy, I'll review it now.
@kencochrane Thanks for the review, I'll update it this week at some point
Hey @kencochrane, sorry for the long delay; I've been super busy. I'm looking to update the README this weekend or shortly thereafter, but I wanted to clarify whether you have a preference for where the new example should go in the README. I feel as though it will be too much information to cram under the list headings in "Customizing django-defender", so would some addendum text be appropriate within that section? I'm going to move forward in that direction until I hear back. Thanks.
Whatever you think looks right. As long as it is all contained in the same README, we should be good. Thank you!
@kencochrane alright, it's been updated once more. Hopefully it looks good to you, but I'm happy to make more changes.

@kencochrane if you can let me know today whether there are any changes you want made, I'd really appreciate it, because I don't know when I'll next have enough free time to take care of things. If you can't, it will obviously just have to wait, but if you get back to me I'm hoping I can take care of it today.
Sorry for the delay. This looks great, thank you.
No worries, thanks for getting to it!
Oh, also, I just thought of this: do you think it would be worthwhile to update the section of the README about running the tests? I'm not exactly sure what should be written, since I never really got it working locally, but it should probably be mentioned that those testing methods are out of date and contributors should use `tox` instead.
I agree it should probably be updated, but like you, I haven't figured out how to get them running locally yet. At a minimum they should be updated to help prevent confusion.
Right, sounds good. Do you want me to update it in this pull request, or will that be done separately?

Also, I was looking back through our conversation and noticed you mentioned the CI is using Docker. Is there a base image that could be pulled down locally? Then we could maybe just inject code into the image and run the tests from within the container. Something like the sketch below is what I'm picturing.
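Purely a sketch; I don't know the CI's actual base image, and the image tags and tox environment name here are guesses:

```shell
# Run Redis in one container, then mount the source tree into a stock
# Python image that shares the Redis container's network namespace
# (so localhost:6379 resolves inside it) and run tox there.
docker run --rm -d --name defender-redis redis:5
docker run --rm -it --network container:defender-redis \
  -v "$PWD":/src -w /src python:3.10 \
  bash -c "pip install tox && tox -e py310"
docker stop defender-redis
```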
This is a proposal for multiple (in my opinion) enhancements to the functionality django-defender makes available for configuration.
Proposals
First and foremost, in the documentation we can see this reference to `DEFENDER_COOLOFF_TIME`:

> DEFENDER_COOLOFF_TIME: Int: If set, defines a period of inactivity after which old failed login attempts will be forgotten. An integer, will be interpreted as a number of seconds. If 0, the locks will not expire. [Default: 300]
This is very useful; however, there should ideally be a way for a developer to independently configure how long they want django-defender to remember failed login attempts versus how long to remember that a username and/or IP address has been locked out.
I propose adding two optional override settings (I'll call them `DEFENDER_ATTEMPT_COOLOFF_TIME` and `DEFENDER_LOCKOUT_COOLOFF_TIME` for the remainder of the proposal) that will override the individual cases I'm discussing for `DEFENDER_COOLOFF_TIME`. Alternatively, these properties could simply replace `DEFENDER_COOLOFF_TIME`, but in the interest of backward compatibility I figured addition would be better.

`DEFENDER_ATTEMPT_COOLOFF_TIME` would be defined nearly identically to the current `DEFENDER_COOLOFF_TIME`, and the existing configuration property would have its documentation updated as follows:

> DEFENDER_COOLOFF_TIME: Int: If set, defines a period of inactivity after which old failed login attempts and username/ip lockouts will be forgotten. An integer, will be interpreted as a number of seconds. If 0, neither the failed login attempts nor the username/ip locks will expire. [Default: 300]

> DEFENDER_ATTEMPT_COOLOFF_TIME: Int: If set, overrides the period of inactivity (set by DEFENDER_COOLOFF_TIME) after which old failed login attempts will be forgotten. An integer, will be interpreted as a number of seconds. If 0, the failed login attempts will not expire. [Default: DEFENDER_COOLOFF_TIME]
Finally, `DEFENDER_LOCKOUT_COOLOFF_TIME` would also be quite similar to the current functionality. However, I additionally propose a secondary enhancement: accepting a list of integers as a value for this property. This would allow developers to scale the lockout duration based on the number of lockouts the username/ip has experienced within a timeframe of `DEFENDER_ACCESS_ATTEMPT_EXPIRATION` hours. Therefore, the updated documentation would look something like this:

> DEFENDER_LOCKOUT_COOLOFF_TIME: Int or List: If set, overrides the period of inactivity (set by DEFENDER_COOLOFF_TIME) after which username/ip lockouts will be forgotten. An integer will be interpreted as a number of seconds. A list of integers will be interpreted as a number of seconds per lockout, with each entry's index corresponding to how many previous lockouts (up to some maximum) occurred in the last DEFENDER_ACCESS_ATTEMPT_EXPIRATION hours. If the property is set to 0 or [], the username/ip lockout will not expire. [Default: DEFENDER_COOLOFF_TIME]
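To make the list semantics concrete, resolving a lockout duration could work something like the following sketch (not an implementation proposal, just the intended lookup behavior):

```python
from typing import List, Union

def resolve_lockout_cooloff(
    cooloff: Union[int, List[int]], previous_lockouts: int
) -> int:
    """Return the cooloff in seconds; 0 means the lock never expires."""
    if isinstance(cooloff, int):
        return cooloff
    if not cooloff:  # [] behaves like 0: the lockout does not expire
        return 0
    # Clamp to the last entry once the maximum index is exceeded.
    return cooloff[min(previous_lockouts, len(cooloff) - 1)]

# With [120, 240, 480, 960]: first lockout -> 120s, second -> 240s,
# and every lockout from the fourth on -> 960s.
assert resolve_lockout_cooloff([120, 240, 480, 960], 0) == 120
assert resolve_lockout_cooloff([120, 240, 480, 960], 9) == 960
```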
Remaining questions
- Should `DEFENDER_ACCESS_ATTEMPT_EXPIRATION` also be updated in the same vein as these proposed changes?
- Should the lockout-scaling behavior only apply when `DEFENDER_STORE_ACCESS_ATTEMPTS` is set to True? Otherwise it cannot be guaranteed that cache values will not have been cleared out when performing the next check. This is possibly an acceptable consequence of not storing the records in the database, and so maybe can be overlooked, but it should certainly be noted in any updated documentation.