Configuration enhancements for cache lifetime and user lockout period

djmore4 commented 1 year ago

This is a proposal for multiple (in my opinion) enhancements in the functionality available for configuration by django-defender.

Proposals

First and foremost, in the documentation we can see reference to DEFENDER_COOLOFF_TIME.

DEFENDER_COOLOFF_TIME: Int: If set, defines a period of inactivity after which old failed login attempts will be forgotten. An integer, will be interpreted as a number of seconds. If 0, the locks will not expire. [Default: 300]

This is very useful, however there should ideally be a way for a developer to independently configure how long they want django-defender to remember failed login attempts versus how long to remember that a username and/or ip address has been locked out.

I propose adding two optional override settings (I'll call them DEFENDER_ATTEMPT_COOLOFF_TIME and DEFENDER_LOCKOUT_COOLOFF_TIME for the remainder of the proposal) that will override the individual cases I'm discussing for DEFENDER_COOLOFF_TIME. Alternatively these properties could simply replace DEFENDER_COOLOFF_TIME but in the interest of backward compatibility I figured addition would be better.

DEFENDER_ATTEMPT_COOLOFF_TIME would be defined nearly identically to the current DEFENDER_COOLOFF_TIME and the existing configuration property would have it's documentation updated as follows

DEFENDER_COOLOFF_TIME: Int: If set, defines a period of inactivity after which old failed login attempts and username/ip lockouts will be forgotten. An integer, will be interpreted as a number of seconds. If 0, neither the failed login attempts nor the username/ip locks will expire. [Default: 300]

DEFENDER_ATTEMPT_COOLOFF_TIME: Int: If set, overrides the period of inactivity after which old failed login attempts will be forgotten set by DEFENDER_COOLOFF_TIME. An integer, will be interpreted as a number of seconds. If 0, the failed login attempts will not expire. [Default: DEFENDER_COOLOFF_TIME]

Finally DEFENDER_LOCKOUT_COOLOFF_TIME will also be quite similar to the current functionality. However, I additionally propose a secondary enhancement of accepting a list of integers as a setting for this property. This would allow developers to scale the lockout duration based on the number of lockouts the username/ip has experienced within a timeframe of DEFENDER_ACCESS_ATTEMPT_EXPIRATION hours.

Therefore the updated documentation would look something like this

DEFENDER_LOCKOUT_COOLOFF_TIME: Int or List: If set, overrides the period of inactivity after which username/ip lockouts will be forgotten set by DEFENDER_COOLOFF_TIME. An integer, will be interpreted as a number of seconds. A list of integers, will be interpreted as a number of seconds for users with the integer's index being how many previous lockouts (up to some maximum) occurred in the last `DEFENDER_ACCESS_ATTEMPT_EXPIRATION` hours. If the property is set to 0 or [], the username/ip lockout will not expire. [Default: DEFENDER_COOLOFF_TIME]

Remaining questions

Is there a need to also create split override configurations for DEFENDER_ACCESS_ATTEMPT_EXPIRATION in the same vein as these proposed changes?
Would it only make sense to allow the list based configuration I specified if DEFENDER_STORE_ACCESS_ATTEMPTS is set to True? Otherwise it cannot be guaranteed that cache values will not have been cleared out when preforming the next check. This is possibly an acceptable consequence of not storing the records in the database and so maybe can be overlooked, however it should certainly be noted in any updated documentation.

kencochrane commented 1 year ago

@djmore4 Thank you for submitting the proposal.

To help us better understand what you are proposing, do you have a use case you could share that you can't accomplish with the current settings that you hope to be able to do with the new proposed ones?

djmore4 commented 1 year ago

Sure, thanks for the quick response @kencochrane! Currently the company I work for uses a user portal which is not accessible on the public web for our production deployments. We have been considering exposing these portals to the public web to help reduce configuration complexity, but along with that comes concern of course. In order to not seriously inconvenience our users, we have historically kept our default setting for DEFENDER_COOLOFF_TIME low (around 2 minutes) to ensure that if someone were to somehow get access to the site without having an appropriate password, they will get locked out, and we won't get completely bombarded. However, we consider that this could be a security issue if we expose these sites to the public internet. Because of this, having the ability to individually tweak these settings such that we can balance our desire for security with our desire to not inconvenience our customers would be very beneficial. Specifically, having the ability to configure a scaling policy for the lockout time (not necessarily in the same way I proposed) would help to assure our customers that having these sites exposed to the public internet is not a security risk (banks ask a lot of questions and have quite a lot of demands 😃). A good example would be something like setting the proposed DEFENDER_ATTEMPT_COOLOFF_TIME to be something like 120 and DEFENDER_LOCKOUT_COOLOFF_TIME to something like [120, 240, 480, 960].

For full transparency, I more wrote this up just to start a conversation, I probably should have been more clear about that in the original post. My design proposal can almost certainly be improved by someone with more intimate knowledge of the project's internals.

tl;dr: We basically want to have a way to ensure that we can punish repeat offenders more harshly to further dissuade DDoS attacks, without having our normal customers need to wait an hour if they get locked out because they forgot their password.

kencochrane commented 1 year ago

Thank you for writing that up, that makes sense. I work for a bank so I'm well aware of all the different requirements they add :)

Your proposal seems reasonable to me. I agree that we should make this as backwards compatible as possible. The new settings should be optional and if not set, are defaulted to the current way it works today.

I personally don't have time to work on this, would you be willing to work on it and submit the PR?

Whomever works on it will need to include the unit tests to make sure it works correctly and doesn't break existing functionality.

We will also need to update the documentation, that shouldn't be too bad since you did a nice job of that in your proposal, we just need to copy that over to the docs and include some examples.

djmore4 commented 1 year ago

From a professional standpoint, I think I might have time in about a month if I can get this included in our scope of work though I'm really not sure how likely that is.

Outside of a professional capacity I'll probably start looking into the source code in case I end up being the one who makes the change. To that point, where do you think would be a good place to start looking in the code to get an idea of what I'll need to understand for this change? As I said though, if it doesn't happen from within my company it will probably take a bit longer for me to get it done.

djmore4 commented 1 year ago

Actually from what I'm seeing, the change shouldn't be too bad, so I might be able to make the changes sooner than I thought. If anyone else wants to work on it they're welcome to but if I make any real progress on it I'll update here

djmore4 commented 1 year ago

Question for you @kencochrane. I'm getting this error when trying to run the tests with either command mentioned in the documentation.

djmore4 commented 1 year ago

Alright, so I've worked past that by just adding the middleware it was complaining about to the test_settings.py file but now I find that there are a bunch of failures in the tests (specifically 34 on the base fork of the master branch and 36 in my branch off the fork).

Most of these seem completely unrelated to my changes given that as I said there seem to be 34 failures in the tests in the base fork branch so I'm not exactly sure how I proceed here. Any advice @kencochrane?

djmore4 commented 1 year ago

Here are the errors I'm seeing errors.txt

djmore4 commented 1 year ago

@kencochrane please let me know when you get the chance. Also, is there any interest in trying to mitigate any potential performance impact of having this configuration set by storing at least some of this information in the cache? If so what kind of ttl would be acceptable in your eyes?

kencochrane commented 1 year ago

sorry for the slow reply, looking at it now, I was able to reproduce what you saw locally.

kencochrane commented 1 year ago

@djmore4 it looks like the docs are a little out of date, you will need to run tox -v to run the tests. It is assuming you have Redis running locally. I'm currently working on getting it setup with docker so it will run the same way as in our CI, but I have an appointment so I don't know if I will get done today. But I wanted to give you an update incase that was enough to get you unstuck.

kencochrane commented 1 year ago

To give you an update on where I am.

It looks like when we switched over to using Github Actions, we switched to using Tox for the unit tests. Tox expects that you have multiple python versions installed locally, on a mac you can do this with pyenv. When I run the unit tests with mockredis the tests seem to fail. If I start Redis up locally on port 6379, they pass fine. Not sure what broke the mockredis.

I tried to get these all working, but I ran out of time. If someone has time to figure out the right way to get tox, pyenv and mockredis working together that would be great. I also tried to get it running in a docker container so that it would make the setup much easier to reproduce, but no luck, the python versions I had installed in the container were not found by pyenv.

Sorry, I wasn't much help, but since the github actions work fine, you can always submit a PR with your changes and see the results in the PR when the actions kick off. I know it isn't ideal, but at least it is a good check moving forward.

djmore4 commented 1 year ago

@kencochrane Ok thanks for the update, that's actually very helpful! What versions of python does it expect to have installed locally? Or will it let me know? I'll try it later regardless but if you know off-hand that would be great. Also is there a requirement on Redis version? I doubt it but I just wanted to check.

I'll try to get everything sorted and see if I can get it all passing locally but if not I'll see how the github action check works.

kencochrane commented 1 year ago

As far as python versions go, I'm not 100% sure, but I assume the latest versions of 3.7, 3.8, 3.9, 3.10, and pypy3 at least that is what we have in the Tox configs.

I don't think we have a minimum redis version, I would assume at least 5.0, but 4.0 might even work.

djmore4 commented 1 year ago

From what I've seen locally it works down even with 3.x Redis versions.

I'm having a bit of trouble with getting tox working locally and I'm realizing I may need to remake my changes because I made them from a fork of the repo. I'll see how that goes too though.

djmore4 commented 1 year ago

@kencochrane there seems to be a problem with the github action as well... Unless I somehow broke something there, but I don't think I did.

djmore4 commented 1 year ago

Ok actually a little confused now. Maybe there was actually a bug still. Either that or something went wrong with the check. I added a second commit to the PR and the pre-commit.ci check passed in seconds but it doesn't seem to have re-run the test builds. Should I try to make a new PR?

kencochrane commented 1 year ago

It looks like I need to approve the checks before they run since you are a first time contributor. Once I see the email I can go in an approve.

I think the same PR should be fine.

kencochrane commented 1 year ago

It looks like there is an issue with celery.

If we add the following to the requirements.txt

importlib-metadata<5.0

It should fix that issue you are seeing.

djmore4 commented 1 year ago

@kencochrane alright I just pushed the change to add importlib-metadata<5.0 to the requirements.txt file

djmore4 commented 1 year ago

Alright, it seems to be working now. Looks like I have some tests to fix. Thanks for all the help @kencochrane

kencochrane commented 1 year ago

You're welcome, sorry for the trouble.

djmore4 commented 1 year ago

No worries, one last question for you hopefully. The tests seem to be failing due to self.test_valid_login() not returning a 302. Any ideas on why that might be happening?

kencochrane commented 1 year ago

Sorry, it has been a little while since I have been in this code. But my first guess would be probably something related to this? https://github.com/jazzband/django-defender/blob/master/defender/decorators.py#L6

since It defaults to 302.

djmore4 commented 1 year ago

I've tried a slightly different approach. Hopefully the tests will succeed this time if you can approve running them again @kencochrane.

djmore4 commented 1 year ago

@kencochrane Alright I think they should work this time. If they don't I'm really starting to run out of ideas... I have a couple of theories as to why they might be failing at this point but I'm hoping I'm wrong because it would really be a drag

djmore4 commented 1 year ago

@kencochrane I'm getting a bit frustrated to be honest with you. Is there anything that can be done to help fix the mockredis setup so I can at least try to test this locally? It's very difficult for me to pinpoint what is going wrong with just the failing assert statement from the tox logs... Maybe that means I should be writing my tests differently but I'm not sure. Any advice would be appreciated

djmore4 commented 1 year ago

Also, as I mentioned tox seems to be problematic for me. Is it still possible to use the old testing methods? Those are at least runnable for me locally (though they aren't working as we discussed)

kencochrane commented 1 year ago

Sorry for the trouble, I tried to fix it this weekend but I wasn't able to figure it out.

I'm not familiar with tox, someone added those tests, and I haven't used it enough to figure out how best to use it locally.

If I get time this weekend I'll see if I can figure out what is wrong with mockredis, not sure why that isn't working.

djmore4 commented 1 year ago

Sorry, I'm not frustrated with you or anything. I just thought I had everything nice and tidy and this is being a thorn in my side ☹️

djmore4 commented 1 year ago

@kencochrane I made some more changes to hopefully help me get to the bottom of the issue. Can you please allow the CI workflow to run for me?

djmore4 commented 1 year ago

LOL wait... they're actually working now, that wasn't the plan haha. Ok I think I know what the problem was. I'm going to clean things up a little bit and see if doing things slightly more generically works as well. If it doesn't I'll probably make some adjustments to the code to allow for it

djmore4 commented 1 year ago

@kencochrane Once more if you don't mind. I think we're almost done with this whole thing (at least for this PR)

djmore4 commented 1 year ago

@kencochrane ok I think this should be the last one hopefully unless one of the new tests I added fail. Please also have a look at them if you don't mind because I'm not entirely clear on if it's meant to be a valid configuration option to have LOCKOUT_BY_IP_USERNAME set while also having either DISABLE_IP_LOCKOUT or DISABLE_USERNAME_LOCKOUT set to True. If it is, I'll need to make another adjustment to the logic so I don't cause any problems there. Also, if it's not meant to be a valid configuration option, I can add some logic to config.py to throw an Exception if that is configured and also add some tests.

djmore4 commented 1 year ago

And hopefully this will be the last question I'll need to ask. For some reason the test coverage logic logic wasn't recognizing that I did in fact add tests to handle many if not all of the lines I added to config.py so I needed to add the pragma: no cover comments to the lines so it wouldn't reject the PR based on test coverage. Any recommendations there?

kencochrane commented 1 year ago

That's weird, not sure why that didn't work. The only thing that comes to mind is maybe something is getting cached?

I would have to look into the way we are figuring out the coverage with tox and see if there is some other config that needs to be changed. I wouldn't think so, but you never know.

djmore4 commented 1 year ago

Ok, well https://github.com/jazzband/django-defender/pull/224 is ready for review and merging pending any recommended changes

djmore4 commented 1 year ago

@kencochrane I think I figured out why the coverage was saying the lines weren't covered by tests. I think there was a misunderstanding on my part about how the @patch(...) decorator actually worked. For whatever reason, I expected that it would run through the appropriate logic to set the property but that must not be the case right? Otherwise I would expect the coverage tool to know what is going on but it could just be another misunderstanding on my part.

djmore4 commented 1 year ago

Hey @kencochrane, is there any plan to review and merge my Pull request any time soon or are we waiting on something from my end?

kencochrane commented 1 year ago

@djmore4 sorry for the delay, things got busy, I'll review it now.

djmore4 commented 1 year ago

@kencochrane Thanks for the review, I'll update it this week at some point

djmore4 commented 1 year ago

Hey @kencochrane sorry for the long delay I've been super busy. I'm looking to update the readme this weekend or shortly thereafter but I just wanted to clarify if you had a preference where the new example should go in the readme?

I feel as though it will be too much information to cram under the list headings in Customizing django-defender so would some addendum text be appropriate within that section? I'm going to move forwards in that direction until I hear back. Thanks

kencochrane commented 1 year ago

What ever you think looks right. As long as it is all contained in the same README we should be good. Thank you!

djmore4 commented 1 year ago

@kencochrane alright it's been updated once more. Hopefully it looks good to you but I'm happy to make more changes

djmore4 commented 1 year ago

@kencochrane If you can let me know if there are any changes you want made today I'd really appreciate it because I don't really know when I'll have enough free time to take care of things next. If you can't then obviously it will just have to wait but if you get back to me I'm hoping I can take care of it today

kencochrane commented 1 year ago

Sorry for the delay. This looks great, thank you.

djmore4 commented 1 year ago

No worries, thanks for getting to it!

djmore4 commented 1 year ago

Oh also I just thought of this but do you think it would be worth while to try to update the section of the README about running the tests? I'm not exactly sure what should be written since I couldn't really ever get it working locally but it should probably be mentioned that those testing methods are out of date and contributors should use tox instead.

kencochrane commented 1 year ago

I agree it should probably be updated, but like you, I haven't figured out how to get them running locally yet. But a minimum they should be updated to help prevent confusion.

djmore4 commented 1 year ago

Right, sounds good. Do you want me to update it in this pull request or will that be done separately?

Also I was looking back through our conversation and noticed you mentioned the CI is using docker. Is there a base image that could be pulled down locally? Then we could maybe just inject code into the image and run the tests from within the container

jazzband / django-defender

Configuration enhancements for cache lifetime and user lockout period #223

Proposals

Remaining questions