nerc-project / operations

Issues related to the operation of the NERC OpenShift environment

Disable automatic network blocks on MIT network(s) #738

Open larsks opened 1 month ago

larsks commented 1 month ago

I have lost access to systems hosted on the ESI external network (128.31.20.0/22) due to what looks like some sort of automatic policy implementation. This looks like a repeat from May 2024, when we opened INC1327837 with MIT support (the incident is in theory available here, but that requires an MIT Kerberos login).

Their response at that time was:

You tripped a protective firewall configuration via using SSH repeatedly. This is often an issue with customers who are using git over ssh or sshfs.

For MIT community members, our standing advice is to get on the MIT VPN before initiating the connections; it's possible that there's client configurations that reduce the number of connections created, however.

I've unblocked your address; I'm going to pass the ticket over to the security team for their review.

This is problematic on a number of fronts:

We need to have the automatic blocking behavior disabled on these networks.

larsks commented 1 month ago

I have emailed servicedesk@mit.edu to attempt to get myself unblocked.

msdisme commented 1 month ago

While we resolve the broader issue, can we explore using a fixed IP for the demo and having them pre-approve it?

larsks commented 1 month ago

I would ask them about the broader issue first. In any case, I think it's unlikely that our demo will run into problems; the issue seems to stem primarily from patterns of ssh access.

larsks commented 1 month ago

MIT has opened ticket INC1405264 for this request.

msdisme commented 1 month ago

Discuss with the CSAIL champion first. Email MIT asking about possible options to remove or loosen the rules, since many of our users are outside of MIT.

larsks commented 1 month ago

MIT has unblocked my home IP:

We have unquarantined your IP. It was quarantined for the same reason as last time (rapid fire SSH connections). For your reference, it was due to connections all to the same IP within a short window of time, over 45 SSH connections to 128.31.20.138 within three minutes (between 2:15pm EST - 2:18pm EST).

If you haven't already, we recommend setting up SSH ControlMaster configuration for this host.
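
For reference, a minimal sketch of what that recommendation could look like. It is a client-side OpenSSH setting and needs no change on the target host. The socket path and the 10-minute persistence window below are assumptions chosen for illustration, not values supplied by MIT; the host address is the one named in their message.

```
# ~/.ssh/config on the client machine (illustrative values)
Host 128.31.20.138
    # Reuse one TCP connection for later ssh/scp/git sessions to this host
    ControlMaster auto
    # Where the shared control socket lives; %C is a hash of the connection details
    ControlPath ~/.ssh/cm-%C
    # Keep the master connection open for 10 minutes after the last session ends
    ControlPersist 10m
```

With something like this in place, repeated `ssh` invocations to the same host are multiplexed over the existing connection instead of opening a new one each time, which should reduce the number of new connections that a rate limit would count.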

msdisme commented 1 month ago

Note I will send Friday when I am back in the office:

MIT CSAIL Dear Garrett,

We are inquiring about possibly adjusting the IP address block settings around SSH within the CSAIL subnets we use.

One of our primary use cases is Elastic Secure Infrastructure (ESI), a way to share bare metal systems with students and researchers. This involves providing IP access to hardware, essentially bare-metal servers. Sometimes, these are configured as part of a cluster; other times, they are used with various operating systems needed for research.

Since these are bare metal servers, I do not think we can pre-configure them to support SSH ControlMaster.

Given the nature of our project and the need for reliable bare-metal access, we would like to explore the possibility of disabling or raising the thresholds before an IP address is blocked.

Thank you for your time and consideration. Please let us know if it would be helpful to discuss this further.

Sincerely,

Michael Daitzman

larsks commented 1 month ago

Since these are bare metal servers, I do not think we can pre-configure them to support SSH ControlMaster.

@msdisme The configuration they are suggesting is a client-side configuration, not a server-side one. I think the question of ssh configuration is simply a distraction and you should leave it out of your conversation. The only resources on this network are our resources, and we should be the ultimate arbiters of what sort of utilization is appropriate.

I would suggest instead something like this:


Dear Garrett,

We are inquiring about possibly adjusting the security policies that apply to the CSAIL networks allocated to the Mass Open Cloud project.

One of our primary use cases is our Elastic Secure Infrastructure (ESI) service, a way to share bare metal systems with students and researchers. Sometimes, these systems are configured as part of a cluster; other times, they are used as standalone systems with various operating systems needed for research. Utilization of this environment has been increasing, in particular due to the availability of GPU-enabled hardware for AI research. With this increase in use, we have seen some people encounter the SSH rate limits that CSAIL has configured on this network, resulting in their IP addresses being blocked and preventing access to our environment. This is a frustrating situation for everyone: the researcher finds themselves locked out of their system and we find ourselves unable to remove the restrictions on access to a service that we provide.

Given the nature of our project, the nature of the research and system development being conducted on these systems, and the need for reliable bare-metal access, we would like to explore the possibility of disabling the policy or raising the thresholds before an IP address is blocked.

Thank you for your time and consideration. Please let us know if it would be helpful to discuss this further.

Sincerely,

Michael Daitzman

StHeck commented 1 month ago

I had this issue early on and switched to using ssh ControlMaster. I'm running an Ansible playbook from home that brings up a complete OpenStack cloud (3 control/3 compute) from bare metal. I'm not sure what the ssh connect rate is, but it's probably up there. I'm also using floating IP port forwarding, if that matters. I haven't had any issues since.

msdisme commented 3 weeks ago

Feedback was that the blocking is not done by CSAIL. We need to follow up with MIT network security. Garrett does not have a read on whether we will be successful, but they are a small team, so the answer may be no. I will send mail to follow up.