nchammas / flintrock

A command-line tool for launching Apache Spark clusters.
Apache License 2.0
636 stars 115 forks source link

Question about TODOs #377

Open tomahawk360 opened 1 week ago

tomahawk360 commented 1 week ago

Hi! First of all, thanks for developing this tool. Having the experience of building Hadoop clusters "by hand", flintrock has been extremely useful. I decided to see how flintrock works, and found these TODOs, more specifically the ones about "Centralizing logic" to get ec2 clusters characteristics:

https://github.com/nchammas/flintrock/blob/c3dee6da06af4e6ca094ac6f0b4365d091c2541b/flintrock/ec2.py#L179C1-L206C31

What does "Centralize logic" stands for? What's left to be done in those sections?

nchammas commented 1 week ago

I wrote those comments almost 10 years ago, so I will have to take a guess at what exactly I meant. :)

https://github.com/nchammas/flintrock/blob/c3dee6da06af4e6ca094ac6f0b4365d091c2541b/flintrock/ec2.py#L180-L186

This probably just means I wanted to write a little function that I could reuse across the codebase to get this base security group. I'm not sure how many places would need to be refactored or if such a function is really needed, but I think that's what I meant when I wrote it.

https://github.com/nchammas/flintrock/blob/c3dee6da06af4e6ca094ac6f0b4365d091c2541b/flintrock/ec2.py#L199-L205

This is probably the same idea. There must be several places in the codebase with similar logic, where I'm trying to get the cluster-specific security group based on the cluster name. Instead of repeating that logic, a reusable function would be handy.

tomahawk360 commented 4 days ago

Hello, sorry for the late response.

Indeed, the same code bit repeats on three different occasions, each part having the same TODO comment, so it must refer to a replacement of those bits with a reusable function.

However, I found a function that could serve said purpose in the codebase:

https://github.com/nchammas/flintrock/blob/c3dee6da06af4e6ca094ac6f0b4365d091c2541b/flintrock/ec2.py#L493-L516

This function is only used in the launch method, but it could be used for the "Centralize logic" TODOs as well, like for example the first TODO instance could be rewritten as:

flintrock_base_group = get_security_groups(
    vpc_id=self.vpc_id,
    region=self.region,
    security_group_names=['flintrock']
)[0]

Is there a reason why that function is not being used for the "Centralize logic" TODOs?