rmdmattingly / RowlogAnalysis

MIT License
5 stars 0 forks source link

New Feature: splitTrendByPerson #36

Closed Logar18 closed 4 years ago

Logar18 commented 4 years ago

This service will return:

Is it just in the bash that I can't see if it's returning the right result or not? @taddyb @rmdmattingly

rmdmattingly commented 4 years ago

I like the thought process here a lot, this PR write up is good, the testing is good. This is a fully functional, new service which is sick.

I'm going to take a more in depth look tonight and provide some feedback on how to make your code a little better -- like, I see there's that 36 line function findSplitTrendsByPerson.

When you write a function that large it should make you wonder whether it might be breaking the single responsibility principle: that each function should only be responsible for a single "thing", and that that "thing" should be as small and as generic as possible. I'll come up with some suggestions for how to better split up that code to be more modular -- modularity fosters code reusability which provides scalability and readability -- this is what separates good & okay programmers so reviewing this kind of work should be really good for your development as a programmer.

Logar18 commented 4 years ago

@rmdmattingly is this looking any better?

rmdmattingly commented 4 years ago

Hey so the changes are a lot better. Much better use of lines, smart reusability.

The more I think about this service, though, the more I think it should be generic beyond finding a specific person. Since we're already iterating through every workout it would take almost no extra time to calculate and return these trends for every person -- but if we kept things as is, and eventually ran into a use case where we wanted to use this service to get everyone's split trends, then we'd be looking at a significantly slower execution time because the latency associated with, say, 30+ api calls would be too much.

So I think we should make this service generic to just analyzing split trends, here's how I would do it:

def isValidIntensity(intensity):
 return intensity != 'Warmup' and intensity is not None

def isValidActivity(activity):
 return activity == 'Erg'

def findSplitTrends(workoutData, SplitManager, DateManager):
    splitTrendData = {}
    for workout in workoutData:
        intensity = workout['intensity']
        if isValidIntensity(intensity) and isValidActivity(workout['type']):
            name = workout['name']
            if name not in splitTrendData.keys():
                splitTrendData[name] = {}
            date = DateManager.fetchDateFromTimestampString(workout['time'])
            if date not in splitTrendData[name]:
                splitTrendData[name][date] = {}
            splitTrendData[name][date][intensity] = workout['avg_split']
    return splitTrendData

def run(workoutData, SplitManager, DateManager):
    return findSplitTrends(DateManager.getWorkoutsDataChronologically(workoutData), SplitManager, DateManager)

Something that I'd note about the code above: notice how I made my if statements clearer by taking the conditional and moving it into its own, clearly named, function. This is the kind of coding practice that makes work easily readable and maintainable -- having messy if statements without explanation within a function that's really trying to accomplish/coordinate a larger goal is a really easy way to let things get confusing.

Another thing to note is that this current code (and my suggestion) isn't necessarily accurate. As is we'll be taking the latest workout of any given day and assigning that as the split for a given intensity on that day. What we should be doing is averaging all available splits on a given day. This step is where the SplitManager will come in handy, as it can convert splits to seconds (which will allow for easy averaging), and then convert this calculated average back to a split.

Let me know if this all makes sense to you, happy to talk more & explain anything

rmdmattingly commented 4 years ago

So, to sum up my comment above, I've got two things:

rmdmattingly commented 4 years ago

Merged in master so that we won't run into merge conflicts and made a few small changes to smooth out the service. All should be working, but we will need to rethink this code later as it currently runs in O(n^2)

Logar18 commented 4 years ago

@rmdmattingly @taddyb I've changed it so that the data stored in splitTrendData[name][date][intensity] is now the total split seconds and workoutCount[name][date][intensity] is the count of workouts at that date/intensity.

I feel like it should be pretty straightforward from here to find the average, but I can't really think of way to do it. Any dictionary functions that would help?

rmdmattingly commented 4 years ago

@Logar18 you can iterate through workouts again and still consider it O(n) because O(2n) is linear -- as long as this next iteration is outside of the first iteration rather than nested.

so what I would do is iterate through the splitTrendData and divide the total seconds by the relevant workoutCount, producing the averages

Also we can remove all those .idea files and whatever, throw that directory in the gitignore

Logar18 commented 4 years ago

@rmdmattingly How is this looking?

rmdmattingly commented 4 years ago

I think it looks awesome, I just have those 2 comments and a recommendation to add the .idea directory to the gitignore so that we don't merge that bs to master.