msambol / dsa

Data structures and algorithms in X minutes. Code examples from my YouTube channel.
https://youtube.com/michaelsambol
MIT License
412 stars 90 forks source link

Some inconsistencies in representation of sets #9

Closed Takashiidobe closed 5 months ago

Takashiidobe commented 5 months ago

:wave: hello, thanks for this repo with the code -- I watched your videos many years ago when I was teaching myself how to code, going to UNL's libraries on the daily (I see you are a fellow Nebraskan as well).

I was reading through the code and found some discrepancies in how visited sets are represented, which could be shored up to make the code easier to follow:

  1. This code uses a dictionary of node -> bool, when a set works fine. https://github.com/msambol/dsa/blob/master/shortest_path/dijkstras.py#L21C1-L29C26

If this code initialized an empty visited set, the line of visited[node] = False in the graph iteration could be removed.

  1. This code uses an array which has O(n) access to check for visited, which makes the dfs algorithm O(n^2) instead of O(n) https://github.com/msambol/dsa/blob/master/search/depth_first_search.py#L18C1-L29C32

(There is a note in another part of the code that changes a list to a queue to improve time complexity, so I assume this change is in scope). https://github.com/msambol/dsa/blob/master/search/breadth_first_search.py#L19C1-L21C20

  1. This code uses a list for visited, which is probably faster than the set in this case but is another representation of a set: https://github.com/msambol/dsa/blob/master/maximum_flow/ford_fulkerson.py#L21C1-L21C31
msambol commented 5 months ago

Hey @Takashiidobe! Thank you for the feedback. Yep, I'm originally from Omaha, Nebraska.

This code uses a dictionary of node -> bool, when a set works fine. https://github.com/msambol/dsa/blob/master/shortest_path/dijkstras.py#L21C1-L29C26 If this code initialized an empty visited set, the line of visited[node] = False in the graph iteration could be removed.

Do you want to do a PR for this?

This code uses an array which has O(n) access to check for visited, which makes the dfs algorithm O(n^2) instead of O(n) https://github.com/msambol/dsa/blob/master/search/depth_first_search.py#L18C1-L29C32 (There is a note in another part of the code that changes a list to a queue to improve time complexity, so I assume this change is in scope). https://github.com/msambol/dsa/blob/master/search/breadth_first_search.py#L19C1-L21C20

If you change it to a queue, you still need to iterate through the entire visited queue. The only operations on the array are append and pop, which are both O(1) [source]. I changed the BFS code to deque because append and popleft (not pop here) are both O(1), whereas popleft on an array is O(n). Make sense?

This code uses a list for visited, which is probably faster than the set in this case but is another representation of a set: https://github.com/msambol/dsa/blob/master/maximum_flow/ford_fulkerson.py#L21C1-L21C31

Either should be good here. The only operations on the array are get and set which are both O(1).

Thank you for looking at this!

Takashiidobe commented 5 months ago

I can make a PR for these changes (+ some others with some comments). We can discuss them more there and you can take what you'd like.

Takashiidobe commented 5 months ago

This code uses an array which has O(n) access to check for visited, which makes the dfs algorithm O(n^2) instead of O(n) https://github.com/msambol/dsa/blob/master/search/depth_first_search.py#L18C1-L29C32 (There is a note in another part of the code that changes a list to a queue to improve time complexity, so I assume this change is in scope). https://github.com/msambol/dsa/blob/master/search/breadth_first_search.py#L19C1-L21C20

If you change it to a queue, you still need to iterate through the entire visited queue. The only operations on the array are append and pop, which are both O(1) [source]. I changed the BFS code to deque because append and popleft (not pop here) are both O(1), whereas popleft on an array is O(n). Make sense?

I would change it to a set in this case, since the line if n not in visited where visited is a list is O(n) always, since it does a linear scan. If it was a set, it would be O(1). I wasn't using the bfs example as a direct 1:1 replacement, I was just noting that it was in scope to change a list to a queue to improve time complexity, so changing it from a list to a set to improve time complexity should also be in scope.

msambol commented 5 months ago

@Takashiidobe My fault, I misread that. I thought you were suggesting a queue. I'll take a look at the PRs 👍🏼

msambol commented 5 months ago

@Takashiidobe Thank you for improving these! Let me know if there is anything else.