karpathy / micrograd

A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API
MIT License

Topological sort - bug #67

Open gordicaleksa opened 7 months ago

gordicaleksa commented 7 months ago

It's a nit that won't matter most of the time but the topo sort implementation doesn't work in case you have cycles in the graph.

i.e. there is a hard assumption you're operating over a DAG.
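
To illustrate the assumption, here is a minimal sketch of a depth-first topological sort, similar in spirit to the one under discussion, run on a hand-built cyclic graph. `Node`, `topo_order`, and `respects_edges` are hypothetical names used for illustration, not micrograd's API; only the `_prev` attribute mirrors the snippets in this thread. In this sketch the sort terminates on the cycle without raising, but the order it returns cannot respect every edge.

```python
class Node:
    """Hypothetical stand-in for a graph node; `_prev` holds its inputs."""
    def __init__(self, name):
        self.name = name
        self._prev = []

def topo_order(root):
    """Post-order DFS: a node is appended after the inputs it managed to visit."""
    topo, visited = [], set()
    def build(v):
        if v not in visited:
            visited.add(v)
            for child in v._prev:
                build(child)
            topo.append(v)
    build(root)
    return topo

def respects_edges(order):
    """True if every node appears after all of its inputs in `order`."""
    pos = {v: i for i, v in enumerate(order)}
    return all(pos[c] < pos[v] for v in order for c in v._prev)

# Acyclic: c depends on a and b, b depends on a.
a, b, c = Node("a"), Node("b"), Node("c")
b._prev = [a]
c._prev = [a, b]
dag_ok = respects_edges(topo_order(c))

# Cyclic: x depends on y and y depends on x (built by hand).
x, y = Node("x"), Node("y")
x._prev = [y]
y._prev = [x]
cyclic_ok = respects_edges(topo_order(x))
```

Here `dag_ok` is true and `cyclic_ok` is false: the `visited` set stops the recursion on the cycle, so the failure is silent rather than an infinite loop.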

Narasimhareddy-B commented 6 months ago

A directed acyclic graph is a directed graph that has no cycles. A vertex v of a directed graph is said to be reachable from another vertex u when there exists a path that starts at u and ends at v. As a special case, every vertex is considered to be reachable from itself (by a path with zero edges).
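
The definition above can be sketched in a few lines. This assumes the graph is stored as a plain dict mapping each vertex to a list of its successors; `reachable` and `is_dag` are hypothetical helper names, and the check is a simple (not the most efficient) one.

```python
def reachable(graph, u, v):
    """True if v can be reached from u; the zero-edge path makes u reach itself."""
    stack, seen = [u], set()
    while stack:
        node = stack.pop()
        if node == v:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return False

def is_dag(graph):
    """Acyclic iff no edge u -> succ has a path leading from succ back to u."""
    return not any(
        reachable(graph, succ, u)
        for u, succs in graph.items()
        for succ in succs
    )

# Edges point from an input to the value that uses it.
dag = {"a": ["b", "c"], "b": ["c"], "c": []}
cyclic = {"x": ["y"], "y": ["x"]}
```

With these inputs, `is_dag(dag)` holds and `is_dag(cyclic)` does not.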

Jet-lag commented 4 months ago

> It's a nit that won't matter most of the time but the topo sort implementation doesn't work in case you have cycles in the graph.
>
> i.e. there is a hard assumption you're operating over a DAG.

That's correct. On the other hand, if the graph is a DAG, can we simply write the backward function as:

def backward(self):
    self._backward()
    for child in self._prev:
        child.backward()

pkulijing commented 4 months ago

> > It's a nit that won't matter most of the time but the topo sort implementation doesn't work in case you have cycles in the graph. i.e. there is a hard assumption you're operating over a DAG.
>
> That's correct. On the other hand, if the graph is a DAG, can we simply write the backward function as:
>
> def backward(self):
>     self._backward()
>     for child in self._prev:
>         child.backward()

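No. Your implementation is a pre-order DFS: it calls _backward() on a node as soon as the node is reached. Backpropagation needs a topological order, in which a node is processed only after every node that uses it has been processed. (A topological sort can itself be built with a DFS, but it has to be a post-order traversal with a visited set, reversed before use.) Even for this simple case your code visits the nodes in the wrong order: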

b = 2*a
c = a + b

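With a topological sort, back propagation is guaranteed to go in the order c -> b -> a. With your code, the order could be c -> a -> b, which is wrong: a's _backward() runs before b has added its contribution to a.grad, so anything a was computed from receives an incomplete gradient. Your code also has no visited set, so a is reached again through b and its _backward() runs more than once.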
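
To see the difference in numbers, here is a minimal sketch that compares the two strategies. It assumes a stripped-down, micrograd-like `Value` class written for this comment (only `+` and `*`, with explicit constants); `naive_backward`, `topo_backward`, and `grad_of_x` are hypothetical helper names. The shared node `a` is deliberately made a non-leaf by computing it from `x`.

```python
class Value:
    """Tiny micrograd-like scalar; only + and * are implemented."""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

def naive_backward(v):
    """The recursion proposed above: process a node as soon as it is reached."""
    v._backward()
    for child in v._prev:
        naive_backward(child)

def topo_backward(root):
    """Process each node once, in reverse topological order."""
    topo, visited = [], set()
    def build(v):
        if v not in visited:
            visited.add(v)
            for child in v._prev:
                build(child)
            topo.append(v)
    build(root)
    for v in reversed(topo):
        v._backward()

def grad_of_x(backward_fn):
    x = Value(1.0)
    a = x * Value(3.0)   # a is shared below and is not a leaf
    b = Value(2.0) * a
    c = a + b            # c = 9 * x, so dc/dx = 9
    c.grad = 1.0         # seed the output gradient
    backward_fn(c)
    return x.grad

naive = grad_of_x(naive_backward)
correct = grad_of_x(topo_backward)
```

In this sketch `correct` is 9.0, while `naive` comes out as 12.0 or 18.0 depending on the iteration order of the `_prev` set. When `a` is a leaf, as in the two-line example, the naive recursion happens to produce the right numbers because a leaf's `_backward` does nothing; the error appears once the shared node has inputs of its own.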

conscell commented 1 month ago

@gordicaleksa Could you please provide an example where the computational graph is not a DAG?