Verify results - Githubissues

snadi commented 3 months ago

Check latest run.

First, the networkx examples currently do not seem to produce warnings on the original library version, which is strange.

None of the pandas examples were fixed, which is also strange.

snadi commented 3 months ago

Here are the observations from verifying the results:

The networkx examples do produce errors on the original code, so that is not a problem. However, upon investigation, it seems that the LLM is now producing responses that deviate from the expected structure in unexpected ways. For example, here is one response:

1. \```The full updated code snippet in a fenced code block\```
\```python
import networkx as nx
import numpy as np

A = np.array([[0, 1, 1, 0, 0], [1, 0, 1, 1, 0], [1, 1, 0, 1, 1], [0, 1, 1, 0, 1], [0, 0, 1, 1, 0]])
G = nx.from_numpy_array(A)
print(G.edges)
\```

3. The `nx.from_numpy_matrix` function is replaced with `nx.from_numpy_array`.
4. No references used

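We extracted the first fenced code snippet as the snippet produced by the LLM, which is of course wrong: in this response the first fenced span is the echoed instruction text on the first line, not the updated code.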
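One way to make the extraction less brittle is to accept only fences that open on their own line, and to prefer a block tagged as Python. The following is a minimal sketch under those assumptions, not the project's actual code; the helper name `extract_code_snippet` and the heuristics are hypothetical.

```python
import re
from typing import Optional

# Build the fence marker programmatically so this example does not embed
# a literal triple backtick inside its own code block.
FENCE = "`" * 3

# A block must open with a fence at the start of a line, optionally followed
# by a language tag and then a newline. An inline span such as
# "1. <fence>The full updated code snippet ...<fence>" does not match.
FENCE_RE = re.compile(
    r"^" + FENCE + r"([\w+-]*)[ \t]*\n(.*?)^" + FENCE + r"[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def extract_code_snippet(response: str) -> Optional[str]:
    """Return the most plausible code block in an LLM response, or None.

    Hypothetical heuristic: prefer the first block tagged as Python,
    otherwise fall back to the first well-formed fenced block.
    """
    blocks = FENCE_RE.findall(response)
    if not blocks:
        return None
    for lang, body in blocks:
        if lang.lower() in ("python", "py"):
            return body.strip()
    return blocks[0][1].strip()
```

With a response shaped like the one above, the echoed instruction on the first line is skipped and the `python` block is returned. A response with no well-formed block yields `None`.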

For pandas, running any of the original or updated examples fails with the following error: `ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject`
It's actually numpy whose original examples pass.
The model sometimes replies with only `No references used`, which we then treat as "NO_RESPONSE".
The model sometimes provides the code snippet even though it is not updated:
```python
import numpy as np
from scipy.optimize import minimize

def rosen(x):
    """The Rosenbrock function"""
    return sum(100.0*(x[1:]-x[:-1]**2.0)**2.0 + (1-x[:-1])**2.0)

x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])
res = minimize(rosen, x0, method='TNC', options={'maxiter': 10})

print(res.x)
```

No updates needed.
No references used
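The last two cases above could be told apart by comparing the extracted snippet with the original one. This is a rough sketch under stated assumptions: `NO_RESPONSE` is the label already used in this thread, while `NO_UPDATE`, `UPDATED`, and the function names are hypothetical and only for illustration.

```python
from typing import Optional


def normalize(code: str) -> str:
    """Drop trailing whitespace and blank lines so that
    formatting-only differences are not counted as updates."""
    lines = [line.rstrip() for line in code.strip().splitlines()]
    return "\n".join(line for line in lines if line)


def classify_response(original: str, extracted: Optional[str]) -> str:
    """Classify an LLM reply given the original snippet and the
    snippet extracted from the reply (None if nothing was extracted)."""
    if extracted is None or not extracted.strip():
        # e.g. the model replied with only "No references used"
        return "NO_RESPONSE"
    if normalize(extracted) == normalize(original):
        # The model echoed the code back without changing it
        return "NO_UPDATE"
    return "UPDATED"
```

Comparing normalized text is a deliberately simple choice. It will not detect semantically equivalent rewrites, so it only catches the case where the model returns the snippet essentially verbatim.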

snadi commented 3 months ago

Fixed with PR #9

snadi / UpgrAIder

Verify results #6