snadi / UpgrAIder

Upgrade deprecated/outdated code using LLMs and release notes
MIT License

Verify results #6

Closed snadi closed 3 months ago

snadi commented 3 months ago

Check the latest run.

First, the networkx examples currently do not seem to produce warnings on the original library version, which is strange.

Second, none of the pandas examples were fixed, which is also strange.

snadi commented 3 months ago

Here are the observations from verifying the results:

  1. The networkx examples do produce errors on the original code, so that is not a problem. However, upon investigation, the LLM now seems to produce responses that deviate from the expected structure in unexpected ways. For example, here is one response:
1. \```The full updated code snippet in a fenced code block\```
\```python
import networkx as nx
import numpy as np

A = np.array([[0, 1, 1, 0, 0], [1, 0, 1, 1, 0], [1, 1, 0, 1, 1], [0, 1, 1, 0, 1], [0, 0, 1, 1, 0]])
G = nx.from_numpy_array(A)
print(G.edges)
\```

3. The `nx.from_numpy_matrix` function is replaced with `nx.from_numpy_array`.
4. No references used

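We extracted the first fenced code snippet as the snippet produced by the LLM. In this response, that is the placeholder text `The full updated code snippet in a fenced code block` rather than the actual code, which is of course wrong.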

  2. For pandas, running any of the original or updated examples fails with the following error: `ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject`

  3. It is actually the numpy examples whose original code passes.

  4. The model sometimes replies with only `No references used`, which we then treat as "NO_RESPONSE".

  5. The model sometimes returns the code snippet even though it has not been updated, for example:

```python
import numpy as np
from scipy.optimize import minimize

def rosen(x):
    """The Rosenbrock function"""
    return sum(100.0*(x[1:]-x[:-1]**2.0)**2.0 + (1-x[:-1])**2.0)

x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])
res = minimize(rosen, x0, method='TNC', options={'maxiter': 10})

print(res.x)
```

  1. No updates needed.
  2. No references used

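
For reference, here is a minimal sketch of how the extraction and classification could be made more tolerant of the response shapes observed above. It is not the code in this repo or in the eventual fix. The names `extract_code`, `normalize`, and `classify`, and the labels `UNCHANGED` and `UPDATED`, are hypothetical; only `NO_RESPONSE` comes from the current behaviour. It assumes a real code fence opens and closes at the start of a line, so an inline placeholder such as the one in the networkx response is skipped.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to avoid nesting fences here

# A fence must open and close at the start of a line (assumption, see above).
FENCE_RE = re.compile(
    rf"^{FENCE}([\w+-]*)[ \t]*\n(.*?)^{FENCE}[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def extract_code(response):
    """Return the most plausible code snippet in a response, or None."""
    blocks = FENCE_RE.findall(response)
    # Prefer blocks tagged as Python; otherwise fall back to any fenced block.
    tagged = [body for lang, body in blocks if lang.lower() in ("python", "py")]
    candidates = tagged or [body for _, body in blocks]
    candidates = [c.strip() for c in candidates if c.strip()]
    # Heuristic: if several candidates remain, keep the longest one.
    return max(candidates, key=len) if candidates else None


def normalize(code):
    """Drop blank lines and trailing whitespace so trivial diffs are ignored."""
    return "\n".join(
        line.rstrip() for line in code.strip().splitlines() if line.strip()
    )


def classify(original, response):
    """Return a (label, code) pair; labels other than NO_RESPONSE are hypothetical."""
    code = extract_code(response)
    if code is None:
        return "NO_RESPONSE", None
    if normalize(code) == normalize(original):
        return "UNCHANGED", code
    return "UPDATED", code
```

With this, a reply that contains only `No references used` still maps to `NO_RESPONSE`, while a reply that echoes the original snippet followed by `No updates needed.` is reported separately instead of being counted as an update.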
snadi commented 3 months ago

Fixed with PR #9