[ ] My favorite thing to ask the models designed for programming is: "Using Python w... | Hacker News

My favorite thing to ask the models designed for programming is: "Using Python w... | Hacker News

Snippet

My favorite thing to ask the models designed for programming is: "Using Python write a pure ASGI middleware that intercepts the request body, response headers, and response body, stores that information in a dict, and then JSON encodes it to be sent to an external program using a function called transmit." None of them ever get it right :)

Full Content

My favorite thing to ask the models designed for programming is: "Using Python write a pure ASGI middleware that intercepts the request body, response headers, and response body, stores that information in a dict, and then JSON encodes it to be sent to an external program using a function called transmit." None of them ever get it right :)

I normally ask about building a multi-tenant system using async SQLAlchemy 2 ORM where some tables are shared between tenants in a global PostgreSQL schema and some are in a per-tenant schema. Nothing gets it right first time, but when ChatGPT 4 first came out, I could talk to it more and it would eventually get it right. Not long after that though, ChatGPT degraded. It would get it wrong on the first try, but with every subsequent follow up it would forget one of the constraints. Then when it was prompted to fix that one, it forgot a different one. And eventually it would cycle through all of the constraints, getting at least one wrong each time.

Since then benchmarks came out showing that ChatGPT "didn't really degrade", but all of the benchmarks seemed focused on single question/answer pairs and not actual multi-turn chat. For this kind of thing, ChatGPT 4 has never managed to recover to as good as it was when it was first released in my experience.

It's been months since I've had to deal with that kind of code, so I might be forgetting something, but I just tried it with Codestral and it spat out something that looked reasonable very quickly on its first try.

irthomasthomas / undecidability