Challenges of Large Language Models

Open Source

Unfortunately, most Llama-based and other free models fail to work with the tools defined by langchain. They can handle single functions, but they already struggle with the current complexity of langsim.

ChatGPT

ChatGPT 3.5 turbo can execute the calculation for one noble metal but fails to execute a loop over all noble metals. It seems that the abstract structure of a loop, which is only implicitly defined, is not clear to ChatGPT 3.5.
ChatGPT 4 works fine with one state, available in branch working_with_chatgpt4, but fails on the current main branch with a JSONDecodeError.
ChatGPT 4o works fine with the latest changes, in particular the state in branch working_with_chatgpt4o. The interesting part is how the two models handle the implicit loop: ChatGPT 4 executes all steps (generate the crystal structure, equilibrate it and calculate the bulk modulus) for one element of the noble metals and then moves on to the next. In contrast, ChatGPT 4o first generates the crystal structure for all elements, then equilibrates all resulting structures and finally calculates the bulk modulus for all equilibrated structures.
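To make the difference between the two execution orders concrete, here is a minimal sketch in plain Python. The three step functions are hypothetical stand-ins that only record which step ran for which element; they are not the actual langsim tools, and the element list is just an illustrative subset of the noble metals.

```python
# Hypothetical stand-ins for the three workflow steps (not the langsim API).
# Each one only records the call in `log` and returns a placeholder object.

def generate_structure(element, log):
    log.append(("generate", element))
    return {"element": element, "equilibrated": False}


def equilibrate(structure, log):
    log.append(("equilibrate", structure["element"]))
    return {**structure, "equilibrated": True}


def calculate_bulk_modulus(structure, log):
    log.append(("bulk_modulus", structure["element"]))


def per_element_order(elements):
    """Order observed with ChatGPT 4: finish all steps for one element first."""
    log = []
    for element in elements:
        structure = generate_structure(element, log)
        structure = equilibrate(structure, log)
        calculate_bulk_modulus(structure, log)
    return log


def per_step_order(elements):
    """Order observed with ChatGPT 4o: run each step for all elements first."""
    log = []
    structures = [generate_structure(element, log) for element in elements]
    equilibrated = [equilibrate(structure, log) for structure in structures]
    for structure in equilibrated:
        calculate_bulk_modulus(structure, log)
    return log


# Illustrative subset only; the issue does not list the elements used.
EXAMPLE_ELEMENTS = ["Ag", "Au", "Pt"]
```

Calling `per_element_order(EXAMPLE_ELEMENTS)` and `per_step_order(EXAMPLE_ELEMENTS)` returns the same set of calls in a different sequence, which is the difference in behaviour described above.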

The behaviour seems to be somewhat reproducible so I wanted to quickly summarise it here.

jan-janssen / LangSim
