Closed rlrs closed 2 days ago
Thanks a lot - will take a look!!
@rlrs Thanks a lot! This definitely is much better than my previous impl :) I noticed that adding substring = [int(x) for x in substring if x.isdigit()]
would have solved the issue, but your impl definitely doesn't need to perform in the "string" space (ie via the longest common substring).
I left the old impl just in case I might have to refer back to it!
This should fix https://github.com/unslothai/unsloth/issues/1015
The issue was that the existing LCS implementation would find a leading comma, leading it to call
int('')
. Since I was performing this fix anyways, I took the liberty of implementing it a bit cleaner. Instead of finding the LCS of a string repr of a list of tokens, we just find the least common sublist of tokens directly. The string/int conversions are not necessary anymore.I also added a very simple sanity test at the bottom of the file.