kth-competitive-programming / kactl

KTH Algorithm Competition Template Library (... eller KTHs AC-tillverkande lapp)
2.68k stars 711 forks source link

Fix undefined behavior in Suffix Array #264

Closed ssk4988 closed 2 weeks ago

ssk4988 commented 1 month ago

Fixes undefined behavior in the Suffix Array implementation caused by copying a value from the end iterator of the string. I was able to get a runtime error on a problem and then get accepted by making this change to the suffix array implementation only.

lrvideckis commented 2 weeks ago

wow, ya'll actually are okay with having UB (knowingly) in your library?

that's terrible

ssk4988 commented 2 weeks ago

Oops, I accidentally closed this when deleting some branches

simonlindholm commented 2 weeks ago

wow, ya'll actually are okay with having UB (knowingly) in your library?

Not really, just lacking time for maintenance.

I'm surprised this caused issues in practice, I wonder what compiler/compiler flags were used. s[s.size()] is guaranteed to be 0, so it can't have been one that just defined char* end() { return &(*this)[size()]; }. The fix is a bit unfortunate because it triggers a reallocation making the code slightly slower. Maybe

        vi x(n), y(n), rank(n), ws(max(n, lim));
        sa = lcp = y; rep(i,0,n) sa[i] = i, x[i] = s[i];

would be better? It's slightly fewer chars and doesn't have the reallocation. Although, I guess if the allocations ever matter perf-wise there are better things you can do about them. Shrug. I'll just merge.

lrvideckis commented 2 weeks ago

Not really, just lacking time for maintenance

ah I take back what I said

ssk4988 commented 2 weeks ago

I wanted to add that this submission when viewed in coach mode on CodeForces and compared to its previous solution shows that the only significant change is the push_back and it stops a runtime error. I have another submission where the diff is exactly what is in this commit but I am unable to share it for some reason.

I'm surprised this caused issues in practice, I wonder what compiler/compiler flags were used. s[s.size()] is guaranteed to be 0, so it can't have been one that just defined char* end() { return &(*this)[size()]; }. The fix is a bit unfortunate because it triggers a reallocation making the code slightly slower. Maybe

      vi x(n), y(n), rank(n), ws(max(n, lim));
      sa = lcp = y; rep(i,0,n) sa[i] = i, x[i] = s[i];

would be better? It's slightly fewer chars and doesn't have the reallocation. Although, I guess if the allocations ever matter perf-wise there are better things you can do about them. Shrug. I'll just merge.

I hadn't even considered the allocation issue. Your proposed method seems fine.

simonlindholm commented 2 weeks ago

I wanted to add that this submission when viewed in coach mode on CodeForces and compared to its previous solution shows that the only significant change is the push_back and it stops a runtime error.

The issue with this submission is that string was changed to vector<int>, and vector<int> does not have the behavior of guaranteeing s[s.size()] == 0 that strings do because of nul termination. (This is the reason for the comment about basic_string<int> next to the constructor.) In order to make things work with vector<int> you additionally need to change the for(k && k--, j = sa[rank[i] - 1]; s[i + k] == s[j + k]; k++); loop not to read out of bounds, or else you're relying on luck.

That said, this is very subtle and I'm sympathetic to the cause of removing that footgun. Maybe we should do s.push_back(0) at the start instead? That probably also serves as a good reminder not to put other zeroes in the vector.