Closed mschoch closed 7 years ago
What is the preferred style of addressing review comments for this project? Would you like separate commits that can be squashed later? Or force push into single commit now?
fixup commits is probably most helpful.
Addressed some of the initial problems. Next I'll be making the larger change to support signed integers throughout.
I've now updated the implementation to support signed integers (both as a datatype, and internally for tracking cursor/limit/etc as is done in the C impl)
I identified one example program where the cursor can be set negative, and I updated the runtime to behave like the C version (it's not clear if this is well specified or not).
The example I considered is this:
integers ( p1 )
externals ( stem )
define stem as (
backwards(
tomark -2
= 'bobo'
)
)
There may be other operations that need to be updated to behave correctly if the cursor becomes negative (or perhaps this is just considered undefined or poor snowball programming).
Or perhaps better, use constants MaxInt32
and MinInt32
from the math
package instead: https://golang.org/pkg/math/#pkg-constants
Snowball has the concept of undefined operations (e.g. The effect of obeying among
when the preceding substring
is not obeyed is undefined), but the manual defines tomark
as tomark AE
moves c
forward to the position given by AE
, and when AE
is negative we can't move c
forward to it so that should probably be a no-op.
That's also how the C backend (which is essentially the reference implementation as it was the original backend) seems to handle it.
Oh, I see you mean in backwards
mode when trying to position c
before the start of the string. I think that should signal f
- In the case of tomark AE
, a similar fail condition occurs as with hop AE
. If c
is already beyond AE
, or if position l
is before position AE
, the signal is f
. and String commands have been described with c
to the left of l
and moving right. But the process can be reversed. backwards C
c
and l
are swapped over, and c
moves left towards l
Sure, I can use the constants in the math package, but then it complicates the generation because they won't always be imported, and imported-but-not-used packages are an error in Go.
I already avoid conditionally needing the utf8 package by wrapping the method I need in the runtime package. I could do the same for the math symbols. But you seem have something really specific you want.
Can you explain when you mean when you say, "I'm not aware of a valid way to make the cursor or limit negative"?
In this program:
integers ( p1 )
externals ( stem )
define stem as (
backwards(
tomark -2
= 'bobo'
)
)
My undrstanding is that tomark
sets the cursor to -2.
When I look at the C code I see the line:
z->c = -2; /* tomark, line 7 */
z->c = -2; /* tomark, line 7 */
That looks like a bug in the C backend to me - thanks for pointing it out. There's a gating check generated, but what it checks doesn't match what the docs say:
if (z->c < -2) return 0;
z->c = -2; /* tomark, line 7 */
It's more clearly wrong when not in backwards mode:
if (z->c > 1000000) return 0;
z->c = 1000000; /* tomark, line 10 */
That checks if c
is already beyond AE
, but there's no check for if position l
is before position AE
.
@ojwb thanks for the info on tomark and hop. So, after re-reading the spec, at a high level, both hop and tomark should have checks ensuring that the new cursor position is valid. And I think if we get this part right, my concerns about signed integers affecting cursor/limit locations should go away.
I spent some more time looking at hop, the specification and the generated C code.
First, the spec says "moves c AE character positions towards l, but if AE is negative, or if there are less than AE characters between c and l the signal is f"
Sample program:
externals ( stem )
define stem as (
hop 2
hop -1
= 'b'
)
On the input string 'beautiful', this produces 'bb'.
So, first it seems the negative AE is allowed, despite what the spec says.
It does move c "closer" to l by one character, when you realize that closer means farther in the negative direction.
Looking at the generated C code, it appears that if AE is positive, you can't exceed l. And if AE is negative, you can't go less than 0.
Do you agree with the C implementation here? It seems to be safe and generally useful, and the spec just needs slightly different wording.
I believe I've addressed the issues you've raised so far, and I'm committed to continue working on the tomark
and hop
issues we've identified, once we agree on what the C version should do.
I've also added a travis CI job as well.
Are there any remaining barriers to getting this merged?
Thanks for hooking up the CI.
I don't think there are any blockers - I just need to actually sit down and give it one last check over, then merge assuming it looks OK.
Extending the spec to allow hop
with a negative argument indeed seems reasonable.
initial implementation ported from rust generator primarily focused on getting functionality working not on the best or most performant Go code
seems to work, based on: make check_go and other ad-hoc testing