jeffreykegler / libmarpa

Marpa parse engine C library -- STABLE
MIT License
97 stars 11 forks source link

Something not consistent with the docs? #92

Closed dabrahams closed 2 years ago

dabrahams commented 2 years ago

This shows the state of a Valuator in my debugger:

image

Note that the current step type is 3 (MARPA_STEP_NULLING_SYMBOL) but t_ys_id is -1, which is what gets returned by marpa_v_es_id. However, the docs for that function say:

Return value: If the current step type is MARPA_STEP_RULE, the Earley Set ordinal where the rule ends. If the current step type is MARPA_STEP_TOKEN or MARPA_STEP_NULLING_SYMBOL, the Earley Set ordinal where the symbol ends. If the current step type is anything else, an unspecified value.

Presumably -1 is not a valid Earley set ordinal?

FYI, I got here by trying to follow the pattern of the trivial1.c test (albeit without the erroneous usages). The state is after the first call to marpa_v_step.

jeffreykegler commented 2 years ago

This looks like a real bug.

dabrahams commented 2 years ago

If you need to reproduce, clone SwiftMarpa, install Swift if necessary, and invoke “swift test” from the root directory. With a breakpoint here, LLDB will show you the values cited.

dabrahams commented 2 years ago

You may need a "${prefix}/lib/pkgconfig/libmarpa.pc"; mine is

prefix="/opt/homebrew/Cellar/libmarpa/pkg-config"
exec_prefix="${prefix}"
libdir="${prefix}/lib"
includedir="${prefix}/include"

Name: libmarpa
Description: C language library that implements the parse engine at the core of Marpa
URL: https://jeffreykegler.github.io/Marpa-web-site/libmarpa.html
Version: 8.6.2
Cflags: -I"${includedir}"
Libs: -L"${libdir}" -lmarpa
jeffreykegler commented 2 years ago

I may proceed by re-implementing your test in CMake. I'm not inclined to try to debug the problem from inside Swift -- it's a totally new environment to me and, anyway, if the problem is going to be fixed, it has to be duplicate-able in C.

A (currently) pedantic point. I notice you get the values by accessing a structure, instead of calling the macros. This is not supported. I don't think it has anything to do with your problem, but in a future libmarpa, it may not work.

dabrahams commented 2 years ago

Presumably it would be worth your time to reduce the test a bit before you try to replicate it…but suit yourself.

🤷 Swift doesn't import function-style macros. I suppose I could have created a C shim that used the macros and re-exposed them as functions, and maybe I will do that eventually… but with macros like that any binary is tied to that version of marpa and would at least need to be recompiled to keep working if you changed the structure.

dabrahams commented 2 years ago

Actually, it should be sufficient to check the state of v right after this line in your existing test.

jeffreykegler commented 2 years ago

The macros are documented as only valid after a call to marpa_v_step():

marpa_v_step() returns the type of step. Most step types have values associated with them. To access these values use the methods described in the section Basic step accessors

At the point where the problem occurs, has there been a call to marpa_v_step()?

dabrahams commented 2 years ago

Yes, as noted in the last sentence of the issue description.

jeffreykegler commented 2 years ago

@dabrahams Actually, re your comment before last, I was checking that line, if not in the most graceful way. I cleaned it up in the commit just referenced, and it presents two mysteries:

First, it does not duplicate your bug.

Second, even so, it is not clear it is right. Shouldn't there be a MARPA_STEP_NULLING_SYMBOL step? I set up the test so that the actual behavior, a MARPA_STEP_INACTIVE step, is treated as correct, but I'm not sure it should be.

I still am feeling the lingering effects of that bug, so I will leave this until the AM.

dabrahams commented 2 years ago

I was checking that line

Specifically, were you checking the result of marpa_v_es_id(v) and after that line has executed?

I still am feeling the lingering effects of that bug, so I will leave this until the AM

Indeed, buggy coders make for buggy code. Hope you feel better!

jeffreykegler commented 2 years ago

The value of marpa_v_es_id(v) is indeterminate when the step type code is MARPA_STEP_INACTIVE. I inserted a printf after the marpa_v_step() line is executed, and FWIW it shows that the value is -1.

My plan is next to determine what libmarpa should be doing when stepping through a trivial parse. (By the way, trivial parses are special-cased -- I found that doing so makes for simpler and safer code, my present embarrassment notwithstanding.) I will want to do a bit of checking, so it looks like I was over-hasty before.

If libmarpa is not behaving correctly for trivial parses (which I suspect I will find is the case), the next step is to fix it. This should not be hard, due to the special-casing.

At that point we check for your issue again, to see if it is cleared up. If so, we declare victory and leave. If not, we proceed from there.

jeffreykegler commented 2 years ago

Progress is slow, because my personal bug is lingering, but I have some insight.

I note that my "trivial1" test does not call marpa_g_forced_value(). Your Swift code does force "valued" status, which is a good idea and as per the API's recommendation. This difference would account for my previous failure to duplicate the bug.

My next step will be to add a test to the libmarpa test suite which duplicates this bug. I expect things will be straight-forward from there.

jeffreykegler commented 2 years ago

The commit last referenced duplicates the bug.

jeffreykegler commented 2 years ago

I have written up a new version of the documentation of the libmarpa ES location accessors, clarifying their expected behavior.

jeffreykegler commented 2 years ago

The expected ES location values of the nulled top node in a trivial parse are marpa_v_es_id (v) == 0 and marpa_v_token_start_es_id (v) == 0, where v is a valuator whose current step is that nulled top node.

This is NOT the current behavor of libmarpa. My plan:

  1. Do some checks, and clean ups.
  2. Make the fix in https://github.com/jeffreykegler/libmarpa/blob/d4308f763565dbcd34cd6a337bfa547a599e35a1/work/dev/marpa.w .
  3. Change the tests to check for the new, corrected behavior, and retest.
  4. Do more checks, cleanups, documentation, etc.
jeffreykegler commented 2 years ago

With https://github.com/jeffreykegler/libmarpa/commit/7686e5361bb103192124beabba430b875d2510e3 I have finished all but the last step in the plan outlined above. The tasks accomplished include the fix itself and its libmarpa tests.

The one remaining task is to test this fix with Marpa::R2. Before I mark this issue "About to be closed", I will do this.

jeffreykegler commented 2 years ago

The fix tests OK in Marpa::R2 without any revisions. (Marpa::R2 special cases trivial parses, a practice I recommend for all wrappers.) I am marking this issue as read to be closed.

dabrahams commented 2 years ago

Do you have a link to details of what you mean by that recommendation?

jeffreykegler commented 2 years ago

Well, there is the Marpa::R2 code, but that makes a simple recommendation complicated. The idea is that the result of trivial parse is a constant, since the input is.

So the wrapper should check first if the input is zero length. Then it should check with marpa_g_symbol_is_nullable that the start symbol is nullable. If not, it's a parse failure.

In the case of a successful parse of the zero length input, the result will be a constant, predictable in advance unless you're doing something wierd. Wierdness or no wierdness, Earley's algorithm has nothing to say that will help you in dealing with it.

If you'd like to a link to the Marpa::R2, let me know. Also, the IRC channel may contain people who have dealt with this. In particular, @jddurand, who usually is lurking, has probably dealt with this in some of his repos.

jeffreykegler commented 2 years ago

Here's where I special-case the semantics of nullable grammars in Marpa::R2: https://github.com/jeffreykegler/Marpa--R2/blob/2ef61bdf4d0ff2de3503ca136dbe4a18ff742f9b/cpan/lib/Marpa/R2/Value.pm#L1306 Now that I look at it, it may be the rest of the code handles zero-length inputs with the main code -- that's the only special case-ing of nullable grammars I can find.

dabrahams commented 2 years ago

What's the point in spending optimization effort on this edge case that will be very fast to evaluate anyway? Or am I failing to understand the scenario?

jeffreykegler commented 2 years ago

The point is not optimization (AFAICT special casing is no faster and unless the optimizer is very very smart will cause a tiny bit of code bloat). It's just that I have found that if you exclude zero-length parses from the main logic flow, you wind up with cleaner code. YMMV, as they say.

As an example, consider in marpa.w folding the code of

https://github.com/jeffreykegler/libmarpa/blob/7686e5361bb103192124beabba430b875d2510e3/work/dev/marpa.w#L12997

into

https://github.com/jeffreykegler/libmarpa/blob/7686e5361bb103192124beabba430b875d2510e3/work/dev/marpa.w#L12937

My experience/belief is that, yes, this replaces two flows of logic with one, but the ease of reading and debugging is better with the two logic flows. The second one is hardly what you'd call clean, but I'd say that all the more reason to be desperate to avoid complicating it further. The first one (for the trivial parse) can make quite a few helpful assumptions not valid in the general case, and keeping it separate means that pulling out the behavior for the trivial case is simple. If the two logics were merged it would be quite the task to figure out what happens in the trivial case and (as I have just had to) to change it without breaking something else.

From several experiences like that I've generalized to the principle "special case the trivial parse". A programmer with different experiences, and/or different talents, might reach a different conclusion.

jeffreykegler commented 2 years ago

Closed for the reason given in https://github.com/jeffreykegler/libmarpa/issues/92#issuecomment-1097289497.