mthom / scryer-prolog

A modern Prolog implementation written mostly in Rust.
BSD 3-Clause "New" or "Revised" License
2.01k stars 117 forks source link

Cleanup some string handling #2460

Closed Skgland closed 1 month ago

Skgland commented 2 months ago
mthom commented 1 month ago

@lucksus may want to weigh in on the lib_machine.rs changes since they'll likely impact his work. Looks good to me, however

Skgland commented 1 month ago

Turning back into a draft as 5bfcb63 can be improved by using str::char_indices instead of manually keeping track of the current end position.

bakaq commented 1 month ago

I just noticed that you are doing here some of the things that I was experimenting with.

Skgland commented 1 month ago

I have made the changes noted in https://github.com/mthom/scryer-prolog/pull/2460#issuecomment-2275749823. Additionally I have adjusted parsing of QueryResults to handler more results properly (with added tests). The resulting json has also been adjusten in places to fix edge cases that would produce invalid json. Also previously prolog strings containing valid json would be turned into the contained json when serializing the Value to json instead they are kept as the strings.

Skgland commented 1 month ago

I just noticed that you are doing here some of the things that I was experimenting with.

Oh, any part of this specifically? Any notes? A review would be appreciated.

bakaq commented 1 month ago

I noticed that in the current implementation Value can't actually represent variables properly, and the From<String> implementation struggles with even composite terms. I was rewriting that part (which you also seem to have improved here), but I think I will just do an actual heap walk instead of parsing a string, because it will be more resilient to operators and such.

bakaq commented 1 month ago

I was actually trying to implement the JSON encoding that I described in https://github.com/mthom/scryer-prolog/pull/2465#issuecomment-2276526193, and for that I had to deal with a lot of the stuff you improved here.

Skgland commented 1 month ago

I noticed that in the current implementation Value can't actually represent variables properly, and the From<String> implementation struggles with even composite terms. I was rewriting that part (which you also seem to have improved here), but I think I will just do an actual heap walk instead of parsing a string, because it will be more resilient to operators and such.

Yeah, that sounds like a good idea to not take the detour via a string. I also noticed that for {} and <<>> structures Value currently just ignores the keys which is probably undesirable.

Skgland commented 1 month ago

I was actually trying to implement the JSON encoding that I described in #2465 (comment), and for that I had to deal with a lot of the stuff you improved here.

I noticed you syntax sugar in https://github.com/mthom/scryer-prolog/pull/2465#issuecomment-2276537915 is still terser than what I think is necessary.

Couldn't numbers, list and strings just be that and only things that are encoded as strings, but aren't actual strings be encoded as an object? Also I would call ,-list tuple i.e.

{
    "functor": ",",
    "args": [
        {
            "functor": "=",
            "args": [
                { "variable": "X" },
                1
            ]
        },
        {
            "functor": "=",
            "args": [
                { "variable": "Y" },
                2
            ]
        }
    ]
}
// asdf
// You could encode an atom as just a functor with no arguments, but this is cleaner.
{ "atom": "asdf" }

// (1, 2, 3)
// Canonically you would encode this as ','(1,','(2,3)) which is kind of weird to use.
{ "tuple": [1, 2, 3] }

// [1,2,3]
// Canonically you would encode this as '.'(1,'.'(2,'.'(3,[]))).
[1, 2, 3]

// "asdf"
// Canonically you would encode this as '.'(a,'.'(s,'.'(d,'.'(f,[])))).
"asdf"
Skgland commented 1 month ago

Maybe the atoms true and false could also be a special case and turn into the json literals true and false?

bakaq commented 1 month ago

Couldn't numbers, list and strings just be that and only things that are encoded as strings, but aren't actual strings be encoded as an object? Also I would call ,-list tuple i.e.

Yes, I thought about that since then. Having everything be an object is more homogeneous, but it's probably a good idea to use the more intuitive representations. "tuple" is a good name, but maybe "conjunction" is even better because that is what a ",-list" usually means. There could also be syntax sugar for disjunctions (lists with ;), but I think this is way less useful.

Skgland commented 1 month ago

Couldn't numbers, list and strings just be that and only things that are encoded as strings, but aren't actual strings be encoded as an object? Also I would call ,-list tuple i.e.

Yes, I thought about that since then. Having everything be an object is more homogeneous, but it's probably a good idea to use the more intuitive representations. "tuple" is a good name, but maybe "conjunction" is even better because that is what a ",-list" usually means. There could also be syntax sugar for disjunctions (lists with ;), but I think this is way less useful.

Yeah, I just checked and apparently using - for pairs/tuples i.e. X-Y, X-Y-Z is more common to not cause confusion with conjunctions. At least https://swi-prolog.discourse.group/t/prolog-convention-of-pairs-and-tuples/5385 brought me to https://www.metalevel.at/prolog/data#pair

Skgland commented 1 month ago

I think I will hold of with rebasing this for now. This appears to be mostly superseded by #2475.

Either way I think rebasing this onto that should be easier than the other way round, do it should come first

bakaq commented 1 month ago

I haven't actually improved any JSON serialization code in #2475, but those changes make it so that parsed_results::Value actually faithfully represents Prolog terms (apart from differentiating anonymous variables, which I'm still working on). This is necessary to do before actually improving the JSON serialization to something like https://github.com/mthom/scryer-prolog/pull/2465#issuecomment-2276526193, which I do pretend to do after finalizing #2475.

Skgland commented 1 month ago

Ah, my bad, I thought that was already part of #2475, but still I don't think it makes to continue with this at the moment with #2475 making the parsing obsolete and the changes expected after that to adjust the JSON serialisation.

If we still want to keep the string parsing around after those. Then would be the time to revise/rebase this and strip it down to just the parsing improvements.

Skgland commented 1 month ago

The string escaping logic for JSON serialisation added by this might also still be useful either way.

bakaq commented 1 month ago

If we still want to keep the string parsing around after those.

I really think this will not be necessary, parsing the strings is much more brittle than walking the heap.

The string escaping logic for JSON serialisation added by this might also still be useful either way.

Definitely! I would probably have just ignored that in my first pass, but it's really important.

Skgland commented 1 month ago

Also the test in tests/scryer/proptests.rs is probably also useful. It ensures that we emit valid JSON via a "round-trip" test. It generates random Values using proptest then serializing the value to JSON and using serde_json to deserialize it again (into a serde_json::Value).

The other added test are probably not as useful as they were added to exercise edge-cases in the string parsing that I would expect to be less of a problem/special-case for the heap walking.

bakaq commented 1 month ago

I just noticed reading your tests that I may be dealing with [] wrong. I represent it as an atom, but it's probably better to deal with it as an empty list.

I think maybe the best to do here is for you to do an in-place rebase to remove any changes you are doing here to string parsing, and keep just the serialization part. We can then build upon that for the improved JSON serialization (I'm thinking of going full serde here because that would be very nice to have). Just crate another branch based on this one as it is now before doing all that so that you don't lose the things that you have done here that may be useful later anyway.

As for having string term parsing anyway, I think this is better delegated to Scryer itself. We could still implement FromStr for parsed_results::Value, but using the actual interface we already have. Something like:

impl FromStr for Value {
    type Err = ();
    fn from_str(string: &str) -> Result<Self, Self::Err> {
        let mut machine = Machine::new_lib();
        let result = machine.run_query(format!("X = {}.", string)).map_err(|_| ())?;
        match result {
            QueryResolution::Matches(matches) => Ok(matches.bindings["X"]),
            _ => Err(()),
        }
    }
}

This has the benefit that it uses exactly the same logic of Scryer. Even better would be to give a parse_term(&mut self, string: impl Into<String>) method to Machine that would use that one instead of creating a whole new one, and that would mean that if we defined any operators in that machine earlier, then they will make sense (which is kind of impossible to do manually without just reimplementing all the internal Scryer logic).

Skgland commented 1 month ago

I have opened #2489 with only the serialization improvements but without the parsing changes.

Skgland commented 1 month ago

I closing this, as with #2475 we don't need the string parsing anymore and the serialization changes have been split of into #2489.

The parsing edge-case tests might still be interesting, but I don't think they will be very useful after string parsing is no longer used.

https://github.com/mthom/scryer-prolog/blob/509fa7c5b642c2abd4a5d4e539153ef8ff57eb41/src/machine/lib_machine.rs#L634-L710