Port Simon Forman's Thun Interpreter

mthom commented 4 years ago

Now that char_type is implemented, I'd like to see Simon Forman's Thun interpreter ported to Scryer Prolog:

https://osdn.net/projects/joypy/scm/hg/Joypy/blobs/tip/thun/thun.pl

The interpreter is wonderfully demonstrative of Prolog's strengths. But, it needs a few small changes in order to conform to the ISO standard. Also, you'll notice from the source that it uses a few less-than-common UTF-8 characters that Scryer's parser does not yet recognize. Ideally, char_type should be extended to recognize them as well.

ghost commented 4 years ago

@triska , this could be the next challenge but I started to learn Prolog this month. And I'm also trying to understand the engine (reading Warren's Abstract Machine: A Tutorial Reconstruction).

triska commented 4 years ago

@notoria: Thank you for looking into it, I would greatly enjoy working on this with you.

I have filed #394 as one of the prerequisites for this project.

matt2xu commented 4 years ago

I started looking into it, per https://github.com/mthom/scryer-prolog/issues/401#issuecomment-619933630

I downloaded the thun.pl from the description and indeed it does not work :smile: First problem: the file has a Byte Order Mark (FE FF), so it is presumably UTF-16 (Big Endian) :thinking: :confused: Removing the BOM allowed me to go a bit further, hitting a "syntax_error(non_prolog_character)" on line 2, which I suspect is caused by the nice/weird "ANSI Shadow" characters.

Both issues are in the prolog_parser. Questions:

what encoding is/are/should be supported? Only UTF-8? Should we throw a perhaps clearer error when encountering a BOM?
in a block comment ("bracketed comment") why not ignore everything until chars "*" and "/" are encountered? Does the standard require only "Prolog characters" be present?

pmoura commented 4 years ago

I took a look at this Prolog codebase a couple of days ago. My usual approach is to use the Logtalk linter to check for potential porting issues. In this case, used SWI-Prolog as the Logtalk backend compiler (which was the compiler used for the original code) and the following steps:

?- {hook_objects(loader)}.
 ...
true.

?- logtalk_compile('thun.pl', [hook(object_wrapper_hook)]).
...

This resulted in three compilation errors. The fixes were:

@@ -44,7 +44,7 @@
  */

 :- use_module(library(clpfd)).
-:- use_module(library(dcg/basics)).
+:- use_module(library(dcg/basics), except([eos//0])).
 :- dynamic func/3.
 :- dynamic def/2.

@@ -439,7 +439,7 @@
 lines(Codes, [Line|Lines]) :- append(Line, [0'\n|Rest], Codes), !, lines(Rest, Lines).
 lines(Codes, [Codes]).

-:- assert_defs("defs.txt").
+:- initialization(assert_defs("defs.txt")).

 % A meta function that finds the names of all available functions.
@@ -1024,7 +1024,7 @@

 % Simple DCGs to expand/contract definitions.

-expando,    Body --> [Def], {def(Def, Body)}.
+expando,    [Head|Body] --> [Def], {def(Def, [Head|Body])}.

The first fix was required due to eos//0 being a built-in non-terminal in Logtalk. Should not be required for Scryer Prolog. The second fix is standards compliance fix, solved using an initialization/1 directive instead of an arbitrary goal as a directive. This one is likely required for Scryer Prolog. The third is the use of a variable in the place of a push-back list. SWI-Prolog accepts it. Logtalk and all other Prolog systems that I tried throw an instantiation error. Possibly the same issue with Scryer Prolog. Note that my quick fix may not be correct/complete. After these fixes, the thun.pl file compiles with only warnings. The relevant ones are:

Missing directives (assuming SWI-Prolog libraries):

:- use_module(library(lists), [append/3, list_to_set /2, member/2, select/3]).
:- use_module(library(apply), [maplist/2]).
:- use_module(library(assoc), [assoc_to_list/2, del_assoc/4, empty_assoc/1, put_assoc/4]).
:- use_module(library(readutils), [read_file_to_codes/3]).
:- use_module(library(listing), [portray_clause/1]).

You may prefer to use instead use_module/1 directives. I used use_module/2 directives above to list the actual library predicates being called. For each missing predicate, the Logtalk linter prints a warning like:

*     Unknown predicate called but not defined: maplist/2
*       while compiling object thun
*       in file /Users/pmoura/Downloads/thun.pl between lines 885-890

There are four more relevant warnings, all easy to fix:

*     Singleton variables: [List,Item]
*       while compiling object thun
*       in file /Users/pmoura/Downloads/thun.pl between lines 859-862
*          
*     Suspicious call: A=..[func,B,C,D] instead of A=func(B,C,D)
*       while compiling object thun
*       in file /Users/pmoura/Downloads/thun.pl between lines 553-557
*     
*     Missing meta_non_terminal directive for non-terminal: rebo//2
*       while compiling object thun
*       in file /Users/pmoura/Downloads/thun.pl at or above line 1037
*     
*     Missing predicate directive: :-dynamic combo/5.
*       while compiling object thun
*       in file /Users/pmoura/Downloads/thun.pl between lines 447-452

Hope this helps.

mthom commented 4 years ago

@matt2xu:

I started looking into it, per #401 (comment)

I downloaded the thun.pl from the description and indeed it does not work smile First problem: the file has a Byte Order Mark (FE FF), so it is presumably UTF-16 (Big Endian) thinking confused Removing the BOM allowed me to go a bit further, hitting a "syntax_error(non_prolog_character)" on line 2, which I suspect is caused by the nice/weird "ANSI Shadow" characters.

Both issues are in the prolog_parser. Questions:
1. what encoding is/are/should be supported? Only UTF-8? Should we throw a perhaps clearer error when encountering a BOM?
Yes, only UTF-8 is supported. An error for BOM would be appropriate.
2. in a block comment ("bracketed comment") why not ignore everything until chars "*" and "/" are encountered? Does the standard require only "Prolog characters" be present?

Sort of, yes. According to the ISO standard, each char must be a "prolog_char", which is the union of the graphic, alphanumeric, solo, layout, and meta char classes. Therefore, to include further characters in the "prolog_char" class, we must categorize each of the "shadow chars" as belonging to at least one of the five.

Are you a fan of the British band Wire?

@pmoura: Thanks, that's helpful!

matt2xu commented 4 years ago

Sort of, yes. According to the ISO standard, each char must be a "prolog_char", which is the union of the graphic, alphanumeric, solo, layout, and meta char classes. Therefore, to include further characters in the "prolog_char" class, we must categorize each of the "shadow chars" as belonging to at least one of the five.

@mthom Given that the rule for single line comment already skips any character, I've done the same thing for multiline comments :)

I've added some integration tests, making the lexer module public so I can call it without having to create a Parser. I'll solve the BOM thing next!

Are you a fan of the British band Wire?

I don't know them, I'll listen :)

@pmoura: thanks for the logs, this will help!

ghost commented 4 years ago

Does this mean a Prolog code can contain binary data in comment?

mthom commented 4 years ago

Given that the rule for single line comment already skips any character, I've done the same thing for multiline comments :)

OK, that's fine. :+1:

I don't know them, I'll listen :)

Your github handle is very similar to one of their early songs, I was just curious. :)

Does this mean a Prolog code can contain binary data in comment?

The lexer deals in Rust's char's. char's are 4 bytes in size and map onto Unicode scalar values. I'm not sure if they somehow include binary data.

matt2xu commented 4 years ago

Does this mean a Prolog code can contain binary data in comment?

What do you mean by "binary data"? UTF-8 specifies an encoding of characters where a character can be encoded as 1 byte if its value is less than 128 (so ASCII 7 bits), or up to 4 bits otherwise. In this encoding some values are illegal/invalid, for instance a 2-byte sequence 110xxxx 0xxxxxxx, or FF or FE, etc. In other words, not all binary data is legal UTF-8. That said, you could very well encode binary data, let's say a JPEG image for instance, by using some form of conversion to make it legal UTF-8. The same principle is applied on the Web when you embed images in CSS using base64 encoding, but instead here you would use (checks Wikipedia) base 1,112,064 encoding.

ghost commented 4 years ago

I meant arbitrary data like a raw JPEG image in the comment.

matt2xu commented 4 years ago

PR on prolog_parser to handle any char in multiline comment, check invalid BOM, and skip valid UTF-8 BOM: https://github.com/mthom/prolog_parser/pull/5

matt2xu commented 4 years ago

After fixing the imports I had an issue caught: error(syntax_error(incomplete_reduction),use_module/1:53) which I tracked down, by commenting % :- dynamic func/3. and % :- dynamic def/2., to come from line 564: rule(Head, [A|B], Head :- maplist(call, [A|B])).

So:

what is an "incomplete_reduction"?
why does it show at the wrong place?
why is this particular line causing an "incomplete_reduction" error?

This is only the beginning I think, then the initialization uses read_file_to_codes to apparently read a file into "a list of character codes"? :thinking: What is or should be the equivalent in Scryer Prolog? I saw that phrase_from_file was added recently, and also the README says that streams are a work in progress.

triska commented 4 years ago

dynamic is not a (standard) operator, please write the directive in functional notation, i.e.:

:- dynamic(def/2).

triska commented 4 years ago

phrase_from_file/2 is the best way to read from a file, and would be the preferred method even if streams were already available.

Using phrase_from_file/2, a good way to rewrite assert_defs/1 is:

assert_defs(DefsFile) :-
    phrase_from_file(lines(Lines), DefsFile),
    maplist(phrase(joy_def), Lines).

where lines//1 should be a DCG that can be applied to the file.

The key advantages of this is that:

the DCG can be tested also without any file present
phrase_from_file/2 can be implemented very efficiently, using lazy loading of only the required parts of the file.

Scryer uses characters (one-char atoms) instead of codes: Characters are much easier to read in answers, and also easier to write in programs.

triska commented 4 years ago

Regarding the "incomplete reduction" error:

6.3.3.1 Arguments

An argument (represented by arg in the syntax rules)
occurs as the argument of a compound term or element of
a list. It can be an atom which is an operator, or a term
with priority not greater than 999. When an argument is
an arbitrary term, its priority shall be less than the priority

So, for example, instead of:

rule(., ., a :- b).

write:

rule(., ., (a :- b)).

This is a good example where lax enforcement of ISO syntax leads to non-portable programs. One of the major attractions of Scryer Prolog is its strong adherence to the standard, allowing us to write more portable applications.

matt2xu commented 4 years ago

I have started to port thun.pl, but it seems to me to be a bigger issue than initially expected. This requires good knowledge of Prolog, which I do not yet have :) I am bailing out of working on this issue, feel free to take a look. I have fixed a few things and attempted to implement the "assert_defs" predicate on my https://github.com/matt2xu/scryer-prolog/tree/support-thun branch.

triska commented 4 years ago

@matt2xu: Thank you a lot for working on this, it seems you are very close to being done!

Here is a version of lines//1:

lines([])     --> call(eos), !.
lines([L|Ls]) --> line(L), lines(Ls).

line([])     --> ( "\n" | call(eos) ), !.
line([C|Cs]) --> [C], line(Cs).

eos([], []).

Sample use:

?- phrase(lines(Ls), "abc\ndef\nghi").
   Ls = ["abc","def","ghi"].

The second argument of format/2 is a list, so for example:

?- format("hello = ~w~n", [hello]).
hello = hello

triska commented 4 years ago

@matt2xu : Please also check out the debugging constructs from #461 ! They make debugging very easy while staying very close to the program under consideration.

matt2xu commented 4 years ago

Progress! I've ended up printing println! statement in the source to see if my file was read or not :D It turned out that the file name needed to be an atom (in phrase_from_file), and I discovered that 'defs.txt' is an atom, but "defs.txt" is not (and the source was using the latter).

Now I need to implement blanks//0 and integer//1 as the module assumes these are available (because in SWI Prolog they are available).

ghost commented 4 years ago

You could try this:

blanks --> [C], { char_type(C, whitespace) }, ( blanks | call(eos)), !.
joy_term(int(I)) --> N, { number_chars(N, I) }, !.

matt2xu commented 4 years ago

@matt2xu: Thank you a lot for working on this, it seems you are very close to being done!

I don't think so :)

Thank you both for your encouragements and suggestions, but I simply don't have enough experience with DCG and Prolog for this. Between missing functions (code_type, atom_string), behavior I don't understand (char_type(Ch, whitespace) doesn't work when Ch is a space? Do I need to do some kind of conversion first? joy_term(list(J)) --> "[" is not called but if I replace with joy_term(X) --> "[" it looks ok), it will take forever for me to make this work. I think that I can be of better use on other issues!

triska commented 4 years ago

Yes, absolutely! I have filed a few issues that I think would all make excellent additions and improvements, and they would all be greatly appreciated contributions to Scryer. Please do not get stuck on any one of them. For example, maybe #469 or #472 would be more fitting for now!

ghost commented 4 years ago

For char_type/2 to work, you need :- use_module(library(charsio)). For testing with scryer-prolog:

?- use_module(library(charsio)).
?- char_type(' ', whitespace).
   true.
?-

A char is an atom with length 1, so ?- atom_length(a, N). atom_length('Hello', M). To do some conversions, you can use char_code/2. It seems that using char is better than code:

Scryer uses characters (one-char atoms) instead of codes: Characters are much easier to read in answers, and also easier to write in programs.

That code_type/2 is like char_type/2, atom_string/2 should be atom_chars/2 so all string should be converted as chars.

Yes, porting is hard.

triska commented 4 years ago

The great relief when porting a program to Scryer Prolog is the certainty that this is the last time that such deep modifications ever need to be made to it: Once this program is ported to Scryer, it can be ported to other conforming systems as soon as a few basic building blocks or libraries are in place, and porting will then consist of changing a few use_module/1 directives, if any.

I cannot overemphasize how valuable this property is: When writing programs for systems that do not conform to the ISO standard, one does it in the knowledge that every line one writes and which uses non-conforming features will need to be changed once one moves to a system that follows the standard, and this hinders development considerably.

Most of the porting work that is hard in this case is due to the usage of non-conforming features. Luckily, they can be readily replaced and at the same time improved considerably by using Scryer's compact representation of strings as lists of characters.

Here is code for parsing an integer with Scryer Prolog:

:- use_module(library(charsio)).
:- use_module(library(dcgs)).

joy_term(int(I)) --> integer(Ds), !, { number_chars(I, Ds) }.

integer([D|Ds]) --> digit(D), integer(Ds).
integer([D])    --> digit(D).

digit(D)   --> [D], { char_type(D, decimal_digit) }.

Sample query:

?- phrase(joy_term(int(I)), "1323").
   I = 1323.

librarianmage commented 2 years ago

May I ask what the status of this issue is?

mthom commented 2 years ago

May I ask what the status of this issue is?

I was confused by some version differences across Simon's SWI implementations of Thun and put the port aside to work on other stuff. Feel free to take it up if you want.

razetime commented 1 year ago

the code is now at https://osdn.net/projects/joypy/scm/git/Thun/blobs/master/implementations/Prolog/source/thun.pl, i think.

mthom / scryer-prolog

Port Simon Forman's Thun Interpreter #388