Open mthom opened 4 years ago
@triska , this could be the next challenge but I started to learn Prolog this month. And I'm also trying to understand the engine (reading Warren's Abstract Machine: A Tutorial Reconstruction).
@notoria: Thank you for looking into it, I would greatly enjoy working on this with you.
I have filed #394 as one of the prerequisites for this project.
I started looking into it, per https://github.com/mthom/scryer-prolog/issues/401#issuecomment-619933630
I downloaded the thun.pl
from the description and indeed it does not work :smile:
First problem: the file has a Byte Order Mark (FE FF), so it is presumably UTF-16 (Big Endian) :thinking: :confused:
Removing the BOM allowed me to go a bit further, hitting a "syntax_error(non_prolog_character)" on line 2, which I suspect is caused by the nice/weird "ANSI Shadow" characters.
Both issues are in the prolog_parser. Questions:
I took a look at this Prolog codebase a couple of days ago. My usual approach is to use the Logtalk linter to check for potential porting issues. In this case, used SWI-Prolog as the Logtalk backend compiler (which was the compiler used for the original code) and the following steps:
?- {hook_objects(loader)}.
...
true.
?- logtalk_compile('thun.pl', [hook(object_wrapper_hook)]).
...
This resulted in three compilation errors. The fixes were:
@@ -44,7 +44,7 @@
*/
:- use_module(library(clpfd)).
-:- use_module(library(dcg/basics)).
+:- use_module(library(dcg/basics), except([eos//0])).
:- dynamic func/3.
:- dynamic def/2.
@@ -439,7 +439,7 @@
lines(Codes, [Line|Lines]) :- append(Line, [0'\n|Rest], Codes), !, lines(Rest, Lines).
lines(Codes, [Codes]).
-:- assert_defs("defs.txt").
+:- initialization(assert_defs("defs.txt")).
% A meta function that finds the names of all available functions.
@@ -1024,7 +1024,7 @@
% Simple DCGs to expand/contract definitions.
-expando, Body --> [Def], {def(Def, Body)}.
+expando, [Head|Body] --> [Def], {def(Def, [Head|Body])}.
The first fix was required due to eos//0
being a built-in non-terminal in Logtalk. Should not be required for Scryer Prolog. The second fix is standards compliance fix, solved using an initialization/1
directive instead of an arbitrary goal as a directive. This one is likely required for Scryer Prolog. The third is the use of a variable in the place of a push-back list. SWI-Prolog accepts it. Logtalk and all other Prolog systems that I tried throw an instantiation error. Possibly the same issue with Scryer Prolog. Note that my quick fix may not be correct/complete. After these fixes, the thun.pl
file compiles with only warnings. The relevant ones are:
:- use_module(library(lists), [append/3, list_to_set /2, member/2, select/3]).
:- use_module(library(apply), [maplist/2]).
:- use_module(library(assoc), [assoc_to_list/2, del_assoc/4, empty_assoc/1, put_assoc/4]).
:- use_module(library(readutils), [read_file_to_codes/3]).
:- use_module(library(listing), [portray_clause/1]).
You may prefer to use instead use_module/1
directives. I used use_module/2
directives above to list the actual library predicates being called. For each missing predicate, the Logtalk linter prints a warning like:
* Unknown predicate called but not defined: maplist/2
* while compiling object thun
* in file /Users/pmoura/Downloads/thun.pl between lines 885-890
* Singleton variables: [List,Item]
* while compiling object thun
* in file /Users/pmoura/Downloads/thun.pl between lines 859-862
*
* Suspicious call: A=..[func,B,C,D] instead of A=func(B,C,D)
* while compiling object thun
* in file /Users/pmoura/Downloads/thun.pl between lines 553-557
*
* Missing meta_non_terminal directive for non-terminal: rebo//2
* while compiling object thun
* in file /Users/pmoura/Downloads/thun.pl at or above line 1037
*
* Missing predicate directive: :-dynamic combo/5.
* while compiling object thun
* in file /Users/pmoura/Downloads/thun.pl between lines 447-452
Hope this helps.
@matt2xu:
I started looking into it, per #401 (comment)
I downloaded the
thun.pl
from the description and indeed it does not work smile First problem: the file has a Byte Order Mark (FE FF), so it is presumably UTF-16 (Big Endian) thinking confused Removing the BOM allowed me to go a bit further, hitting a "syntax_error(non_prolog_character)" on line 2, which I suspect is caused by the nice/weird "ANSI Shadow" characters.Both issues are in the prolog_parser. Questions:
1. what encoding is/are/should be supported? Only UTF-8? Should we throw a perhaps clearer error when encountering a BOM?
Yes, only UTF-8 is supported. An error for BOM would be appropriate.
2. in a block comment ("bracketed comment") why not ignore everything until chars "*" and "/" are encountered? Does the standard require only "Prolog characters" be present?
Sort of, yes. According to the ISO standard, each char must be a "prolog_char", which is the union of the graphic, alphanumeric, solo, layout, and meta char classes. Therefore, to include further characters in the "prolog_char" class, we must categorize each of the "shadow chars" as belonging to at least one of the five.
Are you a fan of the British band Wire?
@pmoura: Thanks, that's helpful!
Sort of, yes. According to the ISO standard, each char must be a "prolog_char", which is the union of the graphic, alphanumeric, solo, layout, and meta char classes. Therefore, to include further characters in the "prolog_char" class, we must categorize each of the "shadow chars" as belonging to at least one of the five.
@mthom Given that the rule for single line comment already skips any character, I've done the same thing for multiline comments :)
I've added some integration tests, making the lexer
module public so I can call it without having to create a Parser. I'll solve the BOM thing next!
Are you a fan of the British band Wire?
I don't know them, I'll listen :)
@pmoura: thanks for the logs, this will help!
Does this mean a Prolog code can contain binary data in comment?
Given that the rule for single line comment already skips any character, I've done the same thing for multiline comments :)
OK, that's fine. :+1:
I don't know them, I'll listen :)
Your github handle is very similar to one of their early songs, I was just curious. :)
Does this mean a Prolog code can contain binary data in comment?
The lexer deals in Rust's char
's. char
's are 4 bytes in size and map onto Unicode scalar values. I'm not sure if they somehow include binary data.
Does this mean a Prolog code can contain binary data in comment?
What do you mean by "binary data"? UTF-8 specifies an encoding of characters where a character can be encoded as 1 byte if its value is less than 128 (so ASCII 7 bits), or up to 4 bits otherwise. In this encoding some values are illegal/invalid, for instance a 2-byte sequence 110xxxx 0xxxxxxx, or FF or FE, etc. In other words, not all binary data is legal UTF-8. That said, you could very well encode binary data, let's say a JPEG image for instance, by using some form of conversion to make it legal UTF-8. The same principle is applied on the Web when you embed images in CSS using base64 encoding, but instead here you would use (checks Wikipedia) base 1,112,064 encoding.
I meant arbitrary data like a raw JPEG image in the comment.
PR on prolog_parser to handle any char in multiline comment, check invalid BOM, and skip valid UTF-8 BOM: https://github.com/mthom/prolog_parser/pull/5
After fixing the imports I had an issue
caught: error(syntax_error(incomplete_reduction),use_module/1:53)
which I tracked down, by commenting % :- dynamic func/3.
and % :- dynamic def/2.
, to come from line 564:
rule(Head, [A|B], Head :- maplist(call, [A|B])).
So:
This is only the beginning I think, then the initialization uses read_file_to_codes
to apparently read a file into "a list of character codes"? :thinking: What is or should be the equivalent in Scryer Prolog? I saw that phrase_from_file
was added recently, and also the README says that streams are a work in progress.
dynamic
is not a (standard) operator, please write the directive in functional notation, i.e.:
:- dynamic(def/2).
phrase_from_file/2
is the best way to read from a file, and would be the preferred method even if streams were already available.
Using phrase_from_file/2
, a good way to rewrite assert_defs/1
is:
assert_defs(DefsFile) :- phrase_from_file(lines(Lines), DefsFile), maplist(phrase(joy_def), Lines).
where lines//1
should be a DCG that can be applied to the file.
The key advantages of this is that:
phrase_from_file/2
can be implemented very efficiently, using lazy loading of only the required parts of the file.Scryer uses characters (one-char atoms) instead of codes: Characters are much easier to read in answers, and also easier to write in programs.
Regarding the "incomplete reduction" error:
6.3.3.1 Arguments An argument (represented by arg in the syntax rules) occurs as the argument of a compound term or element of a list. It can be an atom which is an operator, or a term with priority not greater than 999. When an argument is an arbitrary term, its priority shall be less than the priority
So, for example, instead of:
rule(., ., a :- b).
write:
rule(., ., (a :- b)).
This is a good example where lax enforcement of ISO syntax leads to non-portable programs. One of the major attractions of Scryer Prolog is its strong adherence to the standard, allowing us to write more portable applications.
I have started to port thun.pl
, but it seems to me to be a bigger issue than initially expected. This requires good knowledge of Prolog, which I do not yet have :) I am bailing out of working on this issue, feel free to take a look. I have fixed a few things and attempted to implement the "assert_defs" predicate on my https://github.com/matt2xu/scryer-prolog/tree/support-thun branch.
@matt2xu: Thank you a lot for working on this, it seems you are very close to being done!
Here is a version of lines//1
:
lines([]) --> call(eos), !. lines([L|Ls]) --> line(L), lines(Ls). line([]) --> ( "\n" | call(eos) ), !. line([C|Cs]) --> [C], line(Cs). eos([], []).
Sample use:
?- phrase(lines(Ls), "abc\ndef\nghi"). Ls = ["abc","def","ghi"].
The second argument of format/2
is a list, so for example:
?- format("hello = ~w~n", [hello]). hello = hello
@matt2xu : Please also check out the debugging constructs from #461 ! They make debugging very easy while staying very close to the program under consideration.
Progress! I've ended up printing println!
statement in the source to see if my file was read or not :D It turned out that the file name needed to be an atom (in phrase_from_file), and I discovered that 'defs.txt'
is an atom, but "defs.txt"
is not (and the source was using the latter).
Now I need to implement blanks//0
and integer//1
as the module assumes these are available (because in SWI Prolog they are available).
You could try this:
blanks --> [C], { char_type(C, whitespace) }, ( blanks | call(eos)), !.
joy_term(int(I)) --> N, { number_chars(N, I) }, !.
@matt2xu: Thank you a lot for working on this, it seems you are very close to being done!
I don't think so :)
Thank you both for your encouragements and suggestions, but I simply don't have enough experience with DCG and Prolog for this. Between missing functions (code_type
, atom_string
), behavior I don't understand (char_type(Ch, whitespace)
doesn't work when Ch is a space? Do I need to do some kind of conversion first? joy_term(list(J)) --> "["
is not called but if I replace with joy_term(X) --> "["
it looks ok), it will take forever for me to make this work. I think that I can be of better use on other issues!
Yes, absolutely! I have filed a few issues that I think would all make excellent additions and improvements, and they would all be greatly appreciated contributions to Scryer. Please do not get stuck on any one of them. For example, maybe #469 or #472 would be more fitting for now!
For char_type/2
to work, you need :- use_module(library(charsio)).
For testing with scryer-prolog
:
?- use_module(library(charsio)).
?- char_type(' ', whitespace).
true.
?-
A char
is an atom with length 1, so ?- atom_length(a, N). atom_length('Hello', M).
To do some conversions, you can use char_code/2
. It seems that using char
is better than code
:
Scryer uses characters (one-char atoms) instead of codes: Characters are much easier to read in answers, and also easier to write in programs.
That code_type/2
is like char_type/2
, atom_string/2
should be atom_chars/2
so all string
should be converted as chars
.
Yes, porting is hard.
The great relief when porting a program to Scryer Prolog is the certainty that this is the last time that such deep modifications ever need to be made to it: Once this program is ported to Scryer, it can be ported to other conforming systems as soon as a few basic building blocks or libraries are in place, and porting will then consist of changing a few use_module/1
directives, if any.
I cannot overemphasize how valuable this property is: When writing programs for systems that do not conform to the ISO standard, one does it in the knowledge that every line one writes and which uses non-conforming features will need to be changed once one moves to a system that follows the standard, and this hinders development considerably.
Most of the porting work that is hard in this case is due to the usage of non-conforming features. Luckily, they can be readily replaced and at the same time improved considerably by using Scryer's compact representation of strings as lists of characters.
Here is code for parsing an integer with Scryer Prolog:
:- use_module(library(charsio)). :- use_module(library(dcgs)). joy_term(int(I)) --> integer(Ds), !, { number_chars(I, Ds) }. integer([D|Ds]) --> digit(D), integer(Ds). integer([D]) --> digit(D). digit(D) --> [D], { char_type(D, decimal_digit) }.
Sample query:
?- phrase(joy_term(int(I)), "1323"). I = 1323.
May I ask what the status of this issue is?
May I ask what the status of this issue is?
I was confused by some version differences across Simon's SWI implementations of Thun and put the port aside to work on other stuff. Feel free to take it up if you want.
the code is now at https://osdn.net/projects/joypy/scm/git/Thun/blobs/master/implementations/Prolog/source/thun.pl, i think.
Now that
char_type
is implemented, I'd like to see Simon Forman's Thun interpreter ported to Scryer Prolog:https://osdn.net/projects/joypy/scm/hg/Joypy/blobs/tip/thun/thun.pl
The interpreter is wonderfully demonstrative of Prolog's strengths. But, it needs a few small changes in order to conform to the ISO standard. Also, you'll notice from the source that it uses a few less-than-common UTF-8 characters that Scryer's parser does not yet recognize. Ideally,
char_type
should be extended to recognize them as well.