own-pt / cl-krr

Environment for knowledge representation, reasoning, and engineering.
Apache License 2.0

Adjust readtable to support proper casing #3

Closed: fcbr closed this issue 5 years ago

fcbr commented 8 years ago

SUO-KIF is a case-sensitive language, so we cannot have the Lisp reader convert all symbols to upper case. Currently, to avoid this problem, we need to use mlisp, but a general solution is required.
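
To make the problem concrete, here is a minimal sketch of the reader's case folding. The helper name `read-name` is hypothetical (it is not part of cl-krr); it relies only on the standard `readtable-case` accessor.

```lisp
;; Hypothetical helper, for illustration only.
(defun read-name (string &optional (mode :upcase))
  "Read one symbol from STRING using a fresh copy of the standard readtable
whose case mode is MODE, and return the symbol's name."
  (let ((*readtable* (copy-readtable nil)))   ; NIL means the standard readtable
    (setf (readtable-case *readtable*) mode)
    (symbol-name (read-from-string string))))

(read-name "MusicGenre")            ; => "MUSICGENRE"
(read-name "musicGenre")            ; => "MUSICGENRE" (same name, so same symbol)
(read-name "musicGenre" :preserve)  ; => "musicGenre"
```

With the default `:upcase` mode, two SUO-KIF terms that differ only in case collapse into one symbol; `:preserve` keeps them apart.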

arademaker commented 8 years ago

See

arademaker commented 5 years ago

@fcbr said:

From http://www-ksl.stanford.edu/knowledge-sharing/papers/kif.ps

"KIF originated in a Lisp application and inherits its syntax from Lisp. The relationship between linear KIF and structured KIF is most easily specified by appeal to the Common Lisp reader. In particular a string of ascii characters forms a legal expression in linear KIF if and only if it is acceptable to the Common Lisp reader as defined in Steele's book and the structure produced by the Common Lisp reader is a legal expression of structured KIF as defined in the next section".

^ that to me indicates that KIF should be case-insensitive (as the default Lisp reader converts everything to upper case). If SUO-KIF is a variant of KIF, there's no reason why it should deviate from the same syntax convention...

But page 17 of https://github.com/ontologyportal/sigmakee/blob/master/suo-kif.pdf says:

SUO-KIF is intended as a language for knowledge authoring, unlike the original KIF, which was intended primarily as a language for knowledge interchange.

I believe our transformation must check for clashes between symbols when it runs in case-insensitive mode, as in SBCL. If we are only losing legibility, that is acceptable for now and we can close #10. Later we can make the code more robust by adding an option to preserve the case of symbols.
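
One way such a clash check could look is sketched below. The function name `case-clashes` is hypothetical, and the sketch assumes the forms were read with symbol case preserved (otherwise the distinction is already lost by the time the check runs).

```lisp
;; Hypothetical helper, for illustration only.
(defun case-clashes (forms)
  "Return groups of symbol names occurring in FORMS that differ only in
letter case. FORMS must have been read with symbol case preserved."
  (let ((by-upcase (make-hash-table :test #'equal))
        (clashes nil))
    (labels ((walk (x)
               (cond ((consp x)
                      (walk (car x))
                      (walk (cdr x)))
                     ((and x (symbolp x))      ; skip NIL, strings, numbers
                      (pushnew (symbol-name x)
                               (gethash (string-upcase (symbol-name x))
                                        by-upcase)
                               :test #'string=)))))
      (dolist (form forms)
        (walk form)))
    ;; A key with more than one distinct spelling is a clash.
    (maphash (lambda (key names)
               (declare (ignore key))
               (when (rest names)
                 (push names clashes)))
             by-upcase)
    clashes))

;; Usage: read two axioms case-sensitively, then look for clashes.
(let ((*readtable* (copy-readtable nil)))
  (setf (readtable-case *readtable*) :preserve)
  (case-clashes
   (list (read-from-string "(termFormat EnglishLanguage Attorney \"attorney\")")
         (read-from-string "(termFormat EnglishLanguage attorney \"attorney\")"))))
;; Returns a single group holding the names "Attorney" and "attorney".
```

An empty result would mean that upcasing costs only legibility; any non-empty result means distinct terms would be merged.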

hmuniz commented 5 years ago

The following list of axioms shows that we are not only losing legibility.

domainEnglishFormat.kif
1935 (termFormat EnglishLanguage Attorney "attorney")
Law.kif
480  (termFormat EnglishLanguage attorney "attorney")

----

domainEnglishFormat.kif
2436 (termFormat EnglishLanguage Broker "broker")
UXExperimentalTerms.kif
1887 (termFormat EnglishLanguage broker "broker")

---

domainEnglishFormat.kif
3123 (termFormat EnglishLanguage Composer "composer")
Music.kif
139  (termFormat EnglishLanguage composer "composer")

--- 

Music.kif
65   (termFormat EnglishLanguage Discography "discography")
232  (termFormat EnglishLanguage discography "discography")

---

domainEnglishFormat.kif
5723 (termFormat EnglishLanguage Judge "judge")
Law.kif
233  (termFormat EnglishLanguage judge "judge")

----

Music.kif
348  (termFormat EnglishLanguage musicGenre "music genre")
511  (termFormat EnglishLanguage MusicGenre "music genre")

---

domainEnglishFormat.kif
6988 (termFormat EnglishLanguage Musician "musician")
Music.kif
189  (termFormat EnglishLanguage musician "musician")

I generated this list by modifying the read-kif function so that it does not remove duplicates, and then checking for duplicates with get-duplicates. After that, I compared the lists of duplicates that mlisp and SBCL produce.

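    (defun read-kif (files)
      "Read every top-level form from FILES, keeping duplicates."
      (let ((res nil))
        (dolist (file files)
          (with-open-file (kb file)
            ;; Note: a literal NIL form in a file also ends the loop.
            (do ((st (read kb nil nil)
                     (read kb nil nil)))
                ((null st))
              (push st res))))
        res))

    (defun get-duplicates (list &optional test)
      "Return the elements of LIST that occur more than once under TEST."
      (let ((ht (make-hash-table :test (or test #'equal)))
            (ret nil))
        ;; Count the occurrences of each element.
        (dolist (x list)
          (incf (gethash x ht 0)))
        ;; Keep the elements seen more than once.
        (maphash (lambda (key value)
                   (when (> value 1)
                     (push key ret)))
                 ht)
        ret))

    (get-duplicates (read-kif *sumo*))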

fcbr commented 5 years ago

Nicely done! Are the (termFormat ...) axioms the only cases where this happens? If so, it may not affect the TPTP output in practice because termFormat is one of the "ignored predicates": https://github.com/own-pt/cl-krr/blob/master/suo-kif.lisp#L19-L23

arademaker commented 5 years ago

No @fcbr, @hmuniz used this list to further search for all occurrences of both versions of the symbols listed in the termFormat axioms above. For instance, we found occurrences of Attorney and attorney in other axioms. The same happens for the other symbols listed above.

fcbr commented 5 years ago

Ah, I spoke too soon. It looks like we might indeed have problems:

(instance musicGenre BinaryPredicate)
(subclass MusicGenre RelationalAttribute)

So musicGenre is a predicate, whereas MusicGenre is a class.

fcbr commented 5 years ago

So there are a couple of options that I can think of (just thinking out loud):

  1. Use some readtable-style hack as described in #10
  2. Rename the clashing symbols; so for example musicGenre the predicate would be renamed to something like MUSICGENRE1 or MUSICGENREPREDICATE and MusicGenre would be renamed to MUSICGENRE2 or MUSICGENRECLASS.
  3. Future improvements to (2) would preserve the case, so we would generate musicGenrePredicate, and MusicGenreClass, for example.
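
As a rough illustration of option 1, the sketch below reads forms through a private readtable whose case mode is `:preserve`. The function name `read-forms-preserving-case` is hypothetical, and this is not necessarily the approach described in #10.

```lisp
;; Hypothetical helper, for illustration only.
(defun read-forms-preserving-case (stream)
  "Return the list of forms read from STREAM, keeping symbol case as written."
  (let ((*readtable* (copy-readtable nil))
        (*read-eval* nil))                    ; never evaluate #. in data files
    (setf (readtable-case *readtable*) :preserve)
    (loop for form = (read stream nil stream) ; STREAM itself marks end of file
          until (eq form stream)
          collect form)))

(with-input-from-string (in "(instance musicGenre BinaryPredicate)
(subclass MusicGenre RelationalAttribute)")
  (mapcar (lambda (form) (symbol-name (second form)))
          (read-forms-preserving-case in)))
;; => ("musicGenre" "MusicGenre")
```

Because the readtable is bound only around the `read` calls, the rest of the program is still read with the implementation's default case handling.
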

hmuniz commented 5 years ago

To solve this problem, I used the first option, combined with escaping the required symbols with pipes (|...|) so that the code works properly.
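
Assuming "piping" here refers to the standard `|...|` symbol syntax, the following sketch shows its effect when the code itself is read with the default upcasing readtable. The symbols are taken from the axioms above; the snippet is illustrative and not copied from cl-krr.

```lisp
;; Read under the standard (upcasing) readtable.
(list (symbol-name 'musicGenre)            ; "MUSICGENRE": case is folded
      (symbol-name '|musicGenre|)          ; "musicGenre": case is kept
      (eq '|MusicGenre| '|musicGenre|))    ; NIL: two distinct symbols
```

Source code that must refer to a case-preserved SUO-KIF symbol can therefore name it exactly, while ordinary Lisp symbols keep working unchanged.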