universal-ctags / ctags

A maintained ctags implementation
https://ctags.io
GNU General Public License v2.0

Add parser classes, xml, sexp, ini, and toml #519

Open masatake opened 9 years ago

masatake commented 9 years ago

I will leave Tokyo tomorrow. So I will give you some attractions:)

Like regex and xcmd, an xml parser class would be useful. It could cover svg, html, xhtml, ant, docbook, and so on. XPath can be used to specify the elements of interest.

I found the following code in a public header file of libxml2.

/**
 * XML_GET_LINE:
 *
 * Macro to extract the line number of an element node.
 */
#define XML_GET_LINE(n)                     \
    (xmlGetLineNo(n))

For the lisp family, an S expression parser class would be useful. I think the current lisp related parsers are not useful. Lisp programmers generally introduce their application's own define-something forms using define-macro/defmacro. Definitions made with define-something should be captured as tags.

The following are the def* names in the emacs I'm using.

def-edebug-spec     defadvice
defalias    default-boundp
default-file-modes  default-font-height
default-indent-new-line     default-line-height
default-toplevel-value  default-value
defconst    defcustom
defcustom-c-stylevar    defface
defgroup    defimage
define-abbrev   define-abbrev-table
define-abbrevs  define-alternatives
define-auto-insert  define-button-type
define-category     define-ccl-program
define-char-code-property   define-charset
define-charset-alias    define-charset-internal
define-coding-system    define-coding-system-alias
define-coding-system-internal   define-compilation-mode
define-derived-mode     define-error
define-fringe-bitmap    define-generic-mode
define-global-abbrev    define-global-minor-mode
define-globalized-minor-mode    define-hash-table-test
define-ibuffer-column   define-ibuffer-filter
define-ibuffer-op   define-ibuffer-sorter
define-key  define-key-after
define-mail-abbrev  define-mail-alias
define-mail-user-agent  define-minor-mode
define-mode-abbrev  define-obsolete-face-alias
define-obsolete-function-alias  define-obsolete-variable-alias
define-prefix-command   define-skeleton
define-translation-hash-table   define-translation-table
define-widget   define-widget-keywords
defined-colors  defining-kbd-macro
defmacro    defmath
defsubst    deftheme
defun   defvar
defvar-local    defvaralias
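
The idea of capturing define-something forms can be sketched in C as follows. This is only a minimal illustration, not code from ctags: `struct sexp_tag`, `read_symbol`, `is_definer` and `scan_defs` are hypothetical names, and the list of definers is assumed to be supplied by the user. Character literals such as `?(` in elisp or `#\(` in Common Lisp are not handled.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define SEXP_NAME_MAX 64

/* Hypothetical tag record; ctags itself keeps richer tag information. */
struct sexp_tag {
    char definer[SEXP_NAME_MAX];  /* e.g. "define-minor-mode" */
    char name[SEXP_NAME_MAX];     /* the symbol being defined */
    int  line;                    /* line of the opening parenthesis */
};

/* Copy the next symbol (after whitespace and one optional quote) into buf. */
static const char *read_symbol(const char *p, char *buf, size_t size)
{
    size_t n = 0;
    while (isspace((unsigned char)*p)) p++;
    if (*p == '\'') p++;                      /* (defalias 'name ...) */
    while (*p && !isspace((unsigned char)*p) && *p != '(' && *p != ')') {
        if (n + 1 < size) buf[n++] = *p;      /* truncate overlong symbols */
        p++;
    }
    buf[n] = '\0';
    return p;
}

/* Check head against a NULL-terminated list of registered definers. */
static int is_definer(const char *head, const char *const *definers)
{
    for (; *definers; definers++)
        if (strcmp(head, *definers) == 0) return 1;
    return 0;
}

/* Record top-level forms whose head is a registered definer.
   Returns the number of tags written to out (at most max). */
static size_t scan_defs(const char *src, const char *const *definers,
                        struct sexp_tag *out, size_t max)
{
    size_t count = 0;
    int depth = 0, line = 1;

    for (const char *p = src; *p; ) {
        if (*p == ';') {                      /* comment runs to end of line */
            while (*p && *p != '\n') p++;
        } else if (*p == '"') {               /* skip a string literal */
            for (p++; *p && *p != '"'; p++) {
                if (*p == '\\' && p[1]) p++;  /* skip the escaped character */
                if (*p == '\n') line++;
            }
            if (*p) p++;
        } else {
            if (*p == '\n') line++;
            else if (*p == ')' && depth > 0) depth--;
            else if (*p == '(' && ++depth == 1 && count < max) {
                struct sexp_tag *t = &out[count];
                const char *q = read_symbol(p + 1, t->definer, sizeof t->definer);
                if (is_definer(t->definer, definers)) {
                    read_symbol(q, t->name, sizeof t->name);
                    if (t->name[0]) { t->line = line; count++; }
                }
            }
            p++;
        }
    }
    return count;
}
```

Matching against a registered list rather than the `def` prefix avoids tagging calls such as `(default-value 'x)`, and it is the place where a user could add an application's own definers.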

Realizing the optlib concept is one of my primary motivations for working on ctags. However, I now recognize that the regex syntax I know is not very portable. On Mac OS X it is just a "syntax error"; regex on Mac OS X is very limited. If we introduce a new parser class, pcre, users can write parsers with a more powerful syntax and in a portable way. I have no intention of making the current regex parser obsolete; I just want to introduce a newer one.

Do you have more ideas about parser classes? The following code in parse.h is the starting point.

typedef enum  {
  METHOD_NOT_CRAFTED    = 1 << 0,
  METHOD_REGEX          = 1 << 1,
  METHOD_XCMD           = 1 << 2,
  METHOD_XCMD_AVAILABLE = 1 << 3,
} parsingMethod;
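
As a rough sketch of how new parser classes could hook into this enum, each class could get its own bit so that one parser can combine several methods. `METHOD_XPATH`, `METHOD_SEXP`, `add_method` and `uses_method` below are hypothetical names for illustration only. They are not taken from parse.h.

```c
#include <stdbool.h>

typedef enum {
    METHOD_NOT_CRAFTED    = 1 << 0,
    METHOD_REGEX          = 1 << 1,
    METHOD_XCMD           = 1 << 2,
    METHOD_XCMD_AVAILABLE = 1 << 3,
    /* Hypothetical additions for the classes proposed in this issue. */
    METHOD_XPATH          = 1 << 4,
    METHOD_SEXP           = 1 << 5,
} parsingMethod;

/* Return the method set with m switched on. */
static unsigned int add_method(unsigned int methods, parsingMethod m)
{
    return methods | (unsigned int)m;
}

/* True if the method set contains m. */
static bool uses_method(unsigned int methods, parsingMethod m)
{
    return (methods & (unsigned int)m) != 0;
}
```

Because every value is a distinct power of two, a language defined with both regex patterns and XPath expressions would simply carry both bits.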

Happy hacking.

masatake commented 8 years ago

https://github.com/arduino/ctags/blob/master/gir.c

This is a very impressive parser. We should import it and then generalize it.

masatake commented 8 years ago

    $ ./ctags -o - --langdef=maven \
    --xpath-maven="a,artifactId{}///*[local-name()='project' and namespace-uri()='http://maven.apache.org/POM/4.0.0']/*[local-name()='artifactId' and namespace-uri()='http://maven.apache.org/POM/4.0.0']/text()" pom.xml

    build-tools-root    pom.xml /^  <artifactId>build-tools-root</artifactId>$/;"   a

masatake commented 8 years ago

Hard-coded version now works!!!

% ./ctags -x  pom.xml
build-tools-root artifactId    9 pom.xml <artifactId>build-tools-root</artifactId>

masatake commented 8 years ago

Hey, @p-montanus, I need libxml2. How should we do this in a gentle way? What I did is:

--- a/Makefile.in
+++ b/Makefile.in
@@ -68,14 +68,15 @@ COVERAGE_CFLAGS=--coverage
 COVERAGE_LDFLAGS=--coverage
 endif

-ALL_CFLAGS = $(CFLAGS) --std=gnu99 -Wall $(COVERAGE_CFLAGS)
+ALL_CFLAGS = $(CFLAGS) --std=gnu99 -Wall $(COVERAGE_CFLAGS) `pkg-config --cflags libxml-2.0`
+

 DEBUG_CPPFLAGS ?= -DDEBUG
 ALL_CPPFLAGS = $(CPPFLAGS)         \
    $(DEBUG_CPPFLAGS)           \
    -DDATADIR=\"$(pkgdatadir)\"     \
    -DPKGCONFDIR=\"$(pkgsysconfdir)\"   \
-   -DPKGLIBEXECDIR=\"$(pkglibexecdir)\"
+   -DPKGLIBEXECDIR=\"$(pkglibexecdir)\" 

 include $(srcdir)/source.mak

@@ -173,7 +174,7 @@ V_CC_1   =
 all: $(CTAGS_EXEC) $(READ_LIB) $(READ_CMD)

 $(CTAGS_EXEC): $(OBJECTS)
-   $(V_CC) $(CC) $(LDFLAGS) -o $@ $(OBJECTS) $(LIBS)
+   $(V_CC) $(CC) $(LDFLAGS) -o $@ $(OBJECTS) $(LIBS) `pkg-config --libs libxml-2.0`

 $(READ_CMD): readtags.c readtags.h
    $(V_CC) $(CC) -DREADTAGS_MAIN -I. -I$(srcdir) -I$(srcdir)/main $(DEFS) $(ALL_CPPFLAGS)  $(ALL_CFLAGS) $(LDFLAGS) -o $@ $(srcdir)/readtags.c
p-montanus commented 8 years ago

Hey, @p-montanus, I need libxml2. How should we do this in a gentle way?

Luke, use PKG_CHECK_MODULES in configure.ac, and use @*_CFLAGS@ and @*_LIBS@ in Makefile.in.

# configure.ac
PKG_CHECK_MODULES([LIBXML2], [libxml-2.0], [: if-found], [: if-not-found])

# Makefile.in
LIBXML2_CFLAGS = @LIBXML2_CFLAGS@
LIBXML2_LIBS = @LIBXML2_LIBS@
ALL_CFLAGS += $(LIBXML2_CFLAGS)
LIBS += $(LIBXML2_LIBS)

masatake commented 8 years ago

Great. After merging your #592 and #601, I will make a PR. Instead of targeting maven, I will rewrite the ant parser with this new technology.

@ffes, @k-takata, and @cweagans, is libxml2 available on the platforms you maintain? I found I can implement an XML-based parser easily with libxml2, and I would like to use it in ctags. I would like to hear your comments about using libxml2.

k-takata commented 8 years ago

(@masatake You misspelled my name. I have fixed it.)

I confirmed that MSYS2 has libxml2 packages (mingw-w64-i686-libxml2 and mingw-w64-x86_64-libxml2), so it would be easy to use libxml2 on MSYS2. But I'm not sure we can use it on MSVC. (Maybe we can, but not so easy I think.)

masatake commented 8 years ago

(@masatake You misspelled my name. I have fixed it.)

I'm very sorry.

I confirmed that MSYS2 has libxml2 packages (mingw-w64-i686-libxml2 and mingw-w64-x86_64-libxml2), so it would be easy to use libxml2 on MSYS2. But I'm not sure we can use it on MSVC. (Maybe we can, but not so easy I think.)

Thank you for the comment.

Instead of reworking ant.c, it will be better to create main/lxpath.c. That way I can put all the libxml2-related ifdef/endif into one file.
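
A minimal sketch of what such a single-file guard could look like, assuming the build system defines a macro when libxml2 is found. The macro name `HAVE_LIBXML` and the functions `xml_support_available` and `xml_root_line` are assumptions for illustration, not the actual contents of main/lxpath.c. The libxml2 calls used (`xmlReadFile`, `xmlDocGetRootElement`, `XML_GET_LINE`, `xmlFreeDoc`) are part of its public API.

```c
#include <stdbool.h>

#ifdef HAVE_LIBXML   /* assumed macro name, defined when libxml2 was found */
#include <libxml/parser.h>
#include <libxml/tree.h>

bool xml_support_available(void) { return true; }

/* Return the line number of the root element, or -1 on failure. */
long xml_root_line(const char *path)
{
    long line = -1;
    xmlDocPtr doc = xmlReadFile(path, NULL, 0);

    if (doc != NULL) {
        xmlNodePtr root = xmlDocGetRootElement(doc);
        if (root != NULL)
            line = XML_GET_LINE(root);
        xmlFreeDoc(doc);
    }
    return line;
}

#else   /* built without libxml2: same interface, no XML support */

bool xml_support_available(void) { return false; }

long xml_root_line(const char *path)
{
    (void)path;   /* unused in the stub */
    return -1;
}

#endif
```

Callers see the same two functions in both builds, so the rest of the code base needs no libxml2 conditionals.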

b4n commented 8 years ago

Luke, use PKG_CONFIG_MODULES in configure.ac, use @*_CFLAGS@ and @*_LIBS@ in Makefile.in.

Spelled PKG_CHECK_MODULES it is Obi-Wan ;)

p-montanus commented 8 years ago

Spelled PKG_CHECK_MODULES it is Obi-Wan ;)

Spelling is fixed, peacefully. May the Force be with you.

arichiardi commented 6 years ago

Hi folks, what is the status of this one? Is some help needed? I came here while investigating how to generate good tags for Clojure.

masatake commented 6 years ago

The meta sexp parser has two aspects.

  1. It can be used as a kind of template for parsers like elisp, cl, scheme, and Clojure.
  2. It helps to capture user-defined defX forms in those parsers. In other words, the sexp meta parser helps a ctags user write a subparser for parsers like elisp, cl, scheme, and Clojure. For the subparser concept, see http://docs.ctags.io/en/latest/running-multi-parsers.html?highlight=subparser .

I think what I want is understandable to lisp hackers. The idea is very attractive to me. However, I don't have time to work on it. If you are interested in the lisp family, you can try to implement it.

If you are just interested in Clojure, you can implement it with the crazy mtable meta parser. See http://docs.ctags.io/en/latest/optlib.html?highlight=mtable#byte-oriented-pattern-matching-with-multiple-regex-tables . It is not documented well. See also https://github.com/universal-ctags/ctags/issues/1620 .

arichiardi commented 6 years ago

Ok thanks! This information is very valuable, I will see what I can do!

masatake commented 3 years ago

class (meta parser)   C level   Optlib level   note
-------------------   -------   ------------   ----
regex                 yes       YES
libxml(xpath)         yes       no             See #3897
libyaml               yes       no             Not so useful. libypath is needed.
S expression          no        no             This should cover clojure, elisp, lisp, scheme.
json                  no        no             We have json parser.
iniconf               no        no             We have iniconf parser.
toml                  no        no
packci                no        no             interpreter version of packcc