[Question] How to have separate tags files?

First my use case: I have ARM embedded projects that have

- some code in the header files and stdlib that comes with the ARM compiler
- some code in a library my company develops for our embedded use case
- some code in another library my company develops for our embedded use case
- some code in my project

I would like to use ctags for the tags files.

I am usually in the project's code, but I want code completion, definition lookup and the like to offer candidates from the libraries as well. I figured that when I am in a project and run M-x citre-create-tags-file, I can just add the paths to the libraries' sources in the window that pops up, and it creates a tags file for the project with all the libraries' tags in it. This works, but I get a tags file that is >40 MB. And I have many projects, so I would have to create a >40 MB tags file for each of them.

So what I would like to have instead is a separate tags file for each project with only the project's tags, and one for the libraries (or one for each library, I don't care about that). readtags would then look up a tag in the project's tags file, and when it can't find it there, it would look in the other files.

gtags has the environment variable GTAGSLIBPATH for that. I set it up to point to the libraries' sources, create tags files for them with gtags, and gtags takes care of the rest. But how could I set this up for ctags instead? All I could find was some commands to give vim (yes I know, the horror!) several tags files, but not how to do this in Emacs. Also, it would be best to have it in a set-up-once-and-forget-it style instead of doing it for every project again and again.

I also wanted the feature to support multiple tag files.

Supporting multiple tags files in readtags is not so hard (but not easy enough to implement in a week).

Each tags file has pseudo tag entries at the beginning:

$ head tags
!_TAG_EXTRA_DESCRIPTION anonymous   /Include tags for non-named objects like lambda/
!_TAG_EXTRA_DESCRIPTION fileScope   /Include tags of file scope/
!_TAG_EXTRA_DESCRIPTION pseudo  /Include pseudo tags/
!_TAG_EXTRA_DESCRIPTION subparser   /Include tags generated by subparsers/
!_TAG_FIELD_DESCRIPTION epoch   /the last modified time of the input file (only for F\/file kind tag)/
!_TAG_FIELD_DESCRIPTION file    /File-restricted scoping/
!_TAG_FIELD_DESCRIPTION input   /input file/
!_TAG_FIELD_DESCRIPTION name    /tag name/
!_TAG_FIELD_DESCRIPTION pattern /pattern/
!_TAG_FIELD_DESCRIPTION typeref /Type and name of a variable or typedef/

They describe how the tags file is generated. readtags utilizes these pseudo-tag entries effectively.

To add support for multiple tags files to readtags, we have to:

1. write code verifying the consistency of pseudo tag entries in the files,
2. make libreadtags and the query engine used in the -Q option thread-safe,
3. make the readtags command run functions of libreadtags in multiple threads.

ADDED after commenting: 2. and 3. are for utilizing multiple CPUs. I only have to do 1. to support multiple tags files.
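As an illustration of step 1 only, here is a minimal Python sketch of such a consistency check. It is not how readtags implements it: the set of pseudo tags being compared (`CHECKED_PTAGS`) and the helper names are assumptions made for this example. Real tags files separate fields with tabs.

```python
from typing import Dict, Iterable, List

# ASSUMPTION: which pseudo tags must agree is not specified in this thread;
# these three are picked only for illustration.
CHECKED_PTAGS = ("!_TAG_FILE_FORMAT", "!_TAG_FILE_SORTED", "!_TAG_OUTPUT_MODE")


def read_ptags(lines: Iterable[str]) -> Dict[str, str]:
    """Collect pseudo tags (name -> second field) from the head of a tags file."""
    ptags: Dict[str, str] = {}
    for line in lines:
        if not line.startswith("!_"):
            break  # pseudo tags come first, so stop at the first regular tag
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            ptags[fields[0]] = fields[1]
    return ptags


def inconsistent_ptags(files: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the checked pseudo tags whose values differ between files."""
    bad = []
    for name in CHECKED_PTAGS:
        values = {ptags.get(name) for ptags in files.values()}
        if len(values) > 1:
            bad.append(name)
    return bad
```

A caller would run `read_ptags` on each file passed with -t and refuse to continue (or warn) if `inconsistent_ptags` returns anything.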

Extending the user interface of the readtags command is simple; if the user specifies -t options multiple times, the command should read all of them after verifying the consistency of pseudo-tag entries.

The alternative user interface introduces a pseudo tag entry for aggregation:

!_READTAGS_INCLUDE_TAGS /somewhere/tags-for-lib1.tags   /external tags file/
!_READTAGS_INCLUDE_TAGS /somewhere/tags-for-lib2.tags   /external tags file/
!_READTAGS_INCLUDE_TAGS /somewhere/tags-for-lib3.tags   /external tags file/
...

I know only part of Citre's implementation, but I expect the second approach (supporting _READTAGS_INCLUDE_TAGS) wouldn't require much change in Citre to support multiple tags files.

> All I could find was some commands to give vim (yes I know, the horror!) several tags files

AFAIK, ctags plugins for vim seem to read tags files directly, so there's no difficulty for them to read multiple tags files. Citre uses readtags to read tags files, and it can only read one tags file.

So why not run readtags multiple times to read multiple tags files and gather all the results? The main reason is that sorting is done by readtags, which knows the tags file better and is faster than sorting in elisp. Reading multiple tags files this way would require us to sort in elisp, which means giving up these benefits and reinventing the wheel.
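To make the trade-off concrete, here is a rough sketch of what client-side gathering would involve, written in Python as a stand-in for elisp. It assumes each readtags run already returned its lines sorted by tag name, and it only merges on the name. Arbitrary sorter expressions like those accepted by readtags are not covered, which is exactly the wheel that would have to be reinvented. The function names are made up for this example.

```python
import heapq
from typing import Iterable, Iterator, List


def tag_name(line: str) -> str:
    """The tag name is the first tab-separated field of a tags line."""
    return line.split("\t", 1)[0]


def merge_sorted_results(results: List[Iterable[str]]) -> Iterator[str]:
    """Merge per-file result streams that are each already sorted by name."""
    return heapq.merge(*results, key=tag_name)
```

Even this simple case forces the client to know how each stream was sorted, something readtags can read from the tags files themselves.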

> I expect the second approach (supporting _READTAGS_INCLUDE_TAGS) wouldn't require much change in Citre to support multiple tags files.

Yes, as Citre uses only one tags file per directory. Using multiple tags files is doable, but requires putting one more file in the user's project directory to record their locations.

> verifying the consistency of pseudo tag entries in the files

I'm not sure why this is necessary, and to me it seems like too strict a restriction. Could you elaborate on this?

> > I expect the second approach (supporting _READTAGS_INCLUDE_TAGS) wouldn't require much change in Citre to support multiple tags files.

> Yes, as Citre uses only one tags file per directory. Using multiple tags files is doable, but requires putting one more file in the user's project directory to record their locations.

What do you think about putting such aggregate tags files in ~/.citre.d?

> > verifying the consistency of pseudo tag entries in the files

> I'm not sure why this is necessary, and to me it seems like too strict a restriction. Could you elaborate on this?

Ideally, if a pseudo tag has different values in different tags files and the values can be unified without breaking semantics, readtags should unify them.

For example, for !_TAG_PROC_CWD, readtags must unify the value and adjust the $input field of every tag in its output accordingly. If --list-pseudo-tags is given, what kind of line should readtags print for !_TAG_FILE_SORTED? I do not yet understand all the hidden issues of unification clearly, but there are many minor ones, and they may require me to make "decisions." Instead of seeking ideal semantics for tag unification, I think it is better to define temporary restrictions on the tags files that can be passed as -t ... -t ... -t .... We can remove the restrictions incrementally once we have a minimum viable product.
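To show why !_TAG_PROC_CWD matters, here is a small Python sketch of one possible adjustment: resolving each tag's input path against the !_TAG_PROC_CWD of the tags file it came from, so that entries from different files become comparable. This is only one way to do it and is not taken from readtags; the function name is hypothetical.

```python
import posixpath


def absolute_input(proc_cwd: str, input_field: str) -> str:
    """Resolve a tag's input path against its tags file's !_TAG_PROC_CWD.

    ASSUMPTION: relative input paths are relative to !_TAG_PROC_CWD and
    paths use "/" as the separator.
    """
    if posixpath.isabs(input_field):
        return input_field  # already absolute, nothing to adjust
    return posixpath.normpath(posixpath.join(proc_cwd, input_field))
```

With this, a tag recorded as `src/main.c` in a file generated under `/home/u/proj/` and one recorded by absolute path in another file end up in the same form.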

A version without ptag unification works on my local PC:

$ ~/bin/ctags -o podman.tags -R ~/var/podman
$ ~/bin/ctags -o glibc.tags -R ~/var/glibc
$ ~/bin/ctags -o coreutils832.tags -R /srv/sources9c/sources/c/coreutils/8.32-31.el9--srpm/pre-build/coreutils-8.32
$ ./readtags -A -Q '(and (eq? $name "main") (#/.*user.*/ $input))' -t podman.tags -t glibc.tags -t coreutils832.tags -l
main    /srv/sources9c/sources/c/coreutils/8.32-31.el9--srpm/pre-build/coreutils-8.32/gnulib-tests/test-userspec.c  /^main (void)$/
main    /srv/sources9c/sources/c/coreutils/8.32-31.el9--srpm/pre-build/coreutils-8.32/lib/getusershell.c    /^main (void)$/
main    /srv/sources9c/sources/c/coreutils/8.32-31.el9--srpm/pre-build/coreutils-8.32/lib/userspec.c    /^main (int argc, char **argv)$/
main    /srv/sources9c/sources/c/coreutils/8.32-31.el9--srpm/pre-build/coreutils-8.32/src/users.c   /^main (int argc, char **argv)$/
$ ./readtags -A -Q '(and (eq? $name "main") (#/.*container.*/ $input))' -t podman.tags -t glibc.tags -t coreutils832.tags -l
main    /home/yamato/var/glibc/support/echo-container.c /^main (int argc, const char **argv)$/
main    /home/yamato/var/glibc/support/shell-container.c    /^main (int argc, const char **argv)$/
main    /home/yamato/var/glibc/support/test-container.c /^main (int argc, char **argv)$/
main    /home/yamato/var/glibc/support/true-container.c /^main (void)$/
$ ./readtags -A -Q '(not (#/.*_test.*/ $input))' -t podman.tags -t glibc.tags -t coreutils832.tags rootless
rootless    /home/yamato/var/podman/pkg/rootless/rootless.go    /^package rootless$/
rootless    /home/yamato/var/podman/pkg/rootless/rootless_freebsd.go    /^package rootless$/
rootless    /home/yamato/var/podman/pkg/rootless/rootless_linux.go  /^package rootless$/
rootless    /home/yamato/var/podman/pkg/rootless/rootless_unsupported.go    /^package rootless$/
rootless    /home/yamato/var/podman/vendor/github.com/containers/image/v5/internal/rootless/rootless.go /^package rootless$/

The next exciting step is making readtags process an aggregate tags file like:

!_READTAGS_INCLUDE  podman.tags //
!_READTAGS_INCLUDE  glibc.tags  //
!_READTAGS_INCLUDE  coreutils832.tags   //
!_TAG_FILE_FORMAT   2   /extended format; --format=1 will not append ;" to lines/
!_TAG_FILE_SORTED   1   /0=unsorted, 1=sorted, 2=foldcase/
!_TAG_OUTPUT_EXCMD  mixed   /number, pattern, mixed, or combineV2/
!_TAG_OUTPUT_FILESEP    slash   /slash or backslash/
!_TAG_OUTPUT_MODE   u-ctags /u-ctags or e-ctags/
!_TAG_OUTPUT_VERSION    0.0 /current.age/
!_TAG_PATTERN_LENGTH_LIMIT  96  /0 for no limit/
!_TAG_PROC_CWD  /home/yamato/var/ctags-github/  //
!_TAG_PROGRAM_AUTHOR    Universal Ctags Team    //
!_TAG_PROGRAM_NAME  Cons Tags   /A tool for generating aggregate tags/
!_TAG_PROGRAM_URL   https://ctags.io/   /official site/
!_TAG_PROGRAM_VERSION   0.0.0   //

#!_READTAGS_INCLUDE also works.
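For readers who want to see what processing such a file amounts to, here is a minimal Python sketch that extracts the included paths from an aggregate tags file. It accepts both spellings mentioned above and assumes tab-separated fields with the path in the second field. It is an illustration, not the code in readtags, and the function name is made up.

```python
from typing import Iterable, List

# Both spellings are mentioned in this thread; the trailing tab avoids
# matching other pseudo tags that merely share the prefix.
INCLUDE_PREFIXES = ("!_READTAGS_INCLUDE\t", "#!_READTAGS_INCLUDE\t")


def included_tags_files(lines: Iterable[str]) -> List[str]:
    """Return the paths named by include pseudo tags, in file order."""
    paths = []
    for line in lines:
        if line.startswith(INCLUDE_PREFIXES):
            fields = line.rstrip("\n").split("\t")
            paths.append(fields[1])
    return paths
```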

$ ./readtags -t podman.tags -t glibc.tags -t coreutils832.tags --generate-aggregate-tag-file | tee tags
!_READTAGS_INCLUDE  podman.tags //
!_READTAGS_INCLUDE  glibc.tags  //
!_READTAGS_INCLUDE  coreutils832.tags   //
!_TAG_FILE_FORMAT   2   //
!_TAG_FILE_SORTED   1   //
!_TAG_OUTPUT_EXCMD  pattern //
!_TAG_OUTPUT_FILESEP    slash   //
!_TAG_OUTPUT_MODE   u-ctags //
!_TAG_OUTPUT_VERSION    0.0 /current.age/
!_TAG_PATTERN_LENGTH_LIMIT  0   //
!_TAG_PROC_CWD  /home/yamato/var/ctags-github   //
!_TAG_PROGRAM_AUTHOR    Universal Ctags Team    //
!_TAG_PROGRAM_NAME  readtags    /with -X option/
!_TAG_PROGRAM_URL   https://ctags.io/   /official site/
!_TAG_PROGRAM_VERSION   0.0.0   /TODO/
$ ./readtags -A rootless
rootless    /home/yamato/var/podman/pkg/rootless/rootless.go    /^package rootless$/
rootless    /home/yamato/var/podman/pkg/rootless/rootless_freebsd.go    /^package rootless$/
rootless    /home/yamato/var/podman/pkg/rootless/rootless_linux.go  /^package rootless$/
rootless    /home/yamato/var/podman/pkg/rootless/rootless_test.go   /^package rootless$/
rootless    /home/yamato/var/podman/pkg/rootless/rootless_unsupported.go    /^package rootless$/
rootless    /home/yamato/var/podman/vendor/github.com/containers/image/v5/internal/rootless/rootless.go /^package rootless$/

You can try the experimental code: https://github.com/universal-ctags/ctags/pull/4068

The experimental code doesn't consider the consistency of pseudo tags between tags files at all, so the -D and -P options are unreliable. However, for a person using readtags directly from the command line, this experimental code may be useful enough. Yes, I am such a person.

@masatake This is super cool. Thanks a lot.

I did find a problem:

$ ./readtags -D -t .tags
!_READTAGS_INCLUDE      /home/kino/.emacs.d/straight/repos/citre/.tags  /included tags file/
...
$ ./readtags -t .tags -np -S '(<or> (<> $name &name) 0)' - c
...
cxx_token_chain.h       parsers/cxx/cxx_token_chain.c   /^#include "cxx_token_chain.h"/
cxx_token_chain.h       parsers/cxx/cxx_token_chain.h   1
cython.c        parsers/cython.c        1
citre--ace-key-seqs     citre-ui-peek.el        /^(defun citre--ace-key-seqs (n)$/
citre--ace-ov   citre-ui-peek.el        /^(defvar citre--ace-ov nil$/
citre--add-face citre-ui-peek.el        /^(defun citre--add-face (str face)$/
...

Seems the sorting is done separately on each tags file. Is that right?

Also, it would be great if we could have a ctags option to include other tags files in the generated tags file, which suits the usage pattern of Citre (and I believe other client tools). Usually people just tag the "project directory" and start using Citre, so it would be great to be able to specify included tags files when tagging.

> Seems the sorting is done separately on each tags file. Is that right?

You are correct. I updated the pull request to support "-S ... -t ... -t ...".

% ./readtags -S '(<> $name &name)' -Q '(and (#/^z.*/ $name) (eq? (length $name) 10))' -l
z85_encode  /srv/sources9c/sources/c/coreutils/8.32-31.el9--srpm/pre-build/coreutils-8.32/src/basenc.c  /^z85_encode (const char *restrict in, size_t inlen,$/
z85_length  /srv/sources9c/sources/c/coreutils/8.32-31.el9--srpm/pre-build/coreutils-8.32/src/basenc.c  /^z85_length (int len)$/
zIndexedBy  /home/yamato/var/podman/vendor/github.com/mattn/go-sqlite3/sqlite3-binding.c    /^    char *zIndexedBy;    \/* Identifier from "INDEXED BY <zIndex>" clause *\/$/
z_filename  /home/yamato/var/glibc/timezone/zic.c   /^  const char *    z_filename;$/
zcat_setup  /srv/sources9c/sources/c/coreutils/8.32-31.el9--srpm/pre-build/coreutils-8.32/tests/misc/help-version.sh    /^zcat_setup () { args=$zin; }$/
zcmp_setup  /srv/sources9c/sources/c/coreutils/8.32-31.el9--srpm/pre-build/coreutils-8.32/tests/misc/help-version.sh    /^zcmp_setup () { args="$zin $zin2"; }$/
zeroOffset  /home/yamato/var/podman/vendor/google.golang.org/protobuf/internal/impl/pointer_reflect.go  /^var zeroOffset = offset{index: 0}$/
zeroOffset  /home/yamato/var/podman/vendor/google.golang.org/protobuf/internal/impl/pointer_unsafe.go   /^var zeroOffset = offset(0)$/
zeroReader  /home/yamato/var/podman/vendor/github.com/Microsoft/go-winio/backuptar/tar.go   /^type zeroReader struct{}$/
zeroReader  /home/yamato/var/podman/vendor/github.com/sylabs/sif/v2/pkg/sif/create.go   /^type zeroReader struct{}$/
zeroReader  /home/yamato/var/podman/vendor/github.com/vbatts/tar-split/archive/tar/reader.go    /^type zeroReader struct{}$/
zeroString  /home/yamato/var/podman/vendor/github.com/segmentio/ksuid/base62.go /^  zeroString       = "000000000000000000000000000"$/
zeroWriter  /home/yamato/var/podman/vendor/github.com/vbatts/tar-split/archive/tar/writer.go    /^type zeroWriter struct{}$/
zeroinfnan  /home/yamato/var/glibc/sysdeps/ieee754/dbl-64/e_pow.c   /^zeroinfnan (uint64_t i)$/
zeroinfnan  /home/yamato/var/glibc/sysdeps/ieee754/flt-32/e_powf.c  /^zeroinfnan (uint32_t ix)$/
zfsOptions  /home/yamato/var/podman/vendor/github.com/containers/storage/drivers/zfs/zfs.go /^type zfsOptions struct {$/
zghMAIndex  /home/yamato/var/podman/vendor/golang.org/x/text/internal/language/compact/tables.go    /^  zghMAIndex        ID = 760$/
znew_setup  /srv/sources9c/sources/c/coreutils/8.32-31.el9--srpm/pre-build/coreutils-8.32/tests/misc/help-version.sh    /^znew_setup () { args=$bigZ_in; }$/
zone_names  /home/yamato/var/glibc/time/tzfile.c    /^static char *zone_names;$/
zones_seen  /srv/sources9c/sources/c/coreutils/8.32-31.el9--srpm/pre-build/coreutils-8.32/lib/parse-datetime.c  /^  ptrdiff_t zones_seen;$/
zopstore12  /home/yamato/var/podman/vendor/github.com/twitchyliquid64/golang-asm/obj/s390x/asmz.go  /^func (c *ctxtz) zopstore12(a obj.As) (uint32, bool) {$/
zstdReader  /home/yamato/var/podman/vendor/github.com/containers/image/v5/pkg/compression/zstd.go   /^func zstdReader(buf io.Reader) (io.ReadCloser, error) {$/
zstdReader  /home/yamato/var/podman/vendor/github.com/containers/storage/pkg/archive/archive_zstd.go    /^func zstdReader(buf io.Reader) (io.ReadCloser, error) {$/
zstdReader  /home/yamato/var/podman/vendor/github.com/containers/storage/pkg/chunked/storage_linux.go   /^  zstdReader *zstd.Decoder$/
zstdWriter  /home/yamato/var/podman/vendor/github.com/containers/image/v5/pkg/compression/zstd.go   /^func zstdWriter(dest io.Writer) (io.WriteCloser, error) {$/
zstdWriter  /home/yamato/var/podman/vendor/github.com/containers/storage/pkg/archive/archive_zstd.go    /^func zstdWriter(dest io.Writer) (io.WriteCloser, error) {$/

> Also, it would be great if we could have a ctags option to include other tags files in the generated tags file, which suits the usage pattern of Citre (and I believe other client tools). Usually people just tag the "project directory" and start using Citre, so it would be great to be able to specify included tags files when tagging.

I focused on my use case:

1. Make tag files (pre-generated tag files) for various source trees (e.g., glibc.tags, linux.tags, and podman.tags), and then
2. Make a tag file (an aggregate tag file) that includes a subset of the pre-generated tag files chosen according to the user's task.

The -X option of readtags is for this use case.

In your idea, there is no conceptual separation between pre-generated tag files (per source tree) and aggregate tag files.

In my current idea, ctags should not know about !_READTAGS_INCLUDE. What about adding an --inject-user-ptag= option to ctags?

--init-user-ptag=READTAGS_INCLUDE
--set-user-ptag.input=podman.tags
--set-user-ptag.pattern=
--make-user-ptag
--init-user-ptag=READTAGS_INCLUDE
--set-user-ptag.input=linux.tags
--set-user-ptag.pattern=
--make-user-ptag
--init-user-ptag=READTAGS_INCLUDE
--set-user-ptag.input=qemu.tags
--set-user-ptag.pattern=
--make-user-ptag

I want to avoid designing large specifications at once.

> In your idea, there is no conceptual separation between pre-generated tag files (per source tree) and aggregate tag files.

Forgive me if I don't understand your use case well, but I don't see why such a conceptual distinction is necessary. After all, !_READTAGS_INCLUDE is just a ptag, and ctags already emits ptags.

readtags could now work on 3 kinds of tags files:

| Kind | Contains regular tags | Contains `!_READTAGS_INCLUDE` ptag |
|---|---|---|
| Pre-generated tags files | Yes | No |
| Aggregated tags files | No | Yes |
| The 3rd kind | Yes | Yes |

In your use case, the 1st and 2nd kinds are generated by ctags and readtags -X. But having the ability to direct ctags to generate the 3rd kind does not seem to interfere with the first two.
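The three kinds can be told apart mechanically. As a rough Python sketch (the function name and the returned labels are made up for this example, and the whole file is scanned for simplicity):

```python
from typing import Iterable

INCLUDE_PREFIXES = ("!_READTAGS_INCLUDE\t", "#!_READTAGS_INCLUDE\t")


def classify_tags_file(lines: Iterable[str]) -> str:
    """Classify a tags file by whether it has regular tags and include ptags."""
    has_regular = False
    has_include = False
    for line in lines:
        if line.startswith(INCLUDE_PREFIXES):
            has_include = True
        elif line.strip() and not line.startswith("!_"):
            has_regular = True  # any non-empty, non-pseudo line is a regular tag
    if has_regular and has_include:
        return "the 3rd kind"
    if has_include:
        return "aggregated"
    if has_regular:
        return "pre-generated"
    return "empty"
```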

> What about adding an --inject-user-ptag= option to ctags?

If we might have an edittags program in the future, then maybe we should add these to edittags rather than to ctags.

But feel free to design it in your own way, as I think the current status is enough for Citre to work with multiple tags files: we could add another file in /path/to/project/.ctags.d/ that lists the extra tags files, and use the -t option of readtags to read them. That said, being able to generate "the 3rd kind" of tags file is preferable for Citre.
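A sketch of that idea, in Python rather than elisp: read a list of extra tags files and build a readtags command line with one -t per file. The list-file format (one path per line, blank lines and `#` comments ignored) is an assumption for this example, and repeating -t relies on the experimental pull request mentioned above.

```python
from typing import Iterable, List


def parse_extra_tags_list(lines: Iterable[str]) -> List[str]:
    """Parse a hypothetical list file: one tags file path per line."""
    paths = []
    for line in lines:
        entry = line.strip()
        if entry and not entry.startswith("#"):
            paths.append(entry)
    return paths


def readtags_command(project_tags: str, extra_tags: List[str], name: str) -> List[str]:
    """Build an argv that looks up NAME in the project tags file and the extras."""
    cmd = ["readtags", "-t", project_tags]
    for path in extra_tags:
        cmd += ["-t", path]  # multiple -t options need the experimental readtags
    cmd.append(name)
    return cmd
```

The resulting list can be handed to a process-spawning function as is, so no shell quoting is needed.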

universal-ctags / citre

[Question] How to have separate tags files? #178