meedstrom / org-node

GNU General Public License v3.0

Org-node only seems to index/find a fraction of org-roam entities #8

Closed emacsomancer closed 2 months ago

emacsomancer commented 2 months ago

I started using quickroam the other day and decided to try out org-node. I set it up as:

(use-package org-node
  :vc (:fetcher github :repo "meedstrom/org-node")
  :hook (org-mode . org-node-cache-mode)
  :config
  (setq org-node-creation-fn #'org-node-new-by-roam-capture)
  (setq org-node-slug-fn #'org-node-slugify-like-roam)
  (setq org-node-creation-hook nil)
  (setq org-node-extra-id-dirs (list org-roam-directory)))

When I call org-node-find or org-node-insert, there are many org-roam entries that are missing (that org-roam-node-find and quickroam-find &c. find successfully). I've tried org-node-reset a few times, but it makes no difference.

(I think I have around 14,000 org-roam nodes.)

meedstrom commented 2 months ago

Thanks for reporting! What happens if you do M-x org-roam-update-org-id-locations? Does the current value of org-node-extra-id-dirs contain your actual org-roam-directory? (I'm thinking it's affected by which one gets loaded first...)

emacsomancer commented 2 months ago

Running org-roam-update-org-id-locations takes a bit, but seems to make no difference.

I actually set org-roam-directory before loading either the org-roam or org-node packages, so org-node should know the appropriate location, and indeed org-node-extra-id-dirs contains my actual org-roam directory.

My saying "fraction" was inaccurate. Org-node seems to know about some 10,000 of roughly 14,000 org-roam entities. I have a vague notion that the missing entries are ones that include :ROAM_ALIASES: headers, but I'm probably primed to think about that (see https://github.com/meedstrom/quickroam/issues/2 ). But many of my frequently used org-roam nodes are missing, and those are likely to be ones with defined aliases. (Of course, aliases may well be a red herring.)

meedstrom commented 2 months ago

The plot thickens. A puzzle for me! I'll probably get back to you tomorrow or so.

meedstrom commented 2 months ago

I've made some small bug fixes, I don't know if they'll have helped.

If it still doesn't work, can you tell me more about your setup? What's your OS? Do you use any right-to-left text? Access any files over TRAMP? Symlinks?

emacsomancer commented 2 months ago

Thanks! It seems to mainly work now.

For some reason there's still a slight discrepancy between org-node and org-roam (15963 vs 16042 nodes, respectively) on my laptop (but the other way on my phone, 16252 vs 16039, respectively), but org-node still at least seems to find most things.

I'm not sure what nodes are missing (or how to figure that out).

(I'm running on various Linuxen on laptop/desktop (Guix, Arch) and also on Android via Termux.

I surely do have a little bit of right-to-left text, but probably not in node names (but in body text).

Haven't tried over TRAMP.

I use a symlinked Org directory on my phone, but that seems to work fine with org-node as far as I can tell.)

meedstrom commented 2 months ago

Interesting numbers! Maybe you'll notice one day that you can't find a node, and then you can report it here :)

Symlinks were my guess because the org-node-extra-id-dirs are searched with the function directory-files-recursively without the FOLLOW-SYMLINKS argument. I guess it's fine when org-roam-directory itself is a symlink. It's just that symlinks inside it that point to directories outside org-roam-directory won't be followed, so files there won't be found.

I'm wondering how to write a safety wrapper to use FOLLOW-SYMLINKS... the docstring warns about infinite recursion.
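
One way such a wrapper could look, as a rough sketch only: walk the tree by hand and remember the file-truename of every directory already visited, so a symlink cycle is entered at most once. The name my/org-files-following-symlinks is hypothetical and not part of org-node; it assumes only ".org" files are wanted.

```elisp
;;; -*- lexical-binding: t; -*-

(defun my/org-files-following-symlinks (dir)
  "Return .org files under DIR, following directory symlinks.
Hypothetical helper, not part of org-node.  Each directory is
visited at most once, keyed by its truename, so symlink cycles
cannot cause infinite recursion."
  (let ((seen (make-hash-table :test #'equal))
        (queue (list dir))
        files)
    (while queue
      (let* ((d (pop queue))
             (true (file-truename d)))
        (when (and (not (gethash true seen))
                   (file-accessible-directory-p d))
          (puthash true t seen)
          (dolist (f (directory-files d t directory-files-no-dot-files-regexp))
            (cond
             ;; `file-directory-p' follows symlinks, so linked dirs are queued too.
             ((file-directory-p f) (push f queue))
             ;; `file-regular-p' is nil for broken symlinks, which skips them.
             ((and (string-suffix-p ".org" f) (file-regular-p f))
              (push f files)))))))
    (nreverse files)))
```

Note that this only guards against revisiting directories. Two file symlinks pointing at the same Org file would still both be listed, so the result may need a pass through file-truename and seq-uniq.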

Another thing that can affect discovery is a non-nil value of org-node-perf-assume-coding-system, if some files have a byte-order mark (BOM) and others don't.
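
To check whether a collection really is mixed in that way, something like the following sketch could help. It reads only the first three bytes of each file and compares them with the UTF-8 BOM (EF BB BF). Both function names are hypothetical, and the sketch only looks at the UTF-8 BOM, not the UTF-16 ones.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'seq)

(defun my/file-has-utf8-bom-p (file)
  "Return non-nil if FILE begins with the UTF-8 byte-order mark EF BB BF.
Hypothetical helper, not part of org-node."
  (with-temp-buffer
    ;; Unibyte buffer, so each character is one raw byte (0-255).
    (set-buffer-multibyte nil)
    (insert-file-contents-literally file nil 0 3)
    (equal (append (buffer-string) nil) '(#xEF #xBB #xBF))))

(defun my/files-with-utf8-bom (files)
  "Return the members of FILES that start with a UTF-8 BOM."
  (seq-filter #'my/file-has-utf8-bom-p files))
```

If my/files-with-utf8-bom returns some but not all of the files, the collection mixes both kinds, which is the situation described above.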

Btw just checking, I assume you can find the nodes that have RTL text?

emacsomancer commented 2 months ago

I'm thinking that the discrepancy is perhaps a bit greater than it seems since org-node generally seems to find some things that regular org-roam doesn't index. But, again, I'm not sure what's missing. I checked a node that contains right-to-left text and org-node indexes that one without problem.

meedstrom commented 2 months ago

Thanks! It's also true that org-node looks up /all/ your org-id-locations, so it should exceed what org-roam indexes, at a minimum.

meedstrom commented 2 months ago

To count the files known to org-id:

(length (org-id-hash-to-alist org-id-locations))

To count the files known to org-node:

(cl-loop for node in (hash-table-values org-nodes)
 count (not (org-node-get-is-subtree node)))

To count the files known to org-roam:

(length (org-roam-list-files))

meedstrom commented 2 months ago

In fact, we can find out which files they're missing:

;; What org-roam knows that org-node doesn't
(seq-difference
 (mapcar #'file-truename (org-roam-list-files))
 (cl-loop for node in (hash-table-values org-nodes)
  unless (org-node-get-is-subtree node)
  collect (file-truename (org-node-get-file-path node))))

;; What org-node knows that org-roam doesn't
(seq-difference
 (cl-loop for node in (hash-table-values org-nodes)
  unless (org-node-get-is-subtree node)
  collect (file-truename (org-node-get-file-path node)))
 (mapcar #'file-truename (org-roam-list-files)))

emacsomancer commented 2 months ago

Thanks for the seq-difference functions!

It is curious, the results.

So, running the "(diff org-roam org-node)" one revealed about 30 files that org-roam "knew about" (in some sense) that org-node didn't. They all had one or more of the following characteristics:

Fixing these things, org-node was able to index them (though it seemed to require a restart of emacs to do so).

However, there still seem to be about 70 nodes that org-roam knows about that org-node doesn't. (report: org-node: 15985 / org-roam: 16055).

Further, the "(diff org-node org-roam)" function returned nil, even though I'm sure that there must be nodes that org-node knows about that org-roam doesn't. Maybe it's because they're nodes inside of files rather than files with a top-level :PROPERTIES: drawer?

(Strangely, as far as I can tell, on my janky Termux emacs setup on Android, everything works perfectly, with org-node finding more nodes than org-roam, while on two different Linux machines, I have the above issue. I'm intrigued about why this would be the case still.)

meedstrom commented 2 months ago

Hm. Actually, I made a minor oversight. The cl-loop expression was only catching files with file-level nodes, that is, as you said, files with a top-level :PROPERTIES: drawer.

Fixed. Maybe now org-node finds more files?

;; What org-roam knows that org-node doesn't
(seq-difference
 (mapcar #'file-truename (org-roam-list-files))
 (seq-uniq (cl-loop
            for node in (hash-table-values org-nodes)
            collect (file-truename (org-node-get-file-path node)))))

;; What org-node knows that org-roam doesn't
(seq-difference
 (seq-uniq (cl-loop
            for node in (hash-table-values org-nodes)
            collect (file-truename (org-node-get-file-path node))))
 (mapcar #'file-truename (org-roam-list-files)))

meedstrom commented 2 months ago

But yea, the stuff about malformed property drawers is a big problem. I've had a few of those too :) I've been thinking of doing some sort of autoformatter that could check all Org files. Recently found out about the builtin org-lint, which you can apparently run on save, haven't learned to use that yet.

emacsomancer commented 2 months ago

The new seq-difference function for "what org-roam knows about that org-node doesn't" produces the same output as the previous one, so it's not revealing any further nodes. The "what org-node knows about that org-roam doesn't" function now spits out links to various agenda/calendar org files, which is probably reasonable.

Some sort of linter/autoformatter could indeed be useful at some point. (I'm still curious where the discrepancy lies for my Linux boxes, since the same Org files are shared with the Android device, where org-node does seem to find everything.)

meedstrom commented 2 months ago

Hmm. Termux, with Emacs 28? 29? in console mode, I guess?

meedstrom commented 1 month ago

@emacsomancer I added a command you might enjoy, M-x org-node-lint-all-files, which runs the built-in org-lint on all files.