sharkdp / bat

A cat(1) clone with wings.
Apache License 2.0
49.74k stars 1.25k forks source link

no syntax highlighting for symlinked file #1001

Open sumanthratna opened 4 years ago

sumanthratna commented 4 years ago

See https://github.com/sharkdp/bat/issues/984#issuecomment-627998442.

What version of bat are you using? bat 0.15.1

Describe the bug you encountered: bat ~/.ssh/config is showing the contents of my ssh_config in all white. This might be because ~/.ssh/config is symlinked (ln -s) to a file in my dotfiles repo.

Describe what you expected to happen? bat ~/.ssh/config should highlight the contents of the file according to https://github.com/robballou/sublimetext-sshconfig.

How did you install bat? Homebrew


system

$ uname -srm Darwin 19.5.0 x86_64

$ sw_vers ProductName: Mac OS X
ProductVersion: 10.15.5
BuildVersion: 19F83c

bat

$ bat --version bat 0.15.1

$ env

bat_config

bat_wrapper

No wrapper script for 'bat'.

bat_wrapper_function

No wrapper function for 'bat'.

No wrapper function for 'cat'.

tool

$ less --version less 487 (POSIX regular expressions)

sharkdp commented 4 years ago

Thank you for reporting this!

bat ~/.ssh/config is showing the contents of my ssh_config in all white. This might be because ~/.ssh/config is symlinked (ln -s) to a file in my dotfiles repo.

That is indeed a bug. I can confirm this.

The culprit is this line where we canonicalize the path, which resolves symlinks:

https://github.com/sharkdp/bat/blob/c2c70707125720619f7dd7d2ad6001a07cf97df6/src/assets.rs#L207-L208

The reason I used .canonicalize() was to solve situations like cd /etc; bat fstab, where we would have to resolve the relative fstab path to an absolute /etc/fstab path. Similarly, if someone is inside $USER/.ssh, calling bat config should also map to the SSH config syntax.

In order to solve this, I guess we should first look up the non-canonicalized path in the syntax mapping. If this does not lead to any results, we look up the canonicalized path as well. But we should think about this more carefully. And probably merge #985 first, before we attempt to solve this.

eth-p commented 4 years ago

In order to solve this, I guess we should first look up the non-canonicalized path in the syntax mapping. If this does not lead to any results, we look up the canonicalized path as well. But we should think about this more carefully.

This is a tough one, because we have three possible path names:

The correct highlighting really depends on the context. If it's a symlinked config file like this issue, it should be the absolute path. If it's a symlinked file that doesn't have an absolute path syntax mapping, it should probably be the canonical path because that's what the file actually is. And if that doesn't have a mapping, should it fall back to the relative path/filename?

Unless we have a way to determine if a syntax provides an absolute-path mapping for a file, I think the safest solution would probably be to look it up in order of absolute, canonical, and then relative/filename.

And probably merge #985 first, before we attempt to solve this.

That would be a good idea, but I could also just add a fix for this as part of #985 if that's easier.

sharkdp commented 4 years ago

Just to make sure that we have the same understanding of canonical/absolute/as-is path:

cd /home
bat shark/.ssh/config

The correct highlighting really depends on the context. If it's a symlinked config file like this issue, it should be the absolute path.

:+1:

If it's a symlinked file that doesn't have an absolute path syntax mapping, it should probably be the canonical path because that's what the file actually is.

You are thinking about something like a symlink from my_script -> original_file.ext where you would like to use the extension of the original file to get some information about my_script? But what if we have a scenario script.ext1 -> script.ext2? Should we really look up ext2 or accept the fact that script.ext1 doesn't lead to any matches in the syntax mapping and then get the actual syntax for ext1?

I'd rather go with the latter and not consider the canonical path at all.

Also, do we really have to look up the (potentially relative) as-is path? What would be a case where this would be useful? Maybe it's enough to simply get the absolute path and use it as a basis for the syntax mapping.

Note: getting the "absolute path" is actually non-trivial in Rust: https://stackoverflow.com/a/54817755/704831

eth-p commented 4 years ago

Just to make sure that we have the same understanding of canonical/absolute/as-is path

Yep, it looks like we do.

You are thinking about something like a symlink from my_script -> original_file.ext where you would like to use the extension of the original file to get some information about my_script?

Something like that, yes.

But what if we have a scenario script.ext1 -> script.ext2? Should we really look up ext2 or accept the fact that script.ext1 doesn't lead to any matches in the syntax mapping and then get the actual syntax for ext1?

We could look it up order like this this:

  1. script.ext2 matches an absolute path mapping.
  2. script.ext1 matches an extension syntax.
  3. Use the first-line syntax for the file.
  4. script.ext2 matches an extension syntax.

That should cover cases like a symlinked ~/.ssh/config (1) first, then attempt to look up the syntax based on its perceived extension (2) or content type (3). If none of those are found, I think it would be reasonable to fall back to the canonical file's extension (4).

Note: getting the "absolute path" is actually non-trivial in Rust: https://stackoverflow.com/a/54817755/704831

I was looking at that earlier, actually. It's a shame it's not included in the standard library, but path-clean is a pretty small dependency that won't pull in any additional crates.

sharkdp commented 4 years ago

We could look it up order like this this:

  1. script.ext2 matches an absolute path mapping.
  2. script.ext1 matches an extension syntax.
  3. Use the first-line syntax for the file.
  4. script.ext2 matches an extension syntax.

If the first one would be "script.ext1 matches an absolute path mapping", I would agree. That sounds like a good idea.

sharkdp commented 4 years ago

Note: getting the "absolute path" is actually non-trivial in Rust: https://stackoverflow.com/a/54817755/704831

I was looking at that earlier, actually. It's a shame it's not included in the standard library, but path-clean is a pretty small dependency that won't pull in any additional crates.

In fd, we have been using this code for some time:

https://github.com/sharkdp/fd/blob/0335cc362b2c830c24b957504a7cb1f7cd623a44/src/filesystem.rs#L12-L34

It's been serving us well, but we would need to check if that works for the use case here as well. I'm not sure if all "cleanups" are performed.

eth-p commented 4 years ago

https://github.com/sharkdp/fd/blob/0335cc362b2c830c24b957504a7cb1f7cd623a44/src/filesystem.rs#L12-L34

I adapted it to work in the Rust playground, and it works for everything except parent directories: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=2e03fcb23e87920e543cd54d300d207c

"/test" -> "/test"
"test" -> "/some/example/pwd/test"
"./test" -> "/some/example/pwd/test"
"../test" -> "/some/example/pwd/../test"
sharkdp commented 4 years ago

Ok, yeah. That's probably what path-clean is for. I'm ok with adding it as a dependency if it works for Windows paths as well.

eth-p commented 4 years ago

After some testing, it doesn't seem like it handles Windows paths. It would also require a slight amount of boilerplate, too. I'm going to look into finding a crate dedicated to converting relative to absolute paths and get back to you when I find a suitable one.

Potential crates:

sharkdp commented 4 years ago

path-absolutize looks good to me

sumanthratna commented 4 years ago

I'm not sure if this is useful info for you guys, but I think bat highlights correctly for a symlinked gitconfig: .gitconfig -> ~/dotprophet/gitconfig

sharkdp commented 4 years ago

I'm not sure if this is useful info for you guys, but I think bat highlights correctly for a symlinked gitconfig: .gitconfig -> ~/dotprophet/gitconfig

I guess it only works because the original file is also called gitconfig, which is one of the patterns:

❯ bat -L | grep gitconfig
Git Config:gitconfig,.gitconfig,.gitmodules

If you would rename the actual file, it probably won't work anymore.

sharkdp commented 4 years ago

This issue should be fixed in bat v0.15.2. Feedback would be appreciated.

sumanthratna commented 4 years ago

~/.ssh/config -> ~/dotprophet/sshconfig:

I'm on bat 0.15.2.

sharkdp commented 4 years ago

@sumanthratna Thank you.

when I run bat ~/dotprophet/sshconfig, the text is in all-white, with no syntax highlighting.

Well, that is expected. sshconfig is not a pattern that we recognize (only .ssh/config and ssh_config are, see bat -L). If you want that file to be highlighted as well, you would have to rename it to ssh_config or add this to your bat config file:

--map-syntax="sshconfig:SSH Config"
sumanthratna commented 4 years ago

I see, I was misunderstanding something. I had thought that when I run bat ~/dotprophet/sshconfig, the path would get "absolutized" into /Users/suman/.ssh/config, which would be highlighted.

I think this would be a nice feature to have. If this is possible, I'll create a new issue for this feature request.

EDIT: and for the record, I just renamed ~/dotprophet/sshconfig to ~/dotprophet/ssh_config and syntax highlighting works fine.

sharkdp commented 4 years ago

as misunderstanding something. I had thought that when I run bat ~/dotprophet/sshconfig, the path would get "absolutized" into /Users/suman/.ssh/config, which would be highlighted.

That is not possible. You can not reverse-resolve symlinks.

sumanthratna commented 4 years ago

I had thought that when I run bat ~/dotprophet/sshconfig, the path would get "absolutized" into /Users/suman/.ssh/config, which would be highlighted.

That is not possible. You can not reverse-resolve symlinks.

Does this not do what I'm suggesting? https://doc.rust-lang.org/std/fs/fn.read_link.html

sharkdp commented 4 years ago

You said you have a symlink ~/.ssh/config -> ~/dotprophet/sshconfig. This means that ~/.ssh/config is a symlink to ~/dotprophet/sshconfig. You can call read_link on ~/.ssh/config to resolve the symlink "along the arrow" to get the result ~/dotprophet/sshconfig. But there is no way to go backwards.

So if you call bat ~/dotprophet/sshconfig, there is no way that we could know that there is a symlink somewhere else, on this or even on another filesystem, that points to this file. We cannot "resolve symlinks backwards".

sumanthratna commented 4 years ago

Ahhh yikes, you're right, I don't know why I was thinking that. :) Regardless, it looks like the original issue for this ticket has been solved.

Freed-Wu commented 2 years ago

I still met this bug:

❯ \ls Aliases/0install -l
lrwxrwxrwx 1 wzy wzy 26 8月   9 20:48 Aliases/0install -> ../Formula/zero-install.rb

However, bat Aliases/0install will not use ruby's syntax highlight. This example comes from linuxbrew: /home/linuxbrew/.linuxbrew/Homebrew/Library/Taps/homebrew/homebrew-core/Aliases/0install https://github.com/Homebrew/homebrew-core/tree/master/Aliases/0install

keith-hall commented 2 years ago

I'm able to reproduce the homebrew-core example, so, although we have fixed the original specific symlink example in the OP, I'm going to reopen this issue as it is still related.

keith-hall commented 2 years ago

as an experiment, I was able to fix it with this monstrosity (i.e. calling canonicalize), but then it broke the original assets::tests::syntax_detection_for_symlinked_file test, so an expert will probably need to take a look:

diff --git a/src/assets.rs b/src/assets.rs
index f87d393..388c1ab 100644
--- a/src/assets.rs
+++ b/src/assets.rs
@@ -7,7 +7,7 @@ use once_cell::unsync::OnceCell;
 use syntect::highlighting::Theme;
 use syntect::parsing::{SyntaxReference, SyntaxSet};

-use path_abs::PathAbs;
+use path_abs::{PathAbs, PathInfo};

 use crate::error::*;
 use crate::input::{InputReader, OpenedInput};
@@ -271,7 +271,7 @@ impl HighlightingAssets {
         let path = input.path();
         let path_syntax = if let Some(path) = path {
             self.get_syntax_for_path(
-                PathAbs::new(path).map_or_else(|_| path.to_owned(), |p| p.as_path().to_path_buf()),
+                PathAbs::new(path).map_or_else(|_| path.to_owned(), |p| p.canonicalize().map_or_else(|_| p.as_path().to_path_buf(), |c| c.as_path().to_path_buf())),
                 mapping,
             )
         } else {