sharkdp / bat

A cat(1) clone with wings.
Apache License 2.0

Detecting a binary file takes too long #2262

Open balroggg opened 2 years ago

balroggg commented 2 years ago

What steps will reproduce the bug?

  1. Run bat file.ova

What happens?

It takes too long to produce any output.

What did you expect to happen instead?

Print that it is a binary file

How did you install bat?

pacman, Arch Linux

bat version and environment

bat --diagnostic

Software version

bat 0.21.0 (405e5f74)

Operating system

Linux 5.18.16-arch1-1

Command-line

bat --diagnostic 

Environment variables

SHELL=/bin/bash
PAGER=<not set>
LESS=<not set>
LANG=en_US.UTF-8
LC_ALL=<not set>
BAT_PAGER=<not set>
BAT_CACHE_PATH=<not set>
BAT_CONFIG_PATH=<not set>
BAT_OPTS=<not set>
BAT_STYLE=<not set>
BAT_TABS=<not set>
BAT_THEME=<not set>
XDG_CONFIG_HOME=<not set>
XDG_CACHE_HOME=<not set>
COLORTERM=truecolor
NO_COLOR=<not set>
MANPAGER=<not set>

Config file

# This is `bat`s configuration file. Each line either contains a comment or
# a command-line option that you want to pass to `bat` by default. You can
# run `bat --help` to get a list of all possible configuration options.

# Specify desired highlighting theme (e.g. "TwoDark"). Run `bat --list-themes`
# for a list of all available themes
#--theme="OneHalfLight"

# Enable this to use italic text on the terminal. This is not supported on all
# terminal emulators (like tmux, by default):
#--italic-text=always

# Uncomment the following line to disable automatic paging:
#--paging=never

# Uncomment the following line if you are using less version >= 551 and want to
# enable mouse scrolling support in `bat` when running inside tmux. This might
# disable text selection, unless you press shift.
#--pager="--RAW-CONTROL-CHARS --quit-if-one-screen --mouse"

# Syntax mappings: map a certain filename pattern to a language.
#   Example 1: use the C++ syntax for .ino files
#   Example 2: Use ".gitignore"-style highlighting for ".ignore" files
#--map-syntax "*.ino:C++"
#--map-syntax ".ignore:Git Ignore"

Custom assets metadata

Could not read contents of '/home/balrog/.cache/bat/metadata.yaml': No such file or directory (os error 2).

Custom assets

'/home/balrog/.cache/bat' not found

Compile time information

Less version

> less --version 
less 590 (PCRE2 regular expressions)
Copyright (C) 1984-2021  Mark Nudelman

less comes with NO WARRANTY, to the extent permitted by law.
For information about the terms of redistribution,
see the file named README in the less distribution.
Home page: https://greenwoodsoftware.com/less

balroggg commented 2 years ago

Bat

 time bat file.ova 
───────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: file.ova   <BINARY>
───────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

real    2m52,606s
user    0m12,487s
sys     0m23,694s

Inspect from content_inspector

time inspect file.ova
file.ova: binary

real    0m0,213s
user    0m0,083s
sys     0m0,022s

file --mime

time file --mime file.ova
file.ova: application/x-tar; charset=binary

real    0m0,042s
user    0m0,000s
sys     0m0,007s

balroggg commented 2 years ago

hexyl --border none -n 32 file.ova
 00000000  73 69 65 73 2d 6d 63 2e   6f 76 66 00 00 00 00 00  sies-mc. ovf00000 
 00000010  00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00  00000000 00000000 
 00000020 

sharkdp commented 2 years ago

Thank you for the detailed bug report. The reason is probably that bat (in order to apply the content_inspector heuristic of detecting binary files) reads the full first line instead of just the first N bytes (like inspect). If there is no \n character in the binary file, that can take a long time.

In this sense, this is related to #304
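
To illustrate the difference described above, here is a minimal sketch contrasting an unbounded first-line read with a bounded read of the first N bytes. It is not bat's or content_inspector's actual code: the function names, the `SAMPLE_SIZE` value, and the NUL-byte check are all assumptions chosen for illustration, and the real heuristic is more involved.

```rust
use std::io::{self, BufRead, Read};

/// Number of leading bytes to sample. Hypothetical value, chosen for illustration.
const SAMPLE_SIZE: usize = 1024;

/// Simplified stand-in for a binary-detection heuristic:
/// treat the sample as binary if it contains a NUL byte.
fn looks_binary(sample: &[u8]) -> bool {
    sample.contains(&0)
}

/// Unbounded read: consumes input until the first b'\n' or end of input.
/// On a large file with no newline, this reads the entire file into memory.
fn first_line_unbounded<R: BufRead>(mut reader: R) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    reader.read_until(b'\n', &mut buf)?;
    Ok(buf)
}

/// Bounded read: consumes at most `limit` bytes, regardless of content.
fn first_bytes_bounded<R: Read>(reader: R, limit: usize) -> io::Result<Vec<u8>> {
    let mut buf = Vec::with_capacity(limit);
    reader.take(limit as u64).read_to_end(&mut buf)?;
    Ok(buf)
}
```

With an input that contains no newline, `first_line_unbounded` returns the whole input, while `first_bytes_bounded(reader, SAMPLE_SIZE)` stops after `SAMPLE_SIZE` bytes, which is already enough for `looks_binary` to decide in a case like the `.ova` file shown earlier.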

Enselic commented 1 year ago

Marking as "help wanted" because there is a PR that just needs some debugging: https://github.com/sharkdp/bat/pull/2369