linuxmint / nemo

File browser for Cinnamon
GNU General Public License v2.0

Option to deactivate file header detection #3020

Open HT-7 opened 2 years ago

HT-7 commented 2 years ago

Split from #1907.


Nemo has a usually helpful feature which detects the types of files with unknown or missing extensions by reading their header, so their type can be listed in the "type" column, and their thumbnail reflects the file type. Each time a directory is opened or a search is performed, Nemo reads the beginning of all listed files with an unknown type to find out their type.
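To illustrate the kind of work described above, here is a minimal sketch of header-based detection. It is not Nemo's actual code: the `MAGIC` table, `sniff_type` and `list_with_types` are hypothetical names, and a real detector knows far more patterns. The point is only that every listed file costs one extra open and read.

```python
import os

# A few well-known magic numbers; purely illustrative, not a complete table.
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),
]


def sniff_type(path, header_size=16):
    """Guess a MIME type from the first bytes of a file (hypothetical helper)."""
    with open(path, "rb") as f:
        header = f.read(header_size)  # one extra read request per file
    for magic, mime in MAGIC:
        if header.startswith(magic):
            return mime
    return "application/octet-stream"  # unknown content


def list_with_types(directory):
    """Return {name: guessed type} for the regular files in a directory."""
    result = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                result[entry.name] = sniff_type(entry.path)
    return result
```

Called on a found.000-style folder, `list_with_types` would label each .chk file by its content rather than its extension, at the price of touching every file once.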

This was immensely helpful when browsing the .chk files in a found.000 folder which was generated by the CHKDSK utility on a Windows machine.

However, since this causes many random (non-sequential) read requests, it can add lag and deteriorate performance, especially on non-flash media such as hard disks, where random reads cause significant latency. Seeing the name of files is enough for some file management tasks.

File type detection adds noticeable lag and audible lens-seek noise on optical media in particular, making it inconvenient to browse optical disc archives. It even adds some lag on flash storage in highly populated directories and long search listings. There is a big difference between only listing files, which usually requires reading a few sequential blocks of data, and reading the directory listing plus the first sector of every listed file with an unknown type, which needs many additional non-sequential read requests. And without the noatime mount option, it causes additional writes, since the last access time needs to be updated for each checked file.

As brilliant as the file header detection feature is, it can occasionally be more detrimental than helpful, as can be seen in #1907, where many people report performance issues. Just as "Count number of items" lets users turn off counting files within directories, there should be an option to turn header detection off.
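A rough sketch of what such a switch could mean in practice, assuming a hypothetical `sniff_headers` flag (no such Nemo setting exists as far as this thread shows): with the flag off, the type comes from the file name alone and the file content is never opened.

```python
import mimetypes


def file_type(path, sniff_headers=True, header_size=16):
    """Return a MIME type guess; `sniff_headers` is a hypothetical option."""
    # Extension-based guess: needs only the name, no file I/O at all.
    mime, _encoding = mimetypes.guess_type(path)
    if mime is not None:
        return mime
    if not sniff_headers:
        # Option off: report "unknown" instead of opening the file.
        return "application/octet-stream"
    # Option on: fall back to reading the header (one pattern shown as example).
    with open(path, "rb") as f:
        header = f.read(header_size)
    if header.startswith(b"%PDF-"):
        return "application/pdf"
    return "application/octet-stream"
```

With `sniff_headers=False`, files without a recognizable extension would simply show up as unknown, which is the trade-off requested here.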

So will an option for turning it off be implemented, or do I need to rely on a different file manager for handling highly populated directories?

Jeremy7701 commented 2 years ago

Optical disk archives do not allow the writing of any information, because they are archives.

BTW atime doesn't require a file to be read or written. relatime is the default. From the man page:-

       noatime
              Do  not update inode access times on this filesystem (e.g. for faster access on the news spool to speed
              up news servers).  This works for all inode types (directories too), so it implies nodiratime.

--
       relatime
              Update inode access times relative to modify or change time.  Access time is only updated if the
              previous  access  time  was  earlier  than  the  current modify or change time.  (Similar to noatime, but it
              doesn't break mutt or other applications that need to know if a file has been read since the last  time
              it was modified.)

HT-7 commented 2 years ago

> Optical disk archives do not allow the writing of any information, because they are archives.

When I mentioned the writing, I was referring to ordinary mass storage (HDD and flash drives), not optical discs. On optical media, I was only referring to reading. Mass-storage-like file management (also known as "live file system") might be possible on optical media through udftools, but I have not tried it yet.

My point was that because optical drives have a significant random access latency, their performance deteriorates most from the file header checking.

> BTW atime doesn't require a file to be read or written.

I know, reading that attribute does not change it and does not require the content of the file to be accessed. But if Nemo accesses the beginning of each file, it is recognized by the file system driver as an access.
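For reference, Linux offers a per-open way to avoid that: the `O_NOATIME` flag of open(2). The sketch below is an assumption about how a header reader could use it, not something Nemo is known to do; `read_header_noatime` is a hypothetical name. The flag is only permitted for the file's owner or a privileged process, so the sketch falls back to a normal open.

```python
import os


def read_header_noatime(path, size=16):
    """Read the first bytes of a file, asking the kernel not to update atime."""
    # O_NOATIME is Linux-specific; on other platforms the flag is simply omitted.
    flags = os.O_RDONLY | getattr(os, "O_NOATIME", 0)
    try:
        fd = os.open(path, flags)
    except PermissionError:
        # Not the owner of the file: retry without the flag.
        fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, size)
    finally:
        os.close(fd)
```

This only addresses the access-time writes; the random read per file remains.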

mcatkins commented 7 months ago

Can I add that "file type" detection and "number of contents" detection both contribute to nemo being barely usable over high latency links.

I would like to see settings that control when these are turned off, based on file type, measured latency, etc.

Even "is it a directory" detection is a big overhead when traversing down a hierarchy that contains some large directories. It would be great to be able to open a filename even if it was not yet known whether it was a directory. Of course the first thing to do would be to find out, but that could be prioritized above the metadata collection for all the other names.

Just to add that I agree that all these features are amazingly useful the rest of the time!

Jeremy7701 commented 7 months ago

It's not possible to traverse a file system if you don't know which entries are directories or are not directories. You also need to know user/group ownership and additionally file permissions, since you need to know if you are permitted to traverse a directory hierarchy.

mcatkins commented 7 months ago

This is true-ish, but not relevant to what I suggested. It is also emphatically not what the shell does when I manually navigate the file system: when I type cd x the shell does not know if "x" is a directory or not, it lets cd "discover" that when it attempts the chdir.
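The same "try it and see" approach, sketched in Python for illustration (`open_entry` is a hypothetical helper, and the `NotADirectoryError` behaviour is as seen on Linux): the name is treated as a directory first, and only the failure tells us it is not one.

```python
import os


def open_entry(path):
    """Attempt to read `path` as a directory without checking its type first."""
    try:
        with os.scandir(path) as entries:
            return ("directory", sorted(e.name for e in entries))
    except NotADirectoryError:
        # The attempt itself revealed that this is not a directory.
        return ("file", None)
```

Permission problems would surface the same way, as a `PermissionError` from the attempt itself.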

Currently, in nemo: When a new directory is opened, nemo does the following (I'm assuming some bits...)

  1. reads the directory, which gets a list of filenames
  2. for each filename in the directory, does some variant of stat(2) on the name. This tells nemo which names are directories and which are files, plus permissions, times, etc.
  3. Displays the file list to the user
  4. While the user works, in the background: for each directory in the list, reads it to discover how many items it contains.

I was suggesting this could become:

  1. read the directory
  2. display the raw file names to the user (main change)
  3. In parallel with user activity: stat each file (main change)
  4. update the file list seen by the user (if this is not done as each stat result is obtained)
  5. While the user works: for each directory, read it, to discover how many items it contains.

Now during step 3, the user would be able to open a filename (by double-clicking it). At that point, the file's stat information might or might not be available; if it isn't, get it. If the filename refers to a directory, the background stat calls can be stopped, and this filename can be opened as a directory by going back to step 1 (permissions allowing). If the filename is not a directory, whatever steps are needed to open the file can be taken (XDG, etc.). In that case the background stat calls should continue, since the file list will still be visible to the user.
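A minimal sketch of the proposed flow, in Python rather than Nemo's C and with every name (`LazyListing`, `start`, `open`) made up for illustration: names are available immediately, a background thread fills in stat results, and opening a name fetches its stat on demand, ahead of the background queue.

```python
import os
import stat
import threading


class LazyListing:
    """Names first, stat results later (hypothetical sketch, not Nemo code)."""

    def __init__(self, directory):
        self.directory = directory
        self.names = sorted(os.listdir(directory))  # steps 1-2: names only
        self.stats = {}                              # filled in as results arrive
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._stat_all, daemon=True)

    def start(self):
        """Step 3: begin collecting stat results in the background."""
        self._worker.start()

    def _stat_one(self, name):
        with self._lock:
            if name in self.stats:
                return self.stats[name]
        info = os.stat(os.path.join(self.directory, name))
        with self._lock:
            return self.stats.setdefault(name, info)

    def _stat_all(self):
        for name in self.names:
            if self._stop.is_set():
                return  # the user navigated away; stop the background work
            try:
                self._stat_one(name)
            except OSError:
                pass  # entry vanished or is unreadable; leave it unknown

    def open(self, name):
        """Open a name whose type may not be known yet."""
        info = self._stat_one(name)  # fetched on demand if still missing
        if stat.S_ISDIR(info.st_mode):
            self._stop.set()  # background stats for this listing are moot
            return LazyListing(os.path.join(self.directory, name))
        return None  # not a directory: the caller would launch its handler
```

A caller would create the listing, show `names` right away, call `start()`, and redraw as `stats` fills in; the directory item counts (step 5) are left out of the sketch.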

Of course, whether this complexity is "worth it" is open for debate. It does seem to be a sticking plaster; on the other hand, latency isn't going away (not any time before the sun explodes!).

Regards, Martin
