joncampbell123 / dosbox-x

DOSBox-X fork of the DOSBox project
GNU General Public License v2.0

A large number of files in a directory stalls DOSBox-X when entering it and running commands such as 'dir' #5039

Closed: bgsjust closed this issue 5 days ago

bgsjust commented 2 weeks ago

Describe the bug

The system stalls when accessing folders with a large number of files (for example, 200,000 files) and running 'dir' in them. This does not happen in the conventional DOSBox 0.74-3.

Steps to reproduce the behaviour

Prepare a folder on the host system containing 200,000 files (they can be garbage/fake files). Start DOSBox-X and mount a drive to that path. Navigate to that folder on the command line using 'cd path' and type 'dir'.
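
Since the report comes from Windows 11, where a POSIX shell loop is not available out of the box, the test folder can also be created with a small, portable C++17 program. This is a minimal sketch; the helper name `make_test_files` and the `NNNNNN.TXT` naming scheme are illustrative assumptions, not anything from DOSBox-X.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Hypothetical helper: create `count` small files named 000000.TXT, 000001.TXT, ...
// inside `dir`, each containing "HELLO". Returns how many files were written.
std::size_t make_test_files(const fs::path& dir, std::size_t count) {
    assert(!dir.empty());                 // refuse to write into an unspecified location
    fs::create_directories(dir);          // no-op if the directory already exists
    std::size_t written = 0;
    char name[32];
    for (std::size_t i = 0; i < count; ++i) {
        // Zero-padded numeric name in DOS 8.3 form (fits for indices up to 99,999,999).
        std::snprintf(name, sizeof name, "%06zu.TXT", i);
        std::ofstream out(dir / name);
        if (!out) break;                  // stop at the first failure (disk full, permissions, ...)
        out << "HELLO\n";
        ++written;
    }
    return written;
}
```

Calling it with a count of 200000 on a hypothetical folder, then mounting that folder in DOSBox-X, should give the same setup as the steps above.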

Expected behavior

It should list the files almost instantaneously.

What operating system(s) has this bug occurred on?

Windows 11 version 23H2

What version(s) of DOSBox-X have this bug?

latest

Used configuration

No response

Output log

No response

Additional information

No response

Have you checked that no similar bug report(s) exist?

Code of Conduct & Contributing Guidelines

rderooy commented 2 weeks ago

Try setting the following config option...

[dos]
hard drive data rate limit=0

joncampbell123 commented 2 weeks ago

If you ask MS-DOS on real 1990s hardware with 1990s hard drives to list 200,000+ files, it would probably take a long time to list them too.

The issue here though is that the emulator remains hung the entire time, while at least real MS-DOS would let you CTRL+C or CTRL+Break out of the directory listing when you're bored watching all those files go by.

joncampbell123 commented 2 weeks ago

This is part of the problem with having commands like DIR implemented as built-in native C++ code rather than as something that executes partially or entirely within the guest environment. This is why I am trying to make a scripting language that built-in commands and INT 21h/BIOS emulation can use instead of native C++. At least then the script can run as part of CPU execution, with a cleaner design.
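
To illustrate why a native command that runs to completion blocks the whole emulator, here is a hypothetical C++ sketch of a listing that does only a bounded amount of work per call. It is not DOSBox-X code and not the scripting-language design described above; the class name and the batching scheme are assumptions chosen for illustration. The idea is that between calls an emulator loop could run guest CPU cycles and poll for Ctrl+C/Ctrl+Break.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: a listing that emits a bounded batch of entries per call
// instead of running to completion inside one native function. Between calls an
// emulator loop could execute guest CPU cycles and poll the keyboard.
class IncrementalDirListing {
public:
    explicit IncrementalDirListing(std::vector<std::string> names)
        : names_(std::move(names)) {}

    // Called when Ctrl+C / Ctrl+Break is detected between steps.
    void request_break() { break_requested_ = true; }

    // Append at most `batch` entries to `out`. Returns true while work remains.
    bool step(std::size_t batch, std::vector<std::string>& out) {
        assert(batch > 0);                        // a zero batch would never finish
        if (break_requested_) return false;       // abandon the rest, like DOS on Ctrl+Break
        const std::size_t end = std::min(pos_ + batch, names_.size());
        for (; pos_ < end; ++pos_) out.push_back(names_[pos_]);
        return pos_ < names_.size();
    }

private:
    std::vector<std::string> names_;
    std::size_t pos_ = 0;
    bool break_requested_ = false;
};
```

A caller would loop on `step()` and do other work (emulate, check the keyboard) between iterations, so a request to break takes effect after at most one batch.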

Torinde commented 2 weeks ago

Related: #3572

bgsjust commented 2 weeks ago

I already have it set to zero; the problem persists.

bgsjust commented 2 weeks ago

Yes, the hanging is the problem! DOSBox 0.74-3 doesn't have this issue!

bgsjust commented 2 weeks ago

That thread does not present any solution to the problem.

joncampbell123 commented 2 weeks ago

Perhaps a quick solution is to temporarily comment out the I/O delay for directory enumeration.

bgsjust commented 2 weeks ago

Where can we do this? I can't find it in the config file.

joncampbell123 commented 2 weeks ago

Hold on, actually, there's no I/O delay applied to directory listing.

However, I see that when "DIR" first begins enumerating files, the file cache gets stuck in a long loop slurping up all files before even returning the first result.

In my test, I created 240,000 small text files on my Linux system.

i=0; while [ "$i" -lt 240000 ]; do echo HELLO > "$(printf '%06u.TXT' "$i")"; i=$((i+1)); done

Backtrace while it's "stuck":

(gdb) backtrace
#0  0x00007ff7d44386a2 in __strcmp_avx2 () from /lib64/libc.so.6
#1  0x0000000000620c8b in DOS_Drive_Cache::CreateEntry (this=this@entry=0x7ff7cea26260, dir=0x620c270, name=name@entry=0x7ffe7ff0f670 "165263.TXT", sname=sname@entry=0x7ffe7ff0f663 "", 
is_directory=<optimized out>) at drive_cache.cpp:847
#2  0x0000000000620eb8 in DOS_Drive_Cache::ReadDir (this=this@entry=0x7ff7cea26260, id=<optimized out>, result=@0x7ffe7ff0f8e0: 0x0, lresult=@0x7ffe7ff0f8e8: 0x0) at drive_cache.cpp:894
#3  0x00000000006212b8 in DOS_Drive_Cache::ReadDir (lresult=@0x7ffe7ff0f8e8: 0x0, result=@0x7ffe7ff0f8e0: 0x0, id=<optimized out>, this=0x7ff7cea26260) at drive_cache.cpp:878
#4  DOS_Drive_Cache::FindDirInfo (this=this@entry=0x7ff7cea26260, path=path@entry=0x7ffe7ff100f0 "./LOTSA/", expandedPath=expandedPath@entry=0x7ffe7ff0fd30 "./lotsa") at drive_cache.cpp:762
#5  0x00000000006217e1 in DOS_Drive_Cache::OpenDir (this=this@entry=0x7ff7cea26260, path=path@entry=0x7ffe7ff100f0 "./LOTSA/", id=@0x7ffe7ff10076: 0) at drive_cache.cpp:789
#6  0x0000000000621f40 in DOS_Drive_Cache::FindFirst (this=this@entry=0x7ff7cea26260, path=path@entry=0x7ffe7ff100f0 "./LOTSA/", id=@0x7ffe7ff100ee: 47) at drive_cache.cpp:943
#7  0x0000000000617872 in localDrive::FindFirst (this=0x7ff7cea26010, _dir=0x7ffe7ff10450 "LOTSA", dta=..., fcb_findfirst=<optimized out>) at drive_local.cpp:1730
#8  0x00000000005c036c in DOS_FindFirst (search=search@entry=0x61f2840 "\"C:\\LOTSA\\*.*\"", attr=<optimized out>, attr@entry=65527, fcb_findfirst=fcb_findfirst@entry=false)
at dos_files.cpp:628
#9  0x00000000006e0e3b in doDir (shell=0x5f1e910, args=0x61f2840 "\"C:\\LOTSA\\*.*\"", dta=..., numformat=0x7ffe7ff111b0 "\300\021\361\177\376\177", w_size=1, optW=false, optZ=false, 
optS=false, optP=false, optB=false, optA=false, optAD=false, optAminusD=false, optAS=false, optAminusS=false, optAH=false, optAminusH=false, optAR=false, optAminusR=false, optAA=false, 
optAminusA=false, optOGN=false, optOGD=false, optOGE=false, optOGS=false, optOG=false, optON=false, optOD=false, optOE=false, optOS=false, reverseSort=false, rev2Sort=false)
at shell_cmds.cpp:1621
#10 0x00000000006e41e8 in DOS_Shell::CMD_DIR (this=0x5f1e910, args=0x7ffe7ff114f0 "\"C:\\LOTSA\\*.*\"") at /usr/include/c++/9.3.0/bits/basic_string.h:2300
#11 0x00000000006ca8e4 in DOS_Shell::execute_shell_cmd (this=0x5f1e910, name=0x7ffe7ff11d10 "dir", arguments=0x7ffe7ff13043 "*.*") at shell_cmds.cpp:243
#12 0x00000000006dc927 in DOS_Shell::DoCommand (this=this@entry=0x5f1e910, line=0x7ffe7ff13043 "*.*", line@entry=0x7ffe7ff13040 "dir*.*") at shell_cmds.cpp:295
#13 0x00000000006bb8c9 in DOS_Shell::ParseLine (this=0x5f1e910, line=0x7ffe7ff13040 "dir*.*") at shell.cpp:550
#14 0x00000000006bc562 in DOS_Shell::Run (this=0x5f1e910) at shell.cpp:1072
#15 0x00000000006c6e77 in SHELL_Run () at shell.cpp:1971
#16 0x0000000000780927 in VM_Boot_DOSBox_Kernel () at sdlmain.cpp:7415
#17 0x0000000000736774 in BIOS::cb_bios_boot__func () at bios.cpp:11412
#18 0x0000000000501a36 in Normal_Loop () at dosbox.cpp:472
#19 0x0000000000501cae in DOSBOX_RunMachine () at dosbox.cpp:733
#20 0x00000000004ba901 in main (argc=<optimized out>, argv=0x7ffe7ff14fe8) at sdlmain.cpp:9347

bgsjust commented 2 weeks ago

Yes! There is definitely a bug there: in DOSBox 0.74-3 I get an immediate response!

joncampbell123 commented 1 week ago

According to the comments in the code, this was done so that LFN support works properly. It does an insertion sort with strcmp() to keep filenames in sorted order as they are added. When you get to 240,000 files, that insertion takes a LONG time: both the search and the std::vector insert() call, which basically memmove()s everything over to make room for the new entry.
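
The costly pattern described here can be sketched as follows. This is a simplified, hypothetical stand-in (`FileEntry`, `insert_sorted_linear`), not the actual `DOS_Drive_Cache::CreateEntry` code: a linear strcmp() scan finds the insertion point, then `std::vector::insert()` shifts every later element.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a directory-cache entry.
struct FileEntry {
    std::string name;
};

// Keep `list` sorted on every insertion: walk from the front with strcmp() to the
// first entry that sorts after `name`, then vector::insert() there, which shifts
// every later element one slot right. O(n) per insert, O(n^2) for a whole directory.
void insert_sorted_linear(std::vector<FileEntry>& list, const std::string& name) {
    auto it = list.begin();
    while (it != list.end() && std::strcmp(it->name.c_str(), name.c_str()) <= 0)
        ++it;
    list.insert(it, FileEntry{name});
}
```

As a back-of-the-envelope estimate (not a measurement), assume names arrive in an order unrelated to their sort order; on Linux, ext4 with hashed directory indexes typically returns them in hash order. Each insert then scans and shifts about half the list on average, so n = 240,000 entries cost roughly n²/4, about 1.4 × 10^10 strcmp() calls, plus a similar number of element moves.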

LFN isn't enabled by default. So a quick fix is to do the slow scan and insert if and only if you have LFN support enabled.

The best long-term solution would be to change fileList from a std::vector to a std::map. std::map has fast key-to-value lookup, and iterating over a map naturally visits everything in order, so insertion can simply use fileList[name] = info, which is much faster. And, again, sorting could be avoided entirely if LFN support is turned off.
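
A minimal sketch of the std::map idea, assuming a hypothetical `FileInfo` payload (the real cache stores more per file). Because the map keeps its keys ordered, `files[name] = info` inserts in O(log n) and a plain iteration yields sorted names with no separate sort step. For names without embedded NUL characters, std::string's ordering matches strcmp() ordering, since both compare characters as unsigned char.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-file payload; the real cache stores more than this.
struct FileInfo {
    bool is_directory = false;
};

// std::map keeps its keys ordered, so `files[name] = info` is an O(log n) insert
// and iterating the map yields names in sorted order with no separate sort step.
using FileMap = std::map<std::string, FileInfo>;

std::vector<std::string> names_in_order(const FileMap& files) {
    std::vector<std::string> out;
    out.reserve(files.size());
    for (const auto& entry : files) out.push_back(entry.first);
    return out;
}
```

Lookup by name is then `files.find(name)`, which also makes a hand-written midpoint search over a sorted vector unnecessary.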

joncampbell123 commented 1 week ago

Ah, the LFN support needs the names to be sorted because it does a "phone book" midpoint search for the name. Meaning, it keeps a high and a low index and checks the midpoint (high+low)/2, adjusting the endpoints as it searches.

That search would not be necessary if we made fileList a std::map.
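
For reference, the "phone book" midpoint search described above is a standard binary search. The sketch below uses a hypothetical function name and signature, not the DOSBox-X GetLongName code; it shows why the list must already be sorted: each comparison discards the half of the range that cannot contain the name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: binary ("phone book") search over names kept in strcmp() order.
// Returns the index of `name`, or -1 if it is absent. Requires `sorted` to be sorted.
long find_sorted(const std::vector<std::string>& sorted, const char* name) {
    long low = 0;
    long high = static_cast<long>(sorted.size()) - 1;
    while (low <= high) {
        const long mid = low + (high - low) / 2;   // (low + high) / 2 without overflow
        const int cmp = std::strcmp(sorted[static_cast<std::size_t>(mid)].c_str(), name);
        if (cmp == 0) return mid;
        if (cmp < 0) low = mid + 1;                // name sorts later: search upper half
        else high = mid - 1;                       // name sorts earlier: search lower half
    }
    return -1;
}
```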

joncampbell123 commented 1 week ago

Ah, actually, everything uses GetLongName even for non LFN cases. The quick fix won't work.

joncampbell123 commented 1 week ago

Got it. At least for FindFirst, skip sorting on insertion until all entries have been added, then sort in one go.

Now DOSBox-X pauses for only 1-2 seconds when asked to directory list 240,000 files here on Linux.

Filesystem APIs aren't as fast on Windows, of course, but it should still help here.
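
The approach described for FindFirst (add every entry first, then sort once) can be sketched like this, again with hypothetical names rather than the actual DOSBox-X patch: append each entry as the host directory is read, then call std::sort once with a strcmp()-based comparator so that lookups which assume sorted order still work.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a directory-cache entry.
struct FileEntry {
    std::string name;
};

// Deferred sorting: append each entry as the host directory is read (amortized O(1)),
// then sort once at the end (O(n log n)) before anything relies on the order.
std::vector<FileEntry> read_all_then_sort(const std::vector<std::string>& names_from_host) {
    std::vector<FileEntry> list;
    list.reserve(names_from_host.size());
    for (const auto& n : names_from_host)
        list.push_back(FileEntry{n});
    std::sort(list.begin(), list.end(), [](const FileEntry& a, const FileEntry& b) {
        return std::strcmp(a.name.c_str(), b.name.c_str()) < 0;  // strcmp() order
    });
    return list;
}
```

With n = 240,000, the sort needs on the order of n·log₂n, roughly 4.3 million comparisons, instead of a quadratic number, which fits the much shorter pause reported above (the host filesystem calls themselves still take time).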

bgsjust commented 1 week ago

Should I wait for a new release of DOSBox-X? Where can I download the fix?

joncampbell123 commented 1 week ago

There should be a nightly build you can try if you like.

bgsjust commented 1 week ago

Yes, nightly builds! I tried it out, and the freezing time was reduced to a few seconds in a 200,000-file directory. That's perfectly acceptable! Thank you!

Regards!