pymedusa / Medusa

Automatic Video Library Manager for TV Shows. It watches for new episodes of your favorite shows, and when they are posted it does its magic.
https://pymedusa.com
GNU General Public License v3.0

Slow file copy on process #8579

Open Rouzax opened 4 years ago

Rouzax commented 4 years ago

Describe the bug: Running Medusa on Windows; during the import the file move is really slow. (screenshot)

Running a file copy from Windows Explorer gives me around 700 MB/s throughput, and using a utility like Robocopy it will even hit 1.3 GB/s.

Could this be related to: https://stackoverflow.com/questions/21799210/python-copy-larger-file-too-slow https://bugs.python.org/issue33671

Medusa Info:
Branch: master
Commit: b352bb6924afcdfafce176a540d53ce405ca1312
Version: 0.4.3
Database: 44.16
Python Version: 3.7.6 (tags/v3.7.6:43364a7ae0, Dec 19 2019, 00:42:30) [MSC v.1916 64 bit (AMD64)]
SSL Version: OpenSSL 1.1.1d 10 Sep 2019
OS: Windows-10-10.0.17763-SP0
Locale: nl_NL.cp1252
Rouzax commented 4 years ago

Just did an upgrade to Python 3.8 and that already makes a huge difference. (screenshot)

Tried with 3.9, but that doesn't fly yet 😄.

Entixs commented 4 years ago

Are you still experiencing an issue or are you satisfied with the speed you are getting from Python 3.8?

Rouzax commented 4 years ago

Faster is always better 😃, but it is pretty good now.

Entixs commented 4 years ago

I am guessing you are transferring the files to an SSD, correct? Are you sure you don't have any scripts that are preventing the file transfer from going at max I/O write speed?

Rouzax commented 4 years ago

Good point, I forgot 😖. I go from SSD to HDD when storing the episode, so the current throughput is pretty good on Python 3.8, but still nowhere near Windows transfer speed. There is no I/O contention.

This is just a Windows file copy from and to the same locations that Medusa uses: (screenshot)

Python reaches around 170 MB/s

Entixs commented 4 years ago

Hmm, there might be something in the Python code in Medusa that makes sure the files transfer successfully (like checks), which could potentially cause slower speeds. @p0psicles @medariox Any thoughts on this? Though it does make sense that transferring files from SSD to HDD would give slower speeds due to the RPM rating of the HDD.

medariox commented 4 years ago

That is not possible. A 7200 RPM HDD reaches around 115 MB/s in the best case.

Rouzax commented 4 years ago

In this case they are 8x 8 TB enterprise-grade disks in RAID 10 behind a RAID controller with 8 GB of cache, and doing a file copy from the SSDs to the HDDs does give higher throughput. Using something like Robocopy boosts it even further:

-------------------------------------------------------------------------------
   ROBOCOPY     ::     Robust File Copy for Windows
-------------------------------------------------------------------------------

  Started : zaterdag 10 oktober 2020 16:54:15
   Source : C:\TEMP\Torrent\Downloads\Fargo.S04E01.1080p.WEB.H264-VIDEOHOLE\
     Dest : C:\DATA\Private\Fargo\

    Files : *.*

  Options : *.* /DCOPY:DA /COPY:DAT /J /R:1000000 /W:30

------------------------------------------------------------------------------

          New Dir          3    C:\TEMP\Torrent\Downloads\Fargo.S04E01.1080p.WEB.H264-VIDEOHOLE\
100%        New File               1.5 g        fargo.s04e01.1080p.web.h264-videohole.mkv
100%        New File                 170        fargo.s04e01.1080p.web.h264-videohole.nfo
100%        New File                2346        fargo.s04e01.1080p.web.h264-videohole.srr

------------------------------------------------------------------------------

               Total    Copied   Skipped  Mismatch    FAILED    Extras
    Dirs :         1         1         0         0         0         0
   Files :         3         3         0         0         0         0
   Bytes :   1.591 g   1.591 g         0         0         0         0
   Times :   0:00:01   0:00:01                       0:00:00   0:00:00

   Speed :           1129025441 Bytes/sec.
   Speed :            64603.353 MegaBytes/min.
   Ended : zaterdag 10 oktober 2020 16:54:17

That is roughly 1GB/s

Copying something that is bigger than 8GB will be a bit slower.

-------------------------------------------------------------------------------
   ROBOCOPY     ::     Robust File Copy for Windows
-------------------------------------------------------------------------------

  Started : zaterdag 10 oktober 2020 16:57:48
   Source : C:\TEMP\Torrent\Downloads\I.See.You.2019.1080p.BluRay.REMUX.AVC.DTS-HD.MA5.1-iFT\
     Dest : C:\DATA\Private\ISeeYou\

    Files : *.*

  Options : *.* /DCOPY:DA /COPY:DAT /J /R:1000000 /W:30

------------------------------------------------------------------------------

          New Dir          1    C:\TEMP\Torrent\Downloads\I.See.You.2019.1080p.BluRay.REMUX.AVC.DTS-HD.MA5.1-iFT\
100%        New File              26.2 g        I.See.You.2019.1080p.BluRay.REMUX.AVC.DTS-HD.MA5.1-iFT.mkv

------------------------------------------------------------------------------

               Total    Copied   Skipped  Mismatch    FAILED    Extras
    Dirs :         1         1         0         0         0         0
   Files :         1         1         0         0         0         0
   Bytes :  26.223 g  26.223 g         0         0         0         0
   Times :   0:00:28   0:00:28                       0:00:00   0:00:00

   Speed :           975407858 Bytes/sec.
   Speed :           55813.285 MegaBytes/min.
   Ended : zaterdag 10 oktober 2020 16:58:17

But still around 900 MB/s

NOTE: It might appear that they are going from and to the same drive, but that is not the case; those are just directory symlinks.

medariox commented 4 years ago

Then your setup is doing some caching between the HDDs and SSDs, because those speeds can't be reached even with 10k RPM disks.

Entixs commented 4 years ago

He is doing caching, from what his setup entails. Can you provide a speed measurement from a Python script?

Rouzax commented 4 years ago

Then your setup is doing some caching between the HDDs and SSDs, because those speeds can't be reached even with 10k RPM disks.

When you run a server with enterprise disks in RAID 10, you can. These drives have a sustained write speed of 250 MB/s; with 8 of them in RAID 10 you get an 8x read and 4x write speed gain, and 4 x 250 MB/s is 1 GB/s. (screenshot)

He is doing caching, from what his setup entails. Can you provide a speed measurement from a Python script?

In the initial screenshot you can see that the move action in Medusa on Python 3.8 reaches around 168 MB/s. If you have a test script that I can run, I'm happy to do so.

Entixs commented 4 years ago

From what you just told me... it looks like Python is just writing to one disk, or using the speed of one disk.

Rouzax commented 4 years ago

From what you just told me... it looks like Python is just writing to one disk, or using the speed of one disk.

That would be impossible; to the OS the RAID set is the logical drive, it cannot see the individual drives underneath.

Entixs commented 4 years ago

import os
import shutil
import sys

source = 'current/test/test.py'
target = '/prod/new'

# the source is expected to be a relative path
assert not os.path.isabs(source)
target = os.path.join(target, os.path.dirname(source))

# create the folders if they do not already exist
os.makedirs(target, exist_ok=True)

# adding exception handling
try:
    shutil.copy(source, target)
except IOError as e:
    print("Unable to copy file. %s" % e)
except Exception:
    print("Unexpected error:", sys.exc_info())

You can try this. I just grabbed this from the internet.

Rouzax commented 4 years ago

That would be like this?

import os
import shutil

source = 'C:\TEMP\Torrent\Downloads\I.See.You.2019.1080p.BluRay.REMUX.AVC.DTS-HD.MA5.1-iFT\I.See.You.2019.1080p.BluRay.REMUX.AVC.DTS-HD.MA5.1-iFT.mkv'
target = 'C:\DATA\Private\I.See.You.2019.1080p.BluRay.REMUX.AVC.DTS-HD.MA5.1-iFT\I.See.You.2019.1080p.BluRay.REMUX.AVC.DTS-HD.MA5.1-iFT.mkv'

assert not os.path.isabs(source)
target = os.path.join(target, os.path.dirname(source))

# create the folders if not already exists
os.makedirs(target)

# adding exception handling
try:
    shutil.copy(source, target)
except IOError as e:
    print("Unable to copy file. %s" % e)
except:
    print("Unexpected error:", sys.exc_info())

Entixs commented 4 years ago

That should be it.

Rouzax commented 4 years ago

That doesn't work

C:\TEMP\Torrent\Downloads\I.See.You.2019.1080p.BluRay.REMUX.AVC.DTS-HD.MA5.1-iFT>python -V
Python 3.8.6

C:\TEMP\Torrent\Downloads\I.See.You.2019.1080p.BluRay.REMUX.AVC.DTS-HD.MA5.1-iFT>python test.py
Traceback (most recent call last):
  File "test.py", line 7, in <module>
    assert not os.path.isabs(source)
AssertionError

C:\TEMP\Torrent\Downloads\I.See.You.2019.1080p.BluRay.REMUX.AVC.DTS-HD.MA5.1-iFT>

Ran it like this and it gives me between 170 and 300 MB/s, but it goes up and down:

import shutil

original = r'C:\TEMP\Torrent\Downloads\I.See.You.2019.1080p.BluRay.REMUX.AVC.DTS-HD.MA5.1-iFT\I.See.You.2019.1080p.BluRay.REMUX.AVC.DTS-HD.MA5.1-iFT.mkv'
target = r'C:\DATA\Private\I.See.You.2019.1080p.BluRay.REMUX.AVC.DTS-HD.MA5.1-iFT\I.See.You.2019.1080p.BluRay.REMUX.AVC.DTS-HD.MA5.1-iFT.mkv'

shutil.copyfile(original, target)

(screenshot)

Entixs commented 4 years ago

This could give some insight on why this is happening.

https://stackoverflow.com/questions/26178038/python-slow-read-performance-issue

Rouzax commented 4 years ago

That seems to be more related to lots of small files.

OmgImAlexis commented 4 years ago

Python can copy at full IO speed minus a very small overhead. We should be able to fix this.

Rouzax commented 4 years ago

Also tried with the following test script, after creating a 5 GB dummy file with fsutil file createnew dummy.mkv 5368709120.

Running Python 3.8.6

import shutil
import timeit

def _copyfileobj_patched(fsrc, fdst, length=16*1024*1024):
    """Patches shutil method to hugely improve copy speed"""
    while 1:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)
shutil.copyfileobj = _copyfileobj_patched

original = r'C:\TEMP\Torrent\COPYTEST\dummy.mkv'
target = r'C:\DATA\Private\COPYTEST\dummy.mkv'

print('Copy dummy file that is 5GB')

start = timeit.default_timer()
shutil.copyfile(original, target)
stop = timeit.default_timer()

print('Speed MB/s: ', 5120/(stop - start))  

The first run is without the "fix" and the second with the fix from Stackoverflow https://stackoverflow.com/a/28584857

C:\TEMP\Torrent\COPYTEST>python copy.py
Copy dummy file that is 5GB
Speed MB/s:  267.17716848360794

C:\TEMP\Torrent\COPYTEST>python copy.py
Copy dummy file that is 5GB
Speed MB/s:  312.7792792133464

C:\TEMP\Torrent\COPYTEST>

Does not seem to give much additional speed.

Did the same with Python 3.9 using the same script, first without and then with the fix, which gives the same result.

C:\TEMP\Torrent\COPYTEST>C:\Python39\python.exe copy.py
Copy dummy file that is 5GB
Speed MB/s:  257.8828776759466

C:\TEMP\Torrent\COPYTEST>C:\Python39\python.exe copy.py
Copy dummy file that is 5GB
Speed MB/s:  287.6408792835612

C:\TEMP\Torrent\COPYTEST>

Also tested with move instead of copy, since that is technically what I'm doing with Medusa, on Python 3.8.6:

import shutil
import timeit

original = r'C:\TEMP\Torrent\COPYTEST\dummy.mkv'
target = r'C:\DATA\Private\COPYTEST\dummy.mkv'

print('Copy dummy file that is 5GB')

start = timeit.default_timer()
shutil.move(original, target)
stop = timeit.default_timer()

print('Speed MB/s: ', 5120/(stop - start)) 

C:\TEMP\Torrent\COPYTEST>python copy.py
Copy dummy file that is 5GB
Speed MB/s:  249.0267213745948

BenjV commented 4 years ago

This last outcome must be faulty. A shutil.move within the same filesystem (which for Windows means on the same disk) only changes the file index and will be very much faster than your measurement. This is my speed with a shutil.move:

Copy dummy file that is 5GB
Speed MB/s:  4024524.445841845

And this hack of changing the buffer size will hardly do anything for a copy operation. Only if the buffer size is a substantial part of the file size will you gain a little, and not much even then, because in the end everything must be written to disk anyway.

Only if the source and the target are on different disks, and you change the copy operation to be multi-threaded (read and write in different threads), could you gain speed.

Take a look at this: https://gist.github.com/zapalote/30aa2d7b432a08e6a7d95e536e672494
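
For illustration, a minimal sketch of that two-thread approach (a reader and a writer connected by a queue); the chunk size and paths are placeholders, and this is not the gist's exact code:

import shutil
import threading
from queue import Queue

CHUNK = 16 * 1024 * 1024  # 16 MiB read/write chunks (placeholder value)

def threaded_copy(src, dst, queue_depth=8):
    """Copy src to dst with the read and write loops in separate threads."""
    q = Queue(maxsize=queue_depth)

    def reader():
        with open(src, 'rb') as fsrc:
            while True:
                buf = fsrc.read(CHUNK)
                q.put(buf)          # an empty bytes object signals EOF
                if not buf:
                    break

    def writer():
        with open(dst, 'wb') as fdst:
            while True:
                buf = q.get()
                if not buf:
                    break
                fdst.write(buf)

    t = threading.Thread(target=reader)
    t.start()
    writer()                        # write on the main thread
    t.join()
    shutil.copystat(src, dst)       # keep timestamps, like shutil.copy2 would

if __name__ == '__main__':
    # hypothetical paths, adjust to your setup
    threaded_copy(r'C:\TEMP\Torrent\COPYTEST\dummy.mkv',
                  r'C:\DATA\Private\COPYTEST\dummy.mkv')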

Rouzax commented 4 years ago

Mine is not on the same disk; there is a symbolic link in that path. The source is the SSD RAID group and the destination is the HDD group.

BenjV commented 4 years ago

If the disks are different, shutil.move falls back to a copy (shutil.copy2 by default) followed by a delete of the source.
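
A quick way to check which case applies when directory symlinks are involved (just a sketch with hypothetical paths; using st_dev as a volume identifier on Windows assumes a recent Python version):

import os

# hypothetical paths matching the earlier tests
src_dir = r'C:\TEMP\Torrent\COPYTEST'
dst_dir = r'C:\DATA\Private\COPYTEST'

# realpath resolves directory symlinks/junctions, showing the real locations
real_src = os.path.realpath(src_dir)
real_dst = os.path.realpath(dst_dir)
print('source resolves to:', real_src)
print('target resolves to:', real_dst)

# equal st_dev values mean shutil.move can just rename,
# unequal values mean it falls back to copy + delete
print('same volume:', os.stat(real_src).st_dev == os.stat(real_dst).st_dev)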

Rouzax commented 1 year ago

Is there any progress on making the file handling faster? At the moment Medusa is still capping out at around 100 MB/s, which is around 10% of my max throughput. I know this is probably not high on the priority list 😃, but it would be a nice-to-have for me.

p0psicles commented 1 year ago

You could experiment by writing a small Python script that does the copy for you. As mentioned before, you should use shutil.copy.

Rouzax commented 1 year ago

@p0psicles, you mean to test the throughput of Python?

p0psicles commented 1 year ago

Yes. That way you can test whether this is related to the shutil lib. But even then, shutil just uses the file copy mechanisms available from your OS.

Rouzax commented 1 year ago

I already did that here: https://github.com/pymedusa/Medusa/issues/8579#issuecomment-706673412. It seems to be inherent to Python.
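
One more thing that could be timed (just a sketch; as far as I know, shutil's copy on Windows is a buffered read/write loop in Python rather than a native CopyFile call, so this is purely an experiment and not something Medusa does) is the Win32 CopyFileW API called directly via ctypes, reusing the 5 GB dummy file from the earlier tests:

import ctypes
import timeit

# same hypothetical paths as the earlier dummy-file tests
src = r'C:\TEMP\Torrent\COPYTEST\dummy.mkv'
dst = r'C:\DATA\Private\COPYTEST\dummy.mkv'

# BOOL CopyFileW(LPCWSTR existing, LPCWSTR new, BOOL fail_if_exists)
CopyFileW = ctypes.windll.kernel32.CopyFileW

start = timeit.default_timer()
if not CopyFileW(src, dst, False):   # False = overwrite an existing target
    raise ctypes.WinError()
stop = timeit.default_timer()

print('Speed MB/s: ', 5120 / (stop - start))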

p0psicles commented 1 year ago

Yeah, I don't know what you want us to do? We can't really change Python. You could try upgrading to Python 3.10?

Rouzax commented 1 year ago

I'm already on 3.11. I understand that you can't change Python in and of itself, but perhaps there are optimized file system calls available.
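
For example, one possible workaround (just a sketch with hypothetical folder names, using the same kind of options as the earlier robocopy runs) would be to shell out to robocopy for the actual move:

import subprocess

# hypothetical folders, adjust to the real source and destination
src_dir = r'C:\TEMP\Torrent\Downloads\SomeRelease'
dst_dir = r'C:\DATA\Private\SomeShow'

# /MOV removes the source files after copying, /J uses unbuffered I/O
result = subprocess.run(
    ['robocopy', src_dir, dst_dir, '*.*', '/MOV', '/J', '/R:3', '/W:5'],
    capture_output=True, text=True)

# robocopy exit codes below 8 mean success (possibly with skipped files)
if result.returncode >= 8:
    print('robocopy failed:')
    print(result.stdout)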

If it is what it is, we can close this issue.