Rouzax opened this issue 4 years ago
Just did an upgrade to Python 3.8 and that already makes a huge difference
Tried with 3.9 but that doesn't fly yet 😄.
Are you still experiencing an issue or are you satisfied with the speed you are getting from Python 3.8?
Faster is always better 😃 but it is pretty good now
I am guessing you are transferring the files to an SSD, correct? Are you sure you don't have any scripts that are preventing the file transfer from reaching max I/O write speed?
Good point, I forgot 😖 I go from SSD to HDD when storing the episode, so the current throughput is pretty good on Python 3.8 but still nowhere near Windows transfer speed. There is no I/O contention.
This is just a Windows file copy from and to the same location as Medusa
Python reaches around 170 MB/s
Hmm, there might be something in the Python code in Medusa that makes sure the files transfer successfully (like checks), which could potentially cause slower speeds. @p0psicles @medariox Any thoughts on this? Though it does make sense that transferring files from SSD to HDD would be slower due to the RPM rating of the HDD.
That is not possible. A 7200 RPM HDD reaches around 115 MB/s in the best case.
In this case they are 8x 8 TB enterprise-grade disks in RAID 10 behind a RAID controller with 8 GB of cache, and doing a file copy from the SSDs to the HDDs does give higher throughput. Using something like Robocopy boosts it even further.
-------------------------------------------------------------------------------
ROBOCOPY :: Robust File Copy for Windows
-------------------------------------------------------------------------------
Started : Saturday, 10 October 2020 16:54:15
Source : C:\TEMP\Torrent\Downloads\Fargo.S04E01.1080p.WEB.H264-VIDEOHOLE\
Dest : C:\DATA\Private\Fargo\
Files : *.*
Options : *.* /DCOPY:DA /COPY:DAT /J /R:1000000 /W:30
------------------------------------------------------------------------------
New Dir 3 C:\TEMP\Torrent\Downloads\Fargo.S04E01.1080p.WEB.H264-VIDEOHOLE\
100% New File 1.5 g fargo.s04e01.1080p.web.h264-videohole.mkv
100% New File 170 fargo.s04e01.1080p.web.h264-videohole.nfo
100% New File 2346 fargo.s04e01.1080p.web.h264-videohole.srr
------------------------------------------------------------------------------
Total Copied Skipped Mismatch FAILED Extras
Dirs : 1 1 0 0 0 0
Files : 3 3 0 0 0 0
Bytes : 1.591 g 1.591 g 0 0 0 0
Times : 0:00:01 0:00:01 0:00:00 0:00:00
Speed : 1129025441 Bytes/sec.
Speed : 64603.353 MegaBytes/min.
Ended : Saturday, 10 October 2020 16:54:17
That is roughly 1GB/s
Copying something that is bigger than 8GB will be a bit slower.
-------------------------------------------------------------------------------
ROBOCOPY :: Robust File Copy for Windows
-------------------------------------------------------------------------------
Started : Saturday, 10 October 2020 16:57:48
Source : C:\TEMP\Torrent\Downloads\I.See.You.2019.1080p.BluRay.REMUX.AVC.DTS-HD.MA5.1-iFT\
Dest : C:\DATA\Private\ISeeYou\
Files : *.*
Options : *.* /DCOPY:DA /COPY:DAT /J /R:1000000 /W:30
------------------------------------------------------------------------------
New Dir 1 C:\TEMP\Torrent\Downloads\I.See.You.2019.1080p.BluRay.REMUX.AVC.DTS-HD.MA5.1-iFT\
100% New File 26.2 g I.See.You.2019.1080p.BluRay.REMUX.AVC.DTS-HD.MA5.1-iFT.mkv
------------------------------------------------------------------------------
Total Copied Skipped Mismatch FAILED Extras
Dirs : 1 1 0 0 0 0
Files : 1 1 0 0 0 0
Bytes : 26.223 g 26.223 g 0 0 0 0
Times : 0:00:28 0:00:28 0:00:00 0:00:00
Speed : 975407858 Bytes/sec.
Speed : 55813.285 MegaBytes/min.
Ended : Saturday, 10 October 2020 16:58:17
But still around 900 MB/s
NOTE: It might appear they are going from and to the same drive, but that is not the case; those are just directory symlinks.
Then your setup is doing some caching between the HDDs and SSDs, because those speeds can't be reached even with 10k RPM disks.
Judging from his setup, he is doing caching. Can you provide a speed measurement from a Python script?
> Then your setup is doing some caching between the HDDs and SSDs, because those speeds can't be reached even with 10k RPM disks.
When you run a server with enterprise disks in RAID 10, you can. These drives have a sustained write speed of 250 MB/s; with 8 of them in RAID 10 you get an 8x read and 4x write speed gain, and 4 x 250 MB/s is 1 GB/s.
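As a sanity check on that arithmetic, here is a minimal sketch, assuming the 250 MB/s sustained per-drive figure quoted above and ideal RAID 10 scaling (controller cache and overhead ignored):

```python
# Rough RAID 10 throughput estimate under ideal scaling.
# 250 MB/s is the sustained per-drive write speed quoted above.
drives = 8
per_drive_mb_s = 250

# All drives can serve reads; writes go to mirror pairs, so only half count.
read_mb_s = drives * per_drive_mb_s          # 8 x 250 = 2000 MB/s
write_mb_s = (drives // 2) * per_drive_mb_s  # 4 x 250 = 1000 MB/s

print('read ~%d MB/s, write ~%d MB/s' % (read_mb_s, write_mb_s))
```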
> Judging from his setup, he is doing caching. Can you provide a speed measurement from a Python script?
In the initial screenshot you can see that the move action in Medusa on Python 3.8 reaches around 168 MB/s. If you have a test script that I can run, happy to do so.
From what you just told me... it looks like Python is just writing to one disk, or at the speed of one disk.
> From what you just told me... it looks like Python is just writing to one disk, or at the speed of one disk.
That would be impossible. To the OS the RAID set is one logical drive; it cannot see the individual drives underneath.
import os
import shutil
import sys

source = 'current/test/test.py'
target = '/prod/new'

assert not os.path.isabs(source)
target = os.path.join(target, os.path.dirname(source))

# create the folders if they do not already exist
os.makedirs(target, exist_ok=True)

# adding exception handling
try:
    shutil.copy(source, target)
except IOError as e:
    print("Unable to copy file. %s" % e)
except:
    print("Unexpected error:", sys.exc_info())
You can try this. I just grabbed this from the internet.
That would be like this?
import os
import shutil
source = 'C:\TEMP\Torrent\Downloads\I.See.You.2019.1080p.BluRay.REMUX.AVC.DTS-HD.MA5.1-iFT\I.See.You.2019.1080p.BluRay.REMUX.AVC.DTS-HD.MA5.1-iFT.mkv'
target = 'C:\DATA\Private\I.See.You.2019.1080p.BluRay.REMUX.AVC.DTS-HD.MA5.1-iFT\I.See.You.2019.1080p.BluRay.REMUX.AVC.DTS-HD.MA5.1-iFT.mkv'
assert not os.path.isabs(source)
target = os.path.join(target, os.path.dirname(source))
# create the folders if not already exists
os.makedirs(target)
# adding exception handling
try:
    shutil.copy(source, target)
except IOError as e:
    print("Unable to copy file. %s" % e)
except:
    print("Unexpected error:", sys.exc_info())
That should be it.
That doesn't work
C:\TEMP\Torrent\Downloads\I.See.You.2019.1080p.BluRay.REMUX.AVC.DTS-HD.MA5.1-iFT>python -V
Python 3.8.6
C:\TEMP\Torrent\Downloads\I.See.You.2019.1080p.BluRay.REMUX.AVC.DTS-HD.MA5.1-iFT>python test.py
Traceback (most recent call last):
File "test.py", line 7, in <module>
assert not os.path.isabs(source)
AssertionError
C:\TEMP\Torrent\Downloads\I.See.You.2019.1080p.BluRay.REMUX.AVC.DTS-HD.MA5.1-iFT>
Ran it like this and it gives me between 170 and 300 MB/s, but it fluctuates.
import shutil
original = r'C:\TEMP\Torrent\Downloads\I.See.You.2019.1080p.BluRay.REMUX.AVC.DTS-HD.MA5.1-iFT\I.See.You.2019.1080p.BluRay.REMUX.AVC.DTS-HD.MA5.1-iFT.mkv'
target = r'C:\DATA\Private\I.See.You.2019.1080p.BluRay.REMUX.AVC.DTS-HD.MA5.1-iFT\I.See.You.2019.1080p.BluRay.REMUX.AVC.DTS-HD.MA5.1-iFT.mkv'
shutil.copyfile(original, target)
This could give some insight into why this is happening.
https://stackoverflow.com/questions/26178038/python-slow-read-performance-issue
That seems to be more related to lots of small files.
Python can copy at full I/O speed minus a very small overhead. We should be able to fix this.
Also tried with the following test script after creating a 5GB dummy file with
fsutil file createnew dummy.mkv 5368709120
Running Python 3.8.6
import shutil
import timeit
def _copyfileobj_patched(fsrc, fdst, length=16*1024*1024):
    """Patches shutil method to hugely improve copy speed"""
    while 1:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)
shutil.copyfileobj = _copyfileobj_patched
original = r'C:\TEMP\Torrent\COPYTEST\dummy.mkv'
target = r'C:\DATA\Private\COPYTEST\dummy.mkv'
print('Copy dummy file that is 5GB')
start = timeit.default_timer()
shutil.copyfile(original, target)
stop = timeit.default_timer()
print('Speed MB/s: ', 5120/(stop - start))
The first run is without the "fix" and the second is with the fix from Stack Overflow https://stackoverflow.com/a/28584857
C:\TEMP\Torrent\COPYTEST>python copy.py
Copy dummy file that is 5GB
Speed MB/s: 267.17716848360794
C:\TEMP\Torrent\COPYTEST>python copy.py
Copy dummy file that is 5GB
Speed MB/s: 312.7792792133464
C:\TEMP\Torrent\COPYTEST>
Does not seem to give much additional speed.
Did the same with Python 3.9, using the same script, first without and then with the fix; it gives the same result.
C:\TEMP\Torrent\COPYTEST>C:\Python39\python.exe copy.py
Copy dummy file that is 5GB
Speed MB/s: 257.8828776759466
C:\TEMP\Torrent\COPYTEST>C:\Python39\python.exe copy.py
Copy dummy file that is 5GB
Speed MB/s: 287.6408792835612
C:\TEMP\Torrent\COPYTEST>
Also tested with move instead of copy, since that is technically what I'm doing with Medusa on Python 3.8.6.
import shutil
import timeit
original = r'C:\TEMP\Torrent\COPYTEST\dummy.mkv'
target = r'C:\DATA\Private\COPYTEST\dummy.mkv'
print('Copy dummy file that is 5GB')
start = timeit.default_timer()
shutil.move(original, target)
stop = timeit.default_timer()
print('Speed MB/s: ', 5120/(stop - start))
C:\TEMP\Torrent\COPYTEST>python copy.py
Copy dummy file that is 5GB
Speed MB/s: 249.0267213745948
This last outcome must be faulty. A shutil.move on the same filesystem (for Windows, on the same disk) only changes the file index and will be very much faster than your measurement. This is my speed with a shutil.move:
Copy dummy file that is 5GB
Speed MB/s: 4024524.445841845
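That rename behaviour is easy to verify. A minimal sketch, using a temporary directory rather than the paths from this thread: a same-volume shutil.move should complete almost instantly because it is just an os.rename.

```python
import os
import shutil
import tempfile
import timeit

# Create a 100 MB dummy file in a temporary directory; moving it within
# the same filesystem should be a rename, not a byte-for-byte copy.
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, 'dummy.bin')
dst = os.path.join(tmp, 'moved.bin')
with open(src, 'wb') as f:
    f.truncate(100 * 1024 * 1024)

start = timeit.default_timer()
shutil.move(src, dst)  # same volume, so this is an os.rename under the hood
elapsed = timeit.default_timer() - start
print('move took %.6f s' % elapsed)

shutil.rmtree(tmp)
```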
And this hack of changing the buffer size will hardly do anything for a copy operation. Only if the buffer size is a substantial part of the file size will you gain a little, but not much, because in the end everything must be written to disk anyway.
Only if the source and the target are on different disks, and you change the copy operation to be multi-threaded (read and write in different threads), could you gain speed.
Take a look at this: https://gist.github.com/zapalote/30aa2d7b432a08e6a7d95e536e672494
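The approach in that gist, overlapping reads and writes when source and destination are on different disks, can be sketched with a reader thread feeding a writer through a bounded queue. This is a simplified illustration, not the gist's actual code:

```python
import queue
import threading

def threaded_copy(src, dst, chunk_size=16 * 1024 * 1024, depth=4):
    """Copy src to dst, overlapping reads and writes in two threads."""
    chunks = queue.Queue(maxsize=depth)  # bounded, so reads can't run away

    def reader():
        with open(src, 'rb') as f:
            while True:
                buf = f.read(chunk_size)
                chunks.put(buf)
                if not buf:  # empty bytes object signals EOF to the writer
                    break

    t = threading.Thread(target=reader)
    t.start()
    with open(dst, 'wb') as f:
        while True:
            buf = chunks.get()
            if not buf:
                break
            f.write(buf)
    t.join()
```

On a single disk this usually gains nothing, since reads and writes contend for the same spindle; the win only appears when the two ends are on different devices.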
Mine is not on the same disk; there is a symbolic link in that path. The source is the SSD RAID group and the destination is the HDD group.
If the disks are different, shutil.move falls back to shutil.copy.
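Roughly, that fallback logic looks like this (a simplified sketch of the behaviour, not the actual stdlib source):

```python
import os
import shutil

def move_sketch(src, dst):
    """Roughly what shutil.move does across filesystems."""
    try:
        # Same filesystem: a rename just updates directory entries.
        os.rename(src, dst)
    except OSError:
        # Different filesystem (e.g. SSD array to HDD array):
        # fall back to a full copy followed by deleting the source.
        shutil.copy2(src, dst)
        os.unlink(src)
```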
Is there any progress on making the file handling faster? At the moment Medusa is still capping out at around 100 MB/s, which is around 10% of my max throughput. I know this is probably not high on the priority list 😃 but it would be a nice-to-have for me.
You could experiment by writing a small Python script that does the copy for you. As mentioned before, you should use shutil.copy.
@p0psicles, you mean to test the throughput of Python?
Yes. That way you can test whether this is related to the shutil lib. But even then, shutil just uses the file copy commands available from your OS.
Already did that here: https://github.com/pymedusa/Medusa/issues/8579#issuecomment-706673412; it seems to be inherent to Python.
Yeah, I don't know what you want us to do? We can't really change Python. You could try upgrading to Python 3.10?
Already on 3.11. I understand that you can't change Python in and of itself, but perhaps there are optimized file system calls available.
If it is what it is, we can close this issue.
Describe the bug: Running Medusa on Windows, and during the import it is really slow on the file move.
A file copy from Windows Explorer gives me around 700 MB/s throughput, and using a utility like Robocopy it even hits 1.3 GB/s.
Could this be related to: https://stackoverflow.com/questions/21799210/python-copy-larger-file-too-slow https://bugs.python.org/issue33671
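One knob those links circle around is the size of shutil's internal copy buffer. CPython 3.8+ has a module-level constant, shutil.COPY_BUFSIZE, that the Windows copyfile path consults; it is an implementation detail rather than documented API, but raising it is a cheap experiment:

```python
import shutil

# Implementation detail of CPython 3.8+: the buffer size used by
# shutil's fallback copy loop (1 MiB on Windows, 64 KiB elsewhere).
print(shutil.COPY_BUFSIZE)

# Raise it before copying; larger buffers mean fewer read/write calls.
shutil.COPY_BUFSIZE = 16 * 1024 * 1024
# shutil.copyfile(src, dst)  # src/dst are placeholders for real paths
```

On POSIX systems this may have no effect at all, since copyfile there prefers zero-copy syscalls like sendfile and bypasses the buffered loop entirely.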