Open typesupply opened 5 years ago
@anthrotype, I'm going to do my studying here and if it does require a change in ufoLib, I'll make an issue for discussion and proposal over there. I figured it'd be best to consolidate it here for now since defcon is where the problem is most acute as of this moment.
sounds good thanks!
This is the current read process inside of UFOReader:
Inside of defcon.Font
a ufoLib.UFOReader
object is created during __init__
. UFOReader
retains a references to a ZipFS
object for all contents of the UFOZ. Also __init__
the glyph sets for each layer are obtained for the reader and retained in defcon's Layer
objects. (This is required for lazy loading of glyphs in defcon.) These glyph sets retain references to sub-directories of the parent ZipFS
object. (This is required for lazy loading of glyphs by the ufoLib.glifLib.GlyphSet
objects.) Once __init__
is complete, the UFOReader
object is no longer retained.
When a file that was not read during __init__
is needed by a defcon caller, a UFOReader
is created again. Most top level files are not read during __init__
. This redundancy costs time. It's not a huge amount of time for an individual file read, but if multiple files are loaded in one sequence, the time can add up.
This is the current write process inside of UFOWriter:
This is a very expensive process. In my test case of one style of Source Serif Pro (1415 glyphs) the write time is always a minimum of around two seconds. That's just the overhead of the write process as it is currently structured. I've analyzed this and it breaks down like this:
steps 1-3: 0.818 seconds
step 4: 0.0003 seconds
steps 5-7: 0.995 seconds
The obvious solution to this is to retain UFOReader
and UFOWriter
outside of __init__
and save
. However, zips are tricky little files so it's not quite so simple. First, when writing, the ZipFS
object must closed to trigger the compression and writing of the actual file. Once the ZipFS
is closed it can't be used again. So, retaining ZipFS
doesn't do any good.
Further complicating things is the possibility of external changes to the UFOZ by another application/process. When reading from ZipFS
the data inside the zip is immediately pulled into memory. If the data inside the original zip is changed, the data that was pulled into memory will not reflect the change.
My current idea is to modify the write process so that part of the work can be reused. The logic would be like this:
1. if we don't have a copy of the contents of the zip in a temporary file system, make one and set it to be retained
2. write changes to the files in the temporary file system
3. create new zip container
4. copy all files from temporary directory into zip container
5. write zip container to disk
Using the timed results from above, during the first save, the time should break down like this:
step 1: 0.818 seconds
step 2: 0.0003 seconds
steps 3-5: 0.995 seconds
And subsequent saves should break down like this:
step 1: very little time
step 2: 0.0003 seconds
steps 3-5: 0.995 seconds
This will reduce the overhead on subsequent saves by half. If an external change is made, we need to discard the retained temporary file system redo step 1 as the temporary file system is no longer in sync.
I think there may be more optimizations possible, but this is what I have for now. I haven't yet worked out how this logic would be distributed to defcon and ufoLib. I'm going to do more thinking and testing before I get into that.
Below is an outline of what I am going to try first. If this works, there will be no changes needed in ufoLib. All of these changes would be applicable only to defcon's Font
class.
First, I'd establish a defcon specific version of ZipFS. This will be used for reading only:
from fs.zipfs import ReadZipFS
class DefconZipFS(ReadZipFS):
def __init__(self, *args, writableFS=None, **kwargs):
self.writableFS = writableFS
super(DefconZipFS, self).__init__(*args, **kwargs)
def getbytes(self, path):
if self.writableFS.exists(path):
return self.writableFS.getbytes(path)
data = super(DefconZipFS, self).getbytes(path)
self.writableFS.setbytes(path, data)
return data
In Font.__init__
:
- make a write enabled instance of `ZipFS` (let's call this "internalFS")
- retain it
- make an instance of `DefconZipFS` (let's call this "sourceFS") pointed at the data on disk
- retain it
When creating a UFOReader
:
- provide sourceFS as the path argument
This will produce the following behavior when anything is read from the UFOReader
:
- if the file is in internalFS
- read from there
- return the data
- else
- if sourceFS is still available
- read from there
- write the data into internalFS
- return the data
- else (this won't happen unless a file that was never in sourceFS is requested)
- error
In Font.save
:
- if sourceFS is still around
- copy unread files to internalFS
- release sourceFS
- make the changes to internalFS
- make a copy of internalFS (using BytesIO as the file)
- write the zip data from the copy
- write the zip data to disk
This behavior should be off by default and may be activated via a kwarg in __init__
.
There are situations that need to be handled that I don't have answers for yet:
fs
objects released?
Reading and writing UFOZ is extremely slow when only part of the wrapped UFO data have been modified. This is partly due to some issues in ufoLib (which I will address separately) but the changes that I am studying for ufoLib will need to be deeply integrated within defcon.
This issue is a public notepad for my examinations and tests. I'll be updating the tasks and results in the comments of this issue.