I have solved this issue: please see my exp branch on GitHub:
https://github.com/OpenJPEG/openjpeg/tree/exp
Because the branch contains several commits, it is easier to see what I did
on GitHub than by looking at a single patch or multiple patches.
With these fixes, for the use case where a user is compressing multiple files
with the same image characteristics (same height, width, bpp, and number of
components), encoding is almost twice as fast. This will help a lot for the DCP
use case, for example.
Note: there may be a few memory leaks, because the trunk code assumes that one
is not re-using the tcd object, for example, so it may not free memory
correctly. But if there is a leak, it must be minor: I watched the memory usage
over a long period of time, and it did not grow.
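To make the intent concrete, here is a minimal sketch (not code from the exp
branch) of the workflow this speeds up, written against the public OpenJPEG 2.x
API; encode_frame and encode_sequence are illustrative names, and the recycled
loop assumes the patched library allows the same codec to be restarted for every
frame as long as the image characteristics are unchanged.
    #include <openjpeg.h>

    /* Encode one frame to one file, reusing an already-configured codec. */
    static OPJ_BOOL encode_frame(opj_codec_t *codec, opj_image_t *image,
                                 const char *out_name)
    {
        OPJ_BOOL ok;
        opj_stream_t *stream =
            opj_stream_create_default_file_stream(out_name, OPJ_FALSE);
        if (!stream) {
            return OPJ_FALSE;
        }
        ok = opj_start_compress(codec, image, stream) &&
             opj_encode(codec, stream) &&
             opj_end_compress(codec, stream);
        opj_stream_destroy(stream);
        return ok;
    }

    /* Recycled workflow: one codec set up once and reused for every frame,
     * instead of opj_create_compress/opj_destroy_codec per frame. */
    static void encode_sequence(opj_image_t **frames, const char **names,
                                int count, opj_cparameters_t *params)
    {
        opj_codec_t *codec = opj_create_compress(OPJ_CODEC_J2K);
        int i;
        if (!codec) {
            return;
        }
        if (!opj_setup_encoder(codec, params, frames[0])) {
            opj_destroy_codec(codec);
            return;
        }
        for (i = 0; i < count; ++i) {
            encode_frame(codec, frames[i], names[i]);  /* tcd/code-blocks reused */
        }
        opj_destroy_codec(codec);
    }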
Original comment by boxe...@gmail.com
on 29 Nov 2014 at 4:13
I have combined all of my changes into a single commit: attached is the diff.
Note: I have not run any regression tests, and my own testing involved default
settings for irreversible compression.
Original comment by boxe...@gmail.com
on 30 Nov 2014 at 1:39
Attachments:
@Aaron,
Thanks for the patch. However, there are multiple lines that are removed/added
only because of space/tab alignment changes, which makes the diff hard to read.
Do you think you can provide a "cleaner" version?
Original comment by m.darb...@gmail.com
on 15 Dec 2014 at 7:21
Hi Matthieu,
I've cleaned it up a bit. Let me know if this works for you.
Thanks,
Aaron
Original comment by boxe...@gmail.com
on 16 Dec 2014 at 12:00
Attachments:
By the way, you can find this patch on a branch of my GitHub repo:
https://github.com/OpenJPEG/openjpeg/tree/double_encode_speed
Original comment by boxe...@gmail.com
on 16 Dec 2014 at 12:02
Aaron,
This diff is easier to read.
A few comments though.
It seems that NULL data is still allowed in this patch for all functions (mct,
dwt, t1, ...). I guess it shouldn't be (and this is what's causing most of the
remaining "hard to read" parts of the patch).
The ABI and API are broken with this patch.
I do not feel comfortable merging this as is; maybe others can share their
opinions?
Nevertheless, I think something shall be done.
I guess the "easier" thing to do without breaking the API/ABI is to allow
recycling/reuse of some objects as you propose (like tcd/code-blocks).
I did some profiling on my end and focused on the problems you raised. For this
I used:
opj_compress -i OpenJpeg/data/input/nonregression/ElephantDream_4K.tif -o ElephantDream_4K.j2k -cinema4K
    Running Time        Symbol Name
    8644.0ms  99.9%     main
    8174.0ms  94.5%       opj_j2k_encode
    7915.0ms  91.5%         opj_j2k_post_write_tile
     259.0ms   2.9%         opj_tcd_init_tile
     227.0ms   2.6%           calloc
       6.0ms   0.0%             malloc
       4.0ms   0.0%             _platform_bzero$VARIANT$Merom
       1.0ms   0.0%           opj_tgt_create
       1.0ms   0.0%             calloc
     359.0ms   4.1%       tiftoimage
      92.0ms   1.0%       opj_j2k_end_compress
      91.0ms   1.0%         opj_j2k_end_encoding
      90.0ms   1.0%           opj_tcd_destroy
      75.0ms   0.8%             opj_tcd_code_block_enc_deallocate
      56.0ms   0.6%               szone_free_definite_size
      13.0ms   0.1%                 deallocate_pages
       5.0ms   0.0%                 free
       1.0ms   0.0%                 _os_lock_spin_unlock
      13.0ms   0.1%               deallocate_pages
       1.0ms   0.0%               szone_free_definite_size
       1.0ms   0.0%             opj_tgt_destroy
       1.0ms   0.0%               free_large
       1.0ms   0.0%         opj_j2k_write_eoc
      18.0ms   0.2%       opj_destroy_codec
      18.0ms   0.2%         opj_j2k_destroy
      18.0ms   0.2%           opj_image_destroy
      18.0ms   0.2%             free_large
       1.0ms   0.0%       opj_stream_destroy
The calloc calls in opj_tcd_init_tile (in fact in opj_tcd_code_block_enc_allocate,
but inlined or "lost" by the profiler) are taking way too much time. malloc seems
the way to go here if we don't need a "memset" to 0 (I tried it, and malloc is way
faster on my platform, but I did not check for non-regression).
IMHO, a first patch should use malloc instead of calloc where possible, plus
reuse of tcd/code-blocks. This would keep API/ABI compatibility while giving a
first level of optimization.
I guess the amount of time spent in those allocation functions is related to the
number of times they are called (inner loop of opj_tcd_init_tile) rather than to
the overall allocated size, so another option might be to allocate a bigger block
that is then split up by OpenJPEG (it might not be trivial, though).
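To illustrate what is being discussed, here is a hypothetical sketch of the two
allocation strategies (per-block calloc versus one pooled malloc that is split
afterwards); cblk_buffer_t and the alloc_* helpers are made-up names, not
OpenJPEG internals.
    /* Hypothetical sketch only -- not the actual tcd.c code. */
    #include <stdlib.h>

    typedef struct {
        unsigned char *data;   /* encoded-pass buffer for one code-block */
        size_t         size;
    } cblk_buffer_t;

    /* Per-block calloc: one heap call plus an implicit memset per code-block,
     * which is what the profile above shows as the hot spot. */
    static int alloc_cblk_buffers_calloc(cblk_buffer_t *blocks, size_t n, size_t sz)
    {
        size_t i;
        for (i = 0; i < n; ++i) {
            blocks[i].data = (unsigned char *)calloc(1, sz);
            if (!blocks[i].data) {
                return 0;
            }
            blocks[i].size = sz;
        }
        return 1;
    }

    /* Pooled malloc: one heap call for all code-blocks, no zeroing, split by
     * the caller; the whole pool is freed once at the end. */
    static int alloc_cblk_buffers_pooled(cblk_buffer_t *blocks, size_t n, size_t sz,
                                         unsigned char **pool_out)
    {
        unsigned char *pool = (unsigned char *)malloc(n * sz);
        size_t i;
        if (!pool) {
            return 0;
        }
        for (i = 0; i < n; ++i) {
            blocks[i].data = pool + i * sz;
            blocks[i].size = sz;
        }
        *pool_out = pool;
        return 1;
    }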
Original comment by m.darb...@gmail.com
on 16 Dec 2014 at 8:23
Matthieu,
Thanks for reviewing. Did you see a significant speed up on your system
with this patch?
I think I could simplify this further: the code I added to support encoding
without any data could be removed; I was only using it for debugging, to
isolate the encoding slow-down.
What is left would allow the user to recycle the compress objects, rather than
destroy and re-create them for each frame. Do you see an issue with allowing
recycling of compress objects?
Regarding ABI/API changes, I am sure DCP encoder users would be more than
happy to change a little code to get a performance improvement of this magnitude.
Original comment by boxe...@gmail.com
on 16 Dec 2014 at 9:46
Here is an even simpler patch.
Original comment by boxe...@gmail.com
on 16 Dec 2014 at 10:04
Attachments:
Aaron,
I previously did not test your patch but instead simply profiled what you were
pointing out. This is now done.
I'm in no way against recycling objects, and I think it should work.
Regarding the ABI/API changes, I think that OpenJPEG needs one clean/radical
change rather than many small ones (2.0 => 2.1 is a small change that requires
maintaining two branches for a minimal feature gain; what you propose could be
seen in the same way, requiring three branches for few added features). See issue
439 for my opinion, which is, in a few words, that a 3.0 is needed, designed in a
way that allows enhancements to be integrated more easily without breaking the
API/ABI.
That being said, any enhancement that can be done in 2.1 without breaking the
API/ABI shall be merged in quickly (still in my opinion). BTW, as you mentioned,
more than 80% of the time that this patch saves comes from tcd reuse, which can
be done without API/ABI changes.
There are parts of your patch (those that don't rely on "owns_data" in
opj_image_comp_t) that could probably be merged in.
The tricky part is that it has to work when passing different parameters,
different image sizes, and so on. The test framework currently expects 1 image
in, 1 compressed image out. It would probably be easy to test with different
image sizes, given how opj_compress works. Different parameters are another
matter, so more work is probably required on opj_compress to allow this (it
could take a list of commands in an input file, for example).
I won't merge a patch in if I can't test it. I know there are some parts of
OpenJPEG that aren't tested, but I won't take the responsibility of doing such a
thing. Maybe you can ask Matthieu or Antonin whether limitations could apply to
such an enhancement (like: recycling is OK as long as all parameters are
unchanged, otherwise a crash could happen). The limitation could be enforced by
checks (in opj_setup_encoder, for example).
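As an illustration of such a check, here is a hypothetical sketch that only
allows recycling when the image characteristics are unchanged; saved_setup_t and
setup_allows_recycling are made-up names, not part of the real opj_setup_encoder.
    #include <openjpeg.h>

    /* Hypothetical guard -- not actual OpenJPEG code. */
    typedef struct {
        OPJ_UINT32 width, height, numcomps, prec;
        OPJ_BOOL   configured;
    } saved_setup_t;

    static OPJ_BOOL setup_allows_recycling(saved_setup_t *saved,
                                           const opj_image_t *image)
    {
        if (!saved->configured) {
            /* First setup: remember the image characteristics. */
            saved->width      = image->x1 - image->x0;
            saved->height     = image->y1 - image->y0;
            saved->numcomps   = image->numcomps;
            saved->prec       = image->comps[0].prec;
            saved->configured = OPJ_TRUE;
            return OPJ_TRUE;
        }
        /* Recycling is only safe when every characteristic is unchanged;
         * otherwise the caller must fall back to a full re-allocation. */
        return saved->width    == (image->x1 - image->x0) &&
               saved->height   == (image->y1 - image->y0) &&
               saved->numcomps == image->numcomps &&
               saved->prec     == image->comps[0].prec;
    }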
Original comment by m.darb...@gmail.com
on 17 Dec 2014 at 7:23
Hello Again,
First of all, let me say I do appreciate your testing these patches, and your
feedback on this issue.
We can leave it up to the project chefs to decide how to proceed on this one.
Best Regards,
Aaron
Original comment by boxe...@gmail.com
on 17 Dec 2014 at 7:41
Hello Aaron, Matthieu,
Thanks to both of you for having investigated this.
I agree with Matthieu concerning the way we should implement this speed
enhancement.
A first patch should be done that does not break the API/ABI, with appropriate
checks in order to know whether recycling is possible or not. This should be
part of version 2.1.1, which I would like to release during February.
A second patch would allow an API/ABI break but would be part of OpenJPEG 3.0
(which will follow the suggestions in issue 439).
A question I have about this enhancement: is recycling done at the image level
or at the tile level? In the former case, is there a way to recycle at the tile
level, which would allow speed optimization for single-image encoding with
multiple tiles?
Thanks again for your work.
Original comment by antonin
on 20 Jan 2015 at 11:18
Hi Antonin,
The current patch will handle multi-tile encoding. To get the speed boost,
though, you need a workflow that encodes multiple images with the same image
characteristics: same width, height, bit depth, etc. That is why I mentioned the
DCP workflow. Single tile or multi-tile doesn't affect things.
Cheers,
Aaron
Original comment by boxe...@gmail.com
on 20 Jan 2015 at 2:08
@Aaron: thanks for the clarification
@Aaron and Matthieu: do you think one of you could prepare a patch that does
not break the API/ABI but recycles objects in the case of multiple image encoding?
Original comment by antonin
on 20 Jan 2015 at 5:11
I'm afraid I don't have time to work on this at the moment.
Original comment by boxe...@gmail.com
on 20 Jan 2015 at 7:33
Well, I re-tested this patch on top of the latest trunk commit: unfortunately
(or fortunately), I am only seeing an 11% speedup now. Still nice to have, but
not as dramatic as before.
Original comment by boxe...@gmail.com
on 29 Jan 2015 at 7:33
Original issue reported on code.google.com by
boxe...@gmail.com
on 28 Nov 2014 at 8:14