libnxz / power-gzip

POWER NX zlib compliant library
23 stars 18 forks source link

Input buffering optimization not working as expected on inflate #84

Closed mscastanho closed 3 years ago

mscastanho commented 3 years ago

AFAIU, fifo_in on nx_inflate is supposed to allow an optimization in which if the user provides too little data (less than soft_copy_threshold), the data can be buffered on fifo_in and we can avoid the overhead of calling the NX engine, so inflate can return quickly to ask for more data.

However, this does not seem to be happening right now. If the amount of availabe input data is less than the threshold, it's being copied to fifo_in alright, but the engine call is never being skipped, so we are actually just adding overhead by calling memcpy to copy data from the user's input buffer to fifo_in.

This becomes evident on run_case2 on test_inflate. The test feeds libnxz with 1 byte of input at a time, so we should see fifo_in buffering the input until soft_copy_threshold is reached, with no calls to the NX engine before this point. But checking the logs for that test, looks like the engine is being called every time. I used this diff to generate the output below:

diff --git a/lib/gzip_vas.c b/lib/gzip_vas.c
index f913707..72a73d9 100644
--- a/lib/gzip_vas.c
+++ b/lib/gzip_vas.c
@@ -243,6 +243,8 @@ int nxu_run_job(nx_gzip_crb_cpb_t *cmdp, void *handle)

        assert(handle != NULL);

+        prt_info("%s:%d [%s] Issuing NX job!\n", __FILE__, __LINE__, __FUNCTION__);
+
        while (1) {

                hwsync();
diff --git a/lib/nx_inflate.c b/lib/nx_inflate.c
index d550da2..66efeb7 100644
--- a/lib/nx_inflate.c
+++ b/lib/nx_inflate.c
@@ -895,6 +895,9 @@ static int nx_inflate_(nx_streamp s, int flush)
        /* nx hardware */
        uint32_t sfbt = 0, subc = 0, spbc, tpbc, nx_ce, fc;

+       static int fgpt=0;
+       fgpt++;
+
        nx_gzip_crb_cpb_t *cmdp = s->nxcmdp;
        nx_dde_t *ddl_in = s->ddl_in;
        nx_dde_t *ddl_out = s->ddl_out;
@@ -977,6 +980,7 @@ copy_fifo_out_to_next_out:
                                return Z_MEM_ERROR;
                        }
                }
+               prt_info("%s:%d : below the soft_copy_threshold! (call #%d)\n",__FILE__,__LINE__,fgpt);
                /* reset fifo head to reduce unnecessary wrap arounds */
                s->cur_in = (s->used_in == 0) ? 0 : s->cur_in;
                fifo_in_len_check(s);
@@ -1165,6 +1169,7 @@ restart_nx:
         */
        cc = nx_submit_job(ddl_in, ddl_out, cmdp, s->nxdevp);

+       prt_info("%s:%d NX engine has been called! (call #%d)\n",__FILE__,__LINE__,fgpt);
        switch (cc) {

        case ERR_NX_AT_FAULT:

Part of nx.log after running: NX_GZIP_LOGFILE=$(pwd)/nx.log NX_GZIP_VERBOSE=2 LD_LIBRARY_PATH=./lib/ ./test/test_inflate

[2021/10/13 10:02:04] pid 28570: Info: == nx_inflate_:913 avail_in 1 total_in 2 used_in 0 cur_in 0 avail_out 1 total_out 0 used_out 0 cur_out 32768 len_in 0 len_out 309600 flush 0
[2021/10/13 10:02:04] pid 28570: Info: nx_alloc_buffer: len 2048 alignment 65536
[2021/10/13 10:02:04] pid 28570: Info: nx_alloc_buffer: alloc 0x180b0000 len 2048
[2021/10/13 10:02:04] pid 28570: Info: nx_inflate.c:983 : below the soft_copy_threshold! (call #2)
[2021/10/13 10:02:04] pid 28570: Info: == nx_inflate_:997 avail_in 0 total_in 3 used_in 1 cur_in 0 avail_out 1 total_out 0 used_out 0 cur_out 32768 len_in 2048 len_out 309600 flush 0
[2021/10/13 10:02:04] pid 28570: Info: len_next_out 1 len_out 309600 cur_out 32768 used_out 0 source_sz 1 history_len 0
[2021/10/13 10:02:04] pid 28570: Info: target_sz_expected 40961 source_sz_expected 40962 source_sz 1 last_comp_ratio 1000 nx_history_len 0
[2021/10/13 10:02:04] pid 28570: Info: gzip_vas.c:246 [nxu_run_job] Issuing NX job!
[2021/10/13 10:02:04] pid 28570: Info: nx_inflate.c:1172 NX engine has been called! (call #2)
[2021/10/13 10:02:04] pid 28570: Info: cc3: sfbt f subc 8
[2021/10/13 10:02:04] pid 28570: Info: == nx_inflate_:1311 avail_in 0 total_in 3 used_in 1 cur_in 0 avail_out 1 total_out 0 used_out 0 cur_out 32768 len_in 2048 len_out 309600 flush 0
[2021/10/13 10:02:04] pid 28570: Info: source_sz 0 partial_bits 8 in_histlen 0 in_subc 0 in_sfbt 0 in_rembytecnt 0
[2021/10/13 10:02:04] pid 28570: Info: == nx_inflate_:1427 avail_in 0 total_in 3 used_in 1 cur_in 0 avail_out 1 total_out 0 used_out 0 cur_out 32768 len_in 2048 len_out 309600 flush 0
[2021/10/13 10:02:04] pid 28570: Info: == 1428 source_sz 0 used_in 1 cur_in 0
[2021/10/13 10:02:04] pid 28570: Info: == nx_inflate_:1442 avail_in 0 total_in 3 used_in 1 cur_in 0 avail_out 1 total_out 0 used_out 0 cur_out 32768 len_in 2048 len_out 309600 flush 0
[2021/10/13 10:02:04] pid 28570: Info: == 1443 source_sz 0 used_in 1 cur_in 0
[2021/10/13 10:02:04] pid 28570: Info: == nx_inflate_:1464 avail_in 0 total_in 3 used_in 1 cur_in 0 avail_out 1 total_out 0 used_out 0 cur_out 32768 len_in 2048 len_out 309600 flush 0
[2021/10/13 10:02:04] pid 28570: Info: == nx_inflate_:1503 avail_in 0 total_in 3 used_in 1 cur_in 0 avail_out 1 total_out 0 used_out 0 cur_out 32768 len_in 2048 len_out 309600 flush 0
[2021/10/13 10:02:04] pid 28570: Info: == 1507 source_sz 0 tpbc 0 last_comp_ratio 1000
[2021/10/13 10:02:04] pid 28570: Info: == 1521 source_sz 0 tpbc 0 last_comp_ratio 1000
[2021/10/13 10:02:04] pid 28570: Info: == nx_inflate_:1528 avail_in 0 total_in 3 used_in 1 cur_in 0 avail_out 1 total_out 0 used_out 0 cur_out 32768 len_in 2048 len_out 309600 flush 0
[2021/10/13 10:02:04] pid 28570: Info: == nx_inflate_:1556 avail_in 0 total_in 3 used_in 1 cur_in 0 avail_out 1 total_out 0 used_out 0 cur_out 32768 len_in 2048 len_out 309600 flush 0
[2021/10/13 10:02:04] pid 28570: Info: == 1557 flush 0 is_final 0 last_comp_ratio 1000
[2021/10/13 10:02:04] pid 28570: Info: == nx_inflate_:1567 avail_in 0 total_in 3 used_in 1 cur_in 0 avail_out 1 total_out 0 used_out 0 cur_out 32768 len_in 2048 len_out 309600 flush 0
[2021/10/13 10:02:04] pid 28570: Info: 697: inf_state 17

Note that in the beginning avail_in == 1 && avail_out > 0, which ends up triggering the copy to fifo_in (see that used_in becomes 1). And note that the engine is called nonetheless (note that call #N is the same as for when we reach the threshold check, so that's still the same nx_inflate_ call).

This same pattern is followed for subsequent bytes. The user keeps providing 1 byte of input with each call, and the NX engine is called for every single one of them, instead of buffering them for later for some batch processing.

abalib commented 3 years ago

That's a good find. It appears we either forgot to put a return or a goto somewhere around that check. At the same time, I don't see any test on the flush parameter. If flush is set, no matter the size of input the inflate operation must be executed. I am wondering soft_copy_threshold=0 with lot of testing is the right way to go until then.

tuliom commented 3 years ago

Fixed.