libnxz / power-gzip

POWER NX zlib compliant library
23 stars 18 forks source link

inflate: Limit the amount of pages touched #150

Closed tuliom closed 2 years ago

tuliom commented 2 years ago

Before inflating a file, all the input and output pages are touched. However, when this file is too large, there is no need to touch all its pages, just as many as we can fit in a single NX job.

By not doing this, we end up dealing with too many page faults than necessary for each NX request.

tuliom commented 2 years ago

I'm leaving this as draft for now, because I haven't tested it fully yet. I'm asking myself if deflate should be modified as well.

abalib commented 2 years ago

I already mentioned this to Tulio. But as a reminder to self and for record keeping. Examine the variables target_sz vs target_sz_expected about these 3 lines.
https://github.com/libnxz/power-gzip/blob/master/lib/nx_inflate.c#L1180 https://github.com/libnxz/power-gzip/blob/master/lib/nx_inflate.c#L1199 https://github.com/libnxz/power-gzip/blob/master/lib/nx_inflate.c#L1223 Line 1093 is touching unlimited number of pages which would may lead to the page faulting issue. It seems it is better to touch target_sz_expected pages than target_sz.
Because in line 1062 the code is limiting the output to inflate_per_job_len.

tuliom commented 2 years ago

Let me clarify: I'm rewriting this PR and I'm changing exactly how target_sz and target_sz_expected are affecting the algorithm.

tuliom commented 2 years ago

I have incorporated @abalib 's comments. However, while looking at the code I found an issue (commit Fix the total amount of pages being touched) and I realized it was necessary to change the code that manipulate the DDE by limiting the amount of input/output sent to the NX instead of just limiting the amount of pages touched.

In other words: if we send too much input/output, we have to touch all of their pages, otherwise we'll have to pay the price for treating ERR_NX_AT_FAULT. If we send too little, we decrease performance. So, I decided to use the values in source_sz_expected and target_sz_expected plus an increase in the amount of output whenever that's available in order to avoid excessive ERR_NX_DATA_LENGTH.

Unfortunately this PR has now 6 commits. Commits 2, 4 and 5 are just reorganizing/improving code. I suggest reviewers to take a look one commit at a time.

I also think this is ready for review now.

abalib commented 2 years ago

So, I decided to use the values in source_sz_expected and target_sz_expected plus an increase in the amount of output whenever that's available in order to avoid excessive ERR_NX_DATA_LENGTH.

Looks right conceptually

tuliom commented 2 years ago

@mscastanho I updated the patches with the fixes that you requested.