miurahr / pyppmd

pyppmd provides classes and functions for compressing and decompressing text data, using PPM (Prediction by partial matching) compression algorithm variation H and I.2. It provide an API similar to Python's zlib/bz2/lzma modules.
https://pyppmd.readthedocs.io/en/latest/
GNU Lesser General Public License v2.1
8 stars 3 forks source link

Use threading for PPMd8 decoders #33

Closed miurahr closed 3 years ago

miurahr commented 3 years ago

Partial fix for #28

Todo:

Issues:

Signed-off-by: Hiroshi Miura miurahr@linux.com

miurahr commented 3 years ago

How can we implement it for windows?

miurahr commented 3 years ago

@cielavenir could you mind to review changes, test it, and help the progress?

cielavenir commented 3 years ago

@miurahr it is too large to review, but at least tests are working.

miurahr commented 3 years ago

The core functions to be reviewed are three in lib2/Ppmd8Tdecoder.c :

Byte TReader(const void *p) is called from PPMd library under lib/ folder. It send signal empty when buffer exhausted, and wait signal notEmpty.

static void * Ppmd8T_decode_run(void *p)

is a function that run in the thread. the core logic comes from src/ext/_ppmdmodule.c L1440-L1470 and protect critical part with mutex.

int Ppmd8T_decode() is a function that start and handle thread, and return when input buffer is empty. It also handles signals and detect state of _run function.

miurahr commented 3 years ago

Windows fatal exception: access violation errors are reproduced on Windows test, there may be a bug but it is too difficult to debug from log.

cielavenir commented 3 years ago

Actually I have an issue:

I found that PPMD_pthread_cond_wait1(&inEmpty, &mutex) could fail to receive inEmpty signal. I have not investigated deeply but it seems like that signal was sent during pthread_cond_timedwait's relocking (here relocking would be done because of pthread_cond_wait(&notEmpty, &mutex)).

This happens because pthread_cond_timedwait is not FIFO, so I wrote simple implementation in https://github.com/cielavenir/pyppmd-py2/commit/f224a04102da9e596ae7df3212ed37bd6c2b9a9b (part of UseEndmarkProperly2).

cielavenir commented 3 years ago

Another question: did you test FFI version? The test does not run here.

Inserting https://github.com/cielavenir/pyppmd-py2/commit/43791a5169d0858c6911ef623b1ff803ac5ca226 worked.

miurahr commented 3 years ago

I found that PPMD_pthread_cond_wait1(&inEmpty, &mutex) could fail to receive inEmpty signal. I have not investigated deeply but it seems like that signal was sent during pthread_cond_timedwait's relocking (here relocking would be done because of pthread_cond_wait(&notEmpty, &mutex)).

This happens because pthread_cond_timedwait is not FIFO, so I wrote simple implementation in cielavenir@f224a04 (part of UseEndmarkProperly2).

Thank you for the suggestion! I don't prefer global variable other than mutex lock one, so I'd like to rewrite based on your suggestion.

cielavenir commented 3 years ago

@miurahr see https://github.com/miurahr/pyppmd/commit/7e707cc5fbae0a224a25d740a327544ca7e5d4a4 and https://github.com/miurahr/pyppmd/commit/92d52dfa492d6118e4f3cff2d5b600699f1fea58

squashing these two commits will become this (collapsed) ``` diff --git a/lib2/Ppmd8Tdecoder.c b/lib2/Ppmd8Tdecoder.c index c3c257b..27717f3 100644 --- a/lib2/Ppmd8Tdecoder.c +++ b/lib2/Ppmd8Tdecoder.c @@ -5,12 +5,12 @@ #include "Ppmd8Tdecoder.h" PPMD_pthread_mutex_t mutex; -PPMD_pthread_cond_t finished, inEmpty, notEmpty; +PPMD_pthread_cond_t addInput, notEmpty; Byte TReader(const void *p) { BufferReader *bufferReader = (BufferReader *)p; if (bufferReader->inBuffer->pos == bufferReader->inBuffer->size) { - PPMD_pthread_cond_signal(&inEmpty); + PPMD_pthread_cond_signal(&addInput); PPMD_pthread_cond_wait(¬Empty, &mutex); } return *((const Byte *)bufferReader->inBuffer->src + bufferReader->inBuffer->pos++); @@ -18,8 +18,7 @@ Byte TReader(const void *p) { Bool Ppmd8T_decode_init() { PPMD_pthread_mutex_init(&mutex, NULL); - PPMD_pthread_cond_init(&finished, NULL); - PPMD_pthread_cond_init(&inEmpty, NULL); + PPMD_pthread_cond_init(&addInput, NULL); PPMD_pthread_cond_init(¬Empty, NULL); return True; } @@ -69,6 +68,7 @@ Ppmd8T_decode_run(void *p) { PPMD_pthread_mutex_lock(&mutex); args->result = result; args->finished = True; + PPMD_pthread_cond_signal(&addInput); PPMD_pthread_mutex_unlock(&mutex); return NULL; } @@ -101,6 +101,7 @@ int Ppmd8T_decode(CPpmd8 *cPpmd8, OutBuffer *out, int max_length, ppmd8_args *ar return -2; // error } } +#if 0 while(True) { PPMD_pthread_mutex_lock(&mutex); if (PPMD_pthread_cond_wait1(&inEmpty, &mutex) == 0) { @@ -114,6 +115,13 @@ int Ppmd8T_decode(CPpmd8 *cPpmd8, OutBuffer *out, int max_length, ppmd8_args *ar } PPMD_pthread_mutex_unlock(&mutex); } - PPMD_pthread_join(args->handle, NULL); - return args->result; +#endif + PPMD_pthread_mutex_lock(&mutex); + PPMD_pthread_cond_wait(&addInput, &mutex); + PPMD_pthread_mutex_unlock(&mutex); + if (args->finished) { + PPMD_pthread_join(args->handle, NULL); + return args->result; + } + return 0; } ```
miurahr commented 3 years ago

I'd like to handle pypy3 case, infinite loop to wait thread finish, on another issue tracker.

cielavenir commented 3 years ago

maybe please merge this as it is and I'll revisit my UseEndmarkProperly branch later