kjn / lbzip2

Parallel bzip2 utility
GNU General Public License v3.0

divbwt() arguments #30

tansy closed this 2 years ago

tansy commented 3 years ago

Can someone explain what the arguments to divbwt() in encode.c are?

//-- encode.c --

  uint8_t *block = (void *)(s->SA + s->max_block_size + GROUP_SIZE);
  (...)
  s->bwt_idx = divbwt(block, s->SA, s->u.bucket, s->nblock);
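
For reference: in Yuta Mori's libdivsufsort, which lbzip2's divbwt() appears to be adapted from (an assumption, not confirmed in this thread), the call is divbwt(T, U, A, n). T is the n-byte input block, U receives the n-byte transformed block, A is an integer workspace of n entries, and the return value is the primary index. Whether lbzip2 keeps exactly these roles for block, s->SA and s->u.bucket is not verified here. The sketch below is a hypothetical, deliberately naive reference (naive_bwt and cmp_rotations are made-up names) that uses the same argument order. It computes the BWT of cyclic rotations, which is what the bzip2 format stores together with the primary index (origPtr).

```c
#include <assert.h>
#include <stdint.h>

/* Compare the cyclic rotations of T[0..n-1] that start at i and j. */
static int cmp_rotations(const uint8_t *T, int32_t n, int32_t i, int32_t j)
{
    for (int32_t k = 0; k < n; k++) {
        uint8_t a = T[(i + k) % n];
        uint8_t b = T[(j + k) % n];
        if (a != b)
            return a < b ? -1 : 1;
    }
    return 0; /* identical rotations (periodic input) */
}

/* Hypothetical reference BWT, NOT lbzip2's divbwt():
 *   T - input block (n bytes, read only)
 *   U - output buffer for the transformed block (n bytes)
 *   A - workspace of n int32_t entries (rotation start indices)
 *   n - block length
 * Returns the primary index: the row of the sorted rotation matrix
 * that holds the original block (bzip2's origPtr). */
static int32_t naive_bwt(const uint8_t *T, uint8_t *U, int32_t *A, int32_t n)
{
    assert(n > 0);
    for (int32_t i = 0; i < n; i++)
        A[i] = i;
    /* Insertion sort of rotation start indices (slow, small tests only). */
    for (int32_t i = 1; i < n; i++) {
        int32_t key = A[i], j = i - 1;
        while (j >= 0 && cmp_rotations(T, n, A[j], key) > 0) {
            A[j + 1] = A[j];
            j--;
        }
        A[j + 1] = key;
    }
    int32_t primary = -1;
    for (int32_t r = 0; r < n; r++) {
        /* Last column: the byte preceding each rotation's start. */
        U[r] = T[(A[r] + n - 1) % n];
        if (A[r] == 0)
            primary = r;
    }
    return primary;
}
```

With the six-byte input "banana", naive_bwt writes "nnbaaa" to U and returns 3. Its worst case is cubic, so it is only meant for checking small blocks against a fast routine.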

I tried to test it with sais and libsais, and it didn't work as planned. First I read a bit about them, found an example, and prepared a temporary buffer (Atmp) four times the block size (in bytes), then tried to call it as shown below, but it didn't work.

//-- encode.c-sais --

  int* Atmp = malloc(bs100k*100000u*sizeof(int));
  (...)
  /* s->bwt_idx = divbwt(block, s->SA, s->u.bucket, s->nblock); */
  /*// sais_u8_bwt(T, T, SA, (sa_int32_t)m, 256)) //*/
  s->bwt_idx = sais_u8_bwt(block, block, s->SA, s->nblock, 256); /// somewhat corrupted
  s->bwt_idx = sais_u8_bwt(block, s->SA, s->u.bucket, s->nblock, 256); /// Segmentation fault
  s->bwt_idx = sais_u8_bwt(block, s->SA, Atmp, s->nblock, 256); /// Segmentation fault

Here, the first call (somewhat corrupted) is the only one that didn't segfault, but its output is still corrupted: it repeats a short piece of the encoded message. I guess the buffer is too small, but why do the other calls, including the one with Atmp, fail so badly?
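
A possible explanation for the "somewhat corrupted" output, offered as an unconfirmed guess: general-purpose suffix-array BWT routines (libdivsufsort's divbwt(), sais and libsais) define the transform over the suffixes of the input with an implicit end-of-string sentinel, whereas the bzip2 format defines it over cyclic rotations. The two generally produce different strings and different primary indices, so dropping one in for the other can yield output that a bzip2 decoder will not invert correctly. The hypothetical naive_suffix_bwt below computes the sentinel variant. Its primary-index convention (the row the sentinel would occupy, with the sentinel itself left out of U) is this sketch's own and may not match any particular library exactly.

```c
#include <assert.h>
#include <stdint.h>

/* Compare suffixes T[i..n) and T[j..n). A proper prefix sorts first, which is
 * the effect of an implicit sentinel smaller than every byte value. */
static int cmp_suffixes(const uint8_t *T, int32_t n, int32_t i, int32_t j)
{
    while (i < n && j < n) {
        if (T[i] != T[j])
            return T[i] < T[j] ? -1 : 1;
        i++;
        j++;
    }
    if (i == n && j == n)
        return 0;
    return i == n ? -1 : 1; /* the shorter suffix is smaller */
}

/* Hypothetical naive suffix-based BWT with an implicit sentinel.
 * Argument roles follow the divbwt(T, U, A, n) pattern: T input,
 * U output (n bytes), A workspace of n int32_t, n length.
 * Returns the row the sentinel would occupy in the full (n+1)-row listing. */
static int32_t naive_suffix_bwt(const uint8_t *T, uint8_t *U, int32_t *A, int32_t n)
{
    assert(n > 0);
    for (int32_t i = 0; i < n; i++)
        A[i] = i;
    /* Insertion sort of suffix start positions (slow, small tests only). */
    for (int32_t i = 1; i < n; i++) {
        int32_t key = A[i], j = i - 1;
        while (j >= 0 && cmp_suffixes(T, n, A[j], key) > 0) {
            A[j + 1] = A[j];
            j--;
        }
        A[j + 1] = key;
    }
    /* Row 0 is the empty suffix (sentinel alone); the byte before it is T[n-1]. */
    int32_t out = 0, primary = -1;
    U[out++] = T[n - 1];
    for (int32_t r = 0; r < n; r++) {
        if (A[r] == 0)
            primary = r + 1; /* whole input: preceded by the sentinel, omitted from U */
        else
            U[out++] = T[A[r] - 1];
    }
    return primary;
}
```

For "banana" this writes "annbaa" and returns 4, while the rotation BWT of the same input is "nnbaaa" with primary index 3.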

//-- encode.c-libsais --

  int* Atmp = malloc(bs100k*100000u*sizeof(int));
  (...)
  /* s->bwt_idx = divbwt(block, s->SA, s->u.bucket, s->nblock); */
  s->bwt_idx = libsais_bwt(block, s->SA, s->u.bucket, s->nblock); ///const int idx=libsais_bwt(buf, buf, ptr, n);
  s->bwt_idx = libsais_bwt(block, s->SA, Atmp, s->nblock); ///const int idx=libsais_bwt(buf, buf, ptr, n);
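
Whichever routine ends up replacing divbwt(), one cheap sanity check is to invert its output and compare the result with the input block. The sketch below is a hypothetical helper (inverse_bwt is not part of lbzip2, sais or libsais) that inverts a bzip2-style rotation BWT using the standard last-to-first (LF) mapping. LF is caller-provided workspace of n entries, mirroring the workspace arguments in the calls above, and the primary index is assumed to be the row of the original block in the sorted rotation matrix, as with bzip2's origPtr.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: invert a BWT of cyclic rotations.
 *   L       - transformed block (n bytes)
 *   out     - receives the reconstructed block (n bytes)
 *   LF      - workspace of n int32_t entries (last-to-first mapping)
 *   primary - row of the original block in the sorted rotation matrix */
static void inverse_bwt(const uint8_t *L, uint8_t *out, int32_t *LF,
                        int32_t n, int32_t primary)
{
    assert(n > 0 && primary >= 0 && primary < n);
    int32_t C[256] = {0};
    for (int32_t i = 0; i < n; i++)
        C[L[i]]++;
    /* Turn byte counts into first-column offsets (bytes smaller than c). */
    int32_t sum = 0;
    for (int c = 0; c < 256; c++) {
        int32_t cnt = C[c];
        C[c] = sum;
        sum += cnt;
    }
    /* LF[i]: row whose rotation starts one position before row i's rotation. */
    for (int32_t i = 0; i < n; i++)
        LF[i] = C[L[i]]++;
    /* Walk backwards from the original row, filling the block from its end. */
    int32_t row = primary;
    for (int32_t k = n - 1; k >= 0; k--) {
        out[k] = L[row];
        row = LF[row];
    }
}
```

Called on "nnbaaa" with primary index 3, it reconstructs "banana". Output from a sentinel-based routine will in general fail this check, even when that routine is working correctly.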

Edit: I realised GitHub has a Discussions feature and this should probably be moved there, but it isn't enabled in this repo. If a maintainer could enable it and move this thread to Discussions, that would be better, I think.

tansy commented 2 years ago

Solved a while ago.