grosjo / fts-xapian

Dovecot FTS plugin based on Xapian
GNU Lesser General Public License v2.1

Have the indices become way bigger with 1.7.x? #166

Open reox opened 2 months ago

reox commented 2 months ago

I used 1.6.x before and compiled 1.7.14 today. After re-running the indexing, I received a warning from my disk monitoring system that the disk was almost full. With the old version, the indices were around 15GB for ~27GB of mail; now they are around 30GB for the same amount of mail. I did not change any settings. Is that to be expected?
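
For anyone wanting to reproduce the comparison above, here is a minimal sketch that totals mail and index bytes under a mail root. It assumes the Xapian indices live in directories named `xapian-indexes` somewhere below that root; the function name `split_sizes` and its parameters are made up for illustration and are not part of fts-xapian or Dovecot. Everything outside those directories is counted as mail, including Dovecot's own index and cache files, so the result is only approximate.

```python
import os


def split_sizes(mail_root, index_dirname="xapian-indexes"):
    """Return (mail_bytes, index_bytes) for all regular files below mail_root.

    index_dirname is an assumption about where the FTS indices are stored;
    adjust it if your layout differs.
    """
    mail_bytes = 0
    index_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(mail_root):
        # A file is index data if any directory between mail_root and the
        # file is named like the index directory.
        rel_parts = os.path.relpath(dirpath, mail_root).split(os.sep)
        in_index = index_dirname in rel_parts
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # do not follow or count symlinks
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished while walking, e.g. mail delivery
            if in_index:
                index_bytes += size
            else:
                mail_bytes += size
    return mail_bytes, index_bytes
```

Calling `mail, idx = split_sizes("/path/to/mail")` and printing `idx / mail` gives the index-to-mail ratio. With the figures reported here, that ratio went from roughly 0.56 (15GB / 27GB) to roughly 1.1 (30GB / 27GB).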

grosjo commented 2 months ago

Will dig into that

grosjo commented 2 weeks ago

I think I found the culprit. @reox kindly test latest git

reox commented 2 weeks ago

Running it at the moment. It looks like doveadm index -A '*' takes much longer now, but the Xapian database size is still <2GB. I'll let it run and check in a few hours how it went.

reox commented 2 weeks ago

hmm... I had now several crashes and the indexing seems to be waaaay slower than previously. The previous run was finished in ~4h, now it took over 10h and I only collected around 3GB of index files. Something seems to be not right...

edit: looking at my monitoring, the whole time with the latest snapshot, I saw mostly IOWAIT, while with the older version I have compiled (2024-07-22) I see mostly USER CPU. Has this to do with the detach option?

grosjo commented 2 weeks ago

> doveadm index -A '*'

Supposedly, it says how many threads it is using to index those emails. Can you share that?

grosjo commented 2 weeks ago

> hmm... I had now several crashes and the indexing seems to be waaaay slower than previously. The previous run was finished in ~4h, now it took over 10h and I only collected around 3GB of index files. Something seems to be not right...
>
> edit: looking at my monitoring, the whole time with the latest snapshot, I saw mostly IOWAIT, while with the older version I have compiled (2024-07-22) I see mostly USER CPU. Has this to do with the detach option?

Weird. Have you changed your storage space?

reox commented 2 weeks ago

> Supposedly, it says how many threads it is using to index those emails. Can you share that?

Between 2 and 8. However, I only have 4 physical cores with HT. Possibly too many threads were started?

> Weird. Have you changed your storage space?

No. All mails are on a separate ext4 filesystem on an LVM RAID1. The only thing I did was increase the disk size when I opened the issue.

grosjo commented 1 week ago

@reox Kindly try latest git

reox commented 1 week ago

I'm running it right now. By the way, is there a way to reduce the CPU load? I set

service indexer-worker {
    vsz_limit = 2G
    process_limit = 2
}

However, when I run doveadm index -A '*', I still get 100% CPU load on all 8 cores, which is a problem when the server has to do other things as well :D

Edit: something is different with the latest versions. I see high IOWAIT after a few minutes of indexing:

# dstat
You did not select any stats, using -cdngy by default.
--total-cpu-usage-- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai stl| read  writ| recv  send|  in   out | int   csw
  1   0  97   2   0|2928k  901k|   0     0 |4220B 5249B|1155  1975
  4   1  53  43   0|8100k 1194k|  33k   10k| 368k  272k|2253  3923
  3   2  63  33   0|9356k   10M| 708B 1978B| 204k 9008k|2099  3505
  3   1  64  32   0|7740k 1192k|  23k   14k| 264k    0 |2160  3765
  2   0  45  53   0|4132k 2426k|  22k   11k| 412k  112k|2284  3533
  4   2  57  38   0|9396k   17M| 918B 2658B| 672k   16M|2570  4048
  4   1  50  45   0|  10M  931k|  24k   12k| 752k    0 |2684  4412
  4   0  48  48   0|7340k 1635k|7545B   14k| 668k  356k|3208  5009
  4   1  59  35   0|  13M   10M|  32k   49k| 780k 9108k|2711  5062
  4   1  54  41   0|  12M 2548k|  21k 9465B| 720k 1376k|2274  3881
  4   1  44  51   0|  12M 7044k|1350B   26k| 652k 5700k|3027  6400
  5   2  53  41   0|  14M   19M|  24k   10k| 772k   18M|3170  4184
  3   1  46  50   0|8052k   38M|3273B   11k| 608k   84k|2270  3504
  2   1  47  51   0|5444k   36M| 298B 2230B| 436k  216k|2013  3087
  2   1  43  54   0|7124k   13M|  84k   39k| 688k 8996k|2945  3898
  3   1  37  60   0|8196k 1437k|1854B 7515B| 660k    0 |2601  6223
  1   1  42  56   0|7204k  846k| 310B 2674B|1172k   36k|2558  4125
  2   1  36  61   0|7620k   41M|3490B   13k|2048k    0 |3103  4967
  1   1  29  70   0|4296k  120M|  13k   44k| 716k  896k|2591  4419
  1   0  23  76   0|2636k   41M|  74k   32k| 336k    0 |2509  3817
  2   1  28  70   0|6076k   62M|1879B   26k| 544k 4320k|2838  6736
  2   0  30  68   0|3828k   89M|1901B 6063B| 588k    0 |2533  3838
  1   1  42  55   0|5052k   47M| 562B 1262B| 660k 4068k|2415  3804
  1   1  50  48   0|5236k  728k| 328B 1846B| 836k    0 |2434  4003
  2   1  48  48   0|7244k   11M| 105k   44k| 560k 8392k|3081  4086
  2   0  37  61   0|5432k 4178k|2709B 7995B| 456k    0 |2716  3683
  2   1  31  67   0|4632k 3255k|  20k 5278B| 528k    0 |3038  4215
  1   1  33  65   0|2980k 4385k|1537B 6739B| 556k 2088k|2669  3536
  2   1  52  46   0|6508k 1143k|  32k   46k| 760k    0 |2609  4690

If I let it run with the old version, I do not see that behaviour.
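
To put a number on the IOWAIT in a capture like the one above, and to compare it against a run with the older build, here is a small sketch that averages the `wai` column. It assumes the default `dstat` layout shown above, where the first five columns are `usr sys idl wai stl` followed by a `|`. The helper names `parse_cpu_columns` and `mean_iowait` are made up for illustration.

```python
def parse_cpu_columns(line):
    """Return (usr, sys, idl, wai, stl) as ints, or None for non-data lines."""
    # The CPU block is everything before the first '|' separator.
    cpu_part = line.split("|", 1)[0].split()
    if len(cpu_part) != 5 or not all(tok.isdigit() for tok in cpu_part):
        return None  # banner, header, or otherwise unparseable line
    return tuple(int(tok) for tok in cpu_part)


def mean_iowait(dstat_text):
    """Average the 'wai' column over all data rows in a pasted dstat capture."""
    rows = [row for row in map(parse_cpu_columns, dstat_text.splitlines()) if row]
    if not rows:
        raise ValueError("no dstat data rows found")
    return sum(row[3] for row in rows) / len(rows)
```

Saving the output of each run to a file and passing its contents to `mean_iowait` gives one figure per build, which is easier to compare than two long listings.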

Edit 2: It has now run through completely, in just under 3.5h. The file size is the same, though. But as long as it works :D

grosjo commented 10 hours ago

Kindly check latest git