martomi / chiadog

A watchdog providing peace of mind that your Chia farm is running smoothly 24/7.
MIT License

Improve search statistics: only consider challenges with 1 or more eligible plots #225

Open skweee opened 3 years ago

skweee commented 3 years ago

Background

At the moment we calculate the average search time for all searches. This number can be misleading for people running smaller farms. Let me explain:

Search time tells you whether your disks are fast enough to return proofs within the required window of 30 seconds (quicker is better). However, if there are 0 eligible plots for a given challenge, disk access is usually not needed at all, because the plot filter is answered from memory.

Only searches involving 1 or more eligible plots say anything about disk access time. It would therefore make sense to separate searches with 0 eligible plots from searches with 1 or more eligible plots, since only the latter give meaningful insight into how long disk lookups take.

Imagine somebody farming only a single plot. Even if their disk access is extremely slow (let's say 30 seconds), their statistics will look fairly good. Statistically, only 1 in every 512 challenges will access their disk. So if searches with 0 eligible plots (RAM searches) are answered in 0.05 seconds and searches with 1 eligible plot are answered in 30 seconds, the average time will be (511*0.05 + 30) / 512 = 0.108 seconds. Their number of searches over 15s will also look quite low: only 1 out of 512 searches accesses the disk, so only about 0.2 percent of searches will take longer than 15 seconds.
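The arithmetic above can be checked with a few lines of Python. This is only a sketch that plugs in the example's assumed numbers (a 1-in-512 plot filter, 0.05 s per RAM search, 30 s per disk search); it does not read real harvester logs.

```python
# Assumed inputs, taken from the single-plot example above.
FILTER_RATIO = 512      # 1 in 512 challenges has an eligible plot
RAM_SEARCH_S = 0.05     # search answered from memory (0 eligible plots)
DISK_SEARCH_S = 30.0    # search that has to read the plot from disk


def blended_average(ram_s: float, disk_s: float, ratio: int) -> float:
    """Expected mean search time when 1 in `ratio` searches hits the disk."""
    return ((ratio - 1) * ram_s + disk_s) / ratio


avg = blended_average(RAM_SEARCH_S, DISK_SEARCH_S, FILTER_RATIO)
slow_share = 1 / FILTER_RATIO  # fraction of searches that take 30 s

print(f"average: {avg:.3f}s")         # average: 0.108s
print(f"over 15s: {slow_share:.2%}")  # over 15s: 0.20%
```

The slow disk searches are diluted 511:1 by fast memory lookups, which is exactly why the overall average hides the problem.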

For a tiny farmer (only 1 plot), the statistics will look great even though, with the above numbers, they never have a chance to win a block. As this farmer adds plots, the statistics will shift and he will see the trouble he is in. Larger farmers (probably >1000 plots) are very likely to have eligible plots for most challenges. Their average search time is therefore dominated by disk access, so they will see the trouble they are in. Small farmers, however, will see statistics skewed by memory access.
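To see why farm size matters: if each of n plots passes the 1-in-512 filter independently (an assumption for this sketch), the probability that a challenge has at least one eligible plot, and therefore touches the disk, is 1 - (511/512)^n.

```python
FILTER_RATIO = 512  # assumed plot filter ratio, as used in this issue


def p_disk_search(num_plots: int, ratio: int = FILTER_RATIO) -> float:
    """Probability that at least one plot is eligible for a given challenge."""
    return 1 - (1 - 1 / ratio) ** num_plots


for plots in (1, 100, 1000, 5000):
    print(f"{plots:>5} plots: {p_disk_search(plots):.1%} of searches hit the disk")
```

Under that assumption, a single plot touches the disk on about 0.2% of challenges, while 1000 plots do so on roughly 86%, so a large farm's average is dominated by disk searches and a small farm's is not.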

The farmer from this example (only 1 plot) will see the following statistics at the moment:

Search:
  - average: 0.11s over 9376 searches
  - over 5s: 18 occasions (0.19%)
  - over 15s: 18 occasions (0.19%)

Suggestion

We should separate search statistics for 0 eligible plots (mostly RAM access) from those for 1 or more eligible plots (disk access). For the above example (farmer with only 1 plot), the following statistics would help diagnose his issue:

Search:
  - 9376 searches
  - average (RAM): 0.05s
  - average (disk): 30.00s
  - over 5s: 18 occasions (100.00% of disk searches)
  - over 15s: 18 occasions (100.00% of disk searches)
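The split report suggested above could be produced along these lines. The `Search` record and `split_report` function are hypothetical names for illustration, not chiadog's actual code; the example input reproduces the single-plot farmer (9358 RAM searches plus 18 disk searches).

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Search:
    """One harvester search (hypothetical record; field names are illustrative)."""
    eligible_plots: int
    seconds: float


def split_report(searches: list[Search]) -> str:
    """Format search stats with RAM-only and disk searches reported separately."""
    ram = [s.seconds for s in searches if s.eligible_plots == 0]
    disk = [s.seconds for s in searches if s.eligible_plots > 0]
    lines = ["Search:", f"  - {len(searches)} searches"]
    if ram:
        lines.append(f"  - average (RAM): {mean(ram):.2f}s")
    if disk:
        lines.append(f"  - average (disk): {mean(disk):.2f}s")
        # Slow-search percentages are relative to disk searches only.
        for threshold in (5, 15):
            slow = sum(1 for t in disk if t > threshold)
            lines.append(
                f"  - over {threshold}s: {slow} occasions "
                f"({slow / len(disk):.2%} of disk searches)"
            )
    return "\n".join(lines)


example = [Search(0, 0.05)] * 9358 + [Search(1, 30.0)] * 18
print(split_report(example))
```

Run on the example input, this prints the same five lines as the suggested report above.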
martomi commented 3 years ago

Good insights!

I'd suggest doing it a bit more simply and only considering challenges with at least 1 eligible plot for the statistics. As I see it, the data about 0 eligible plots doesn't really add valuable information.
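A minimal sketch of this simpler approach: drop every search with 0 eligible plots before computing the average and the over-threshold counts. The `(eligible_plots, seconds)` input shape and the `disk_search_stats` name are assumptions for illustration, not chiadog's implementation.

```python
from statistics import mean


def disk_search_stats(
    searches: list[tuple[int, float]],
    thresholds: tuple[float, ...] = (5.0, 15.0),
) -> dict:
    """Search stats over challenges with at least 1 eligible plot only.

    `searches` holds hypothetical (eligible_plots, seconds) pairs.
    """
    times = [secs for plots, secs in searches if plots > 0]
    if not times:
        # No disk searches yet: nothing meaningful to report.
        return {"count": 0}
    return {
        "count": len(times),
        "average": mean(times),
        "over": {t: sum(1 for s in times if s > t) for t in thresholds},
    }


searches = [(0, 0.05)] * 9358 + [(1, 30.0)] * 18
print(disk_search_stats(searches))
```

For the single-plot farmer this reports 18 searches averaging 30 s, all of them over 15 s, instead of a reassuring 0.11 s average.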

skweee commented 3 years ago

Good point! That also makes the fix easier and simplifies reading the statistics. Will edit the issue title to reflect this.