schollz / find

High-precision indoor positioning framework for most wifi-enabled devices.
https://www.internalpositioning.com/
GNU Affero General Public License v3.0
5.03k stars 370 forks

Calculating priors threading or performance #206

Open chucklin85 opened 6 years ago

chucklin85 commented 6 years ago

Hi,

I've just worked through a process to "learn" a building of about 600 locations. The building is quite large and we used a small team of people to go and learn each location.

We are at the stage where calculating priors is required and this process is taking quite a long time (still going after an hour).

The machine is a 64-bit, 8-processor Xeon server with 16 GB RAM. We ran into sizing issues on a 32-bit machine, so we moved the database to a different host, set up find again, and had another try at the calculation.

I've read a few of the previous issues and see that this has been used in a university environment with a number of locations with hundreds of APs, so I am thinking it should be able to scale to a facility of this size.

Any ideas how we can see the status or how long we should expect the calculation process to take?

Thank you for a great product - this is really awesome stuff.

chucklin85 commented 6 years ago

So I did a bit of digging and it seems that in the calculatePriors function, the slowest process within it is "Calculate the nP".

I've done some work to optimise it by spawning some workers, which has made a significant improvement. However, what is strange now is that the location list doesn't update (the new locations that have been learnt are not added to the /locations list).

Secondly, they aren't picked up on the tracking side of things either, which is quite unusual.

Potentially, I have no idea what I am doing in golang (very possible!) and I am losing the data between the worker goroutines.
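In case it helps to diagnose, here's roughly the pattern I'm attempting, as a minimal standalone sketch. The names (`computeNP`, `calculateAll`, the placeholder computation) are illustrative stand-ins, not find's actual internals. One thing I had to learn: concurrent writes to a shared map panic in Go unless they're synchronized, so missing the mutex here would silently or loudly lose results.

```go
package main

import (
	"fmt"
	"sync"
)

// computeNP is a stand-in for the per-location work inside calculatePriors.
func computeNP(loc string) float64 {
	return float64(len(loc)) // placeholder computation
}

// calculateAll fans the per-location work out across a fixed pool of workers
// and collects the results into one map, guarded by a mutex.
func calculateAll(locations []string, workers int) map[string]float64 {
	results := make(map[string]float64, len(locations))
	var mu sync.Mutex // guards results: concurrent map writes are unsafe in Go
	jobs := make(chan string)
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for loc := range jobs {
				nP := computeNP(loc)
				mu.Lock()
				results[loc] = nP
				mu.Unlock()
			}
		}()
	}
	for _, loc := range locations {
		jobs <- loc
	}
	close(jobs)
	wg.Wait() // all workers done before results is read
	return results
}

func main() {
	locs := []string{"lobby", "lab", "office"}
	fmt.Println(len(calculateAll(locs, 4))) // prints 3
}
```

If the real code collects results by assigning into a shared struct from multiple goroutines without a lock (or reads it before `wg.Wait()` returns), that would explain results going missing.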

chucklin85 commented 6 years ago

A bit more testing makes it pretty clear that with a lot of locations the Naive Bayes algorithm, specifically the calculation phase, is significantly slower and potentially worse than linear in computational complexity, though I don't have hard evidence for this.

Further to this, processing 18,000 fingerprints over 502 locations requires somewhere in the vicinity of 10 GB of memory, followed by around 500 MB of disk space. A fair bit of delay comes from the flate compression of the data and fingerprints.

Long delays also occur during startup while the data is read back into in-memory structures, with consumption around the 15 GB mark from what I can tell.

I ended up somewhat disabling the entire Naive Bayes calculation to work around the challenges above and went with Random Forest. Random Forest takes about 15 minutes to calculate. SVM takes approximately 45 minutes to run.

I found SVM to be fairly inaccurate, so I have stuck with RF. I've modified trackFingerprint to use RF for the location, which seems to meet our needs currently.

I am attempting to work out what could be done to make the Naive Bayes calculation more efficient, and possibly the storage as well. The RF method is clearly fairly efficient by comparison, with fingerprint tracking taking only 1 to 2 seconds.