Closed maxeeem closed 10 months ago
Hi Maxim, I will need more info about what the Distributor is.
@ccrock4t you can see the code from OpenNARS.
As the comments suggest (I will add them to this PR shortly)
A pseudo-random number generator, used in Bag.
And
For any number N < range, there are N+1 copies of it in the array, distributed as evenly as possible
This is all following our discussion over email, regarding selection probability in bag being proportional to priority.
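Based on the quoted comments, the structure of the Distributor can be sketched in Python roughly as follows. This is a sketch reconstructed from the description above, not a verbatim port of the OpenNARS code; the exact index-stepping OpenNARS uses may differ, but the counting invariant (value `v` appears `v+1` times) is the same.

```python
class Distributor:
    """Pseudo-random number generator, used in Bag.

    For any value v < rng, v+1 copies of v appear in `order`,
    distributed as evenly as possible.
    """

    def __init__(self, rng: int):
        self.capacity = rng * (rng + 1) // 2   # 1 + 2 + ... + rng slots
        self.order = [-1] * self.capacity      # -1 marks an empty slot
        index = self.capacity - 1
        for rank in range(rng, 0, -1):
            step = self.capacity // rank       # spacing for this value
            for _ in range(rank):              # place `rank` copies of rank-1
                index = (index + step) % self.capacity
                while self.order[index] >= 0:  # slot taken: probe forward
                    index = (index + 1) % self.capacity
                self.order[index] = rank - 1

    def pick(self, index: int) -> int:
        return self.order[index]

    def next(self, index: int) -> int:
        return (index + 1) % self.capacity
```

Since each value `rank - 1` is placed exactly `rank` times into distinct free slots, the counts come out right regardless of the stepping constant; the stepping only controls how evenly the copies are spread.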
Without this change, the result of `test_bag_take_task()` looks like this (as @bowen-xu pointed out)
After the change it looks like this, which I believe is more in line with what we want
In terms of speed, it seems like that is the cost of using this algorithm. If we can find out the name of the algorithm, maybe there are discussions online about improving it?
It seems a Distributor with value $N$ takes $O(N^2)$ time for initialization, so that is why the slowdown is occurring.

My initial thoughts: the `self.order` array is being filled one value at a time (in a loop) using local variables `rank`, `time`, and `index`. Can we simply save the state of those variables as members of the object, then calculate the value for `self.order` on the fly, whenever `pick()` is called, using the same logic as in `__init__`?

EDIT: it will probably not be that simple
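Computing `self.order` lazily with the same loop state is awkward because the collision probing depends on every earlier placement. If only the selection distribution matters (not the deterministic even spacing), one alternative, offered here as a sketch rather than anything in this PR, is to skip the array entirely and invert the triangular-number layout, which takes O(1) time and memory:

```python
import math

class LazyDistributor:
    """O(1)-memory alternative to a precomputed `order` array.

    Conceptually, value v occupies slots T(v) .. T(v+1)-1, where
    T(v) = v*(v+1)/2; given a slot k, we recover v in closed form,
    so value v is still drawn v+1 times out of `capacity`.
    """

    def __init__(self, rng: int):
        self.capacity = rng * (rng + 1) // 2

    def pick(self, k: int) -> int:
        # Largest v with v*(v+1)/2 <= k, via the quadratic formula
        return (math.isqrt(8 * k + 1) - 1) // 2
```

The trade-off is that successive `k`, `k+1`, ... return runs of equal values instead of the interleaved sequence the array gives, so `k` would have to be drawn randomly (or otherwise scrambled) rather than stepped sequentially.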
@ccrock4t Is the slowdown significant? I found that the initialization takes only around 8ms on my laptop.
Not for the current default number of buckets (100), but the time for initialization could become an issue if we need to significantly increase the granularity of Bag (e.g., to 10,000 or more).
Since it is only a one-time operation, the slowdown would not really impact user experience during NARS operation, only on startup. However, the memory usage is also $O(n^2)$.
@bowen-xu
@ccrock4t I agree that "the time for initialization could become an issue if we need to significantly increase the granularity of Bag (e.g., to 10,000 or more)." However, do we really need 10,000 levels? I think 100 levels is enough, as we don't need such high accuracy for priority. The control mechanism is not accurate.
do we really need 10,000 levels? I think 100 levels is enough
I agree it is OK for now, since we only tend to use 100 buckets.
But if we want to try a lot more buckets in the future, we should revisit this.
See #83
I added the distributor, but I'm looking for your suggestions to see if we can improve the implementation. The issue is that initializing the Distributor object is not exactly quick; there are multiple loops in the `__init__` method. To avoid recalculation I added a cache in the form of a static method `new`, but as I am myself new to Python, perhaps you can think of better ways to optimize the calculations. With this change the tests run slower, but still take less than a minute.
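For reference, the caching `new` method mentioned above could look like the sketch below. The `_cache` member and the elided `order` fill are my assumptions for illustration, not necessarily what the PR does:

```python
class Distributor:
    _cache: dict = {}   # assumed: class-level cache keyed by range

    def __init__(self, rng: int):
        self.capacity = rng * (rng + 1) // 2
        self.order = [0] * self.capacity  # expensive O(rng^2) fill elided

    @classmethod
    def new(cls, rng: int) -> "Distributor":
        """Build each Distributor at most once per range value."""
        if rng not in cls._cache:
            cls._cache[rng] = cls(rng)
        return cls._cache[rng]
```

A `functools.lru_cache`-decorated factory function would achieve the same effect with less code.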