Open xindi-dumbledore opened 6 years ago
Thank you @xindi-dumbledore .
The MinSupport means that anything we observe less than that many of times will be discarded. In your example, other than self-loops (which are ignored by BuildHON), we observe all patterns only once (244->1619, 1619->2824, etc). In other words, if you build a first-order network, all edge weights will be 1. Setting MinSupport = 2 means throwing away all edge weights <= 2. This is expected.
The order of dependency is not controlled by MinSupport. Whenever the algorithm deems a higher-order is necessary, it will try to use higher-orders, unless hitting the MaxOrder cap.
Personally I set MinSupport = 1 unless you have a very large data set and want to speed up the computation. If you want to set MinSupport > 1, what you could try is introducing some repeating patterns, like 0 3 5 3 5 3 4 3 4 3 5 3 5 1 2 3 5 4 3 6 2 3 5 4 3 6
Feel free to let me know of further questions.
Thanks a lot for the quick response @xyjprc !!!
I have one more question to ask. I saw your guys paper mentioned the trajectory prediction using random walk. Could you elaborate on how it works? Since now there are higher order nodes, given a trajectory, there maybe multiple nodes at each step.... I'm a bit confused on how it works.
Thanks again!
@xindi-dumbledore : The purpose of random walking simulation on HON is not intended to do trajectory prediction (there are other more sophisticated methods on series prediction); rather, the random walking accuracy scores serve as a way to measure information retained in the network structure, and measure how well the random walkers' trajectories can show higher-order patterns by walking on HON rather than FON.
There can be multiple higher-order nodes representing the same physical location (state): if you see A|B.C.D, it still represents node A, but only paths from D->C->B will lead to this higher-order node.
When your random walker's trajectory looks like A -> B|A -> C|B -> D|C.B -> B|D, you are essentially seeing A -> B -> C -> D -> B, just ignore the higher-order part after the "given" symbol (vertical bar).
@xyjprc : sorry I may have too many questions but I do want to understand how you guys obtain the random walking accuracy scores correctly, the process I have in my mind is: 1) Split the data into train data and test data 2) Obtain the HON on train data 3) This step I am confused: have a random walker walking on the HON and get his trajectories and match with the test data? But the trajectories matched need to have the exact same previous pattern with the given trajectory?
One more thing, I kind of get your guys purpose is not doing trajectory prediction but do you think it kind of make sense to use this as a trajectory prediction method? Or you think it's a bad idea hahahah :D
@xindi-dumbledore Questions are always welcome. You are understanding the process correctly. That is what Figure 2 in the paper illustrates (http://www.jianxu.net/en/files/HON_SA.pdf).
To do a training/testing split, you can change this line to hold out the last X steps for testing (https://github.com/xyjprc/hon/blob/master/pyHON/main.py#L42)
HON is not intended to be a trajectory prediction engine; it is a good way to distill significant higher-order movement patterns, and help roughly simulate movement patterns with the least added complexity to the network representation (that's why it's variable order).
@xindi-dumbledore Questions are always welcome. You are understanding the process correctly. That is what Figure 2 in the paper illustrates (http://www.jianxu.net/en/files/HON_SA.pdf).
To do a training/testing split, you can change this line to hold out the last X steps for testing (https://github.com/xyjprc/hon/blob/master/pyHON/main.py#L42)
HON is not intended to be a trajectory prediction engine; it is a good way to distill significant higher-order movement patterns, and help roughly simulate movement patterns with the least added complexity to the network representation (that's why it's variable order).
Hi, Jian Xu
I have done what you mentioned about chaning line 42 of main.py to have the dataset split in trainining+ test, but when I do that, nothing is written in the output file. In addition to this, I cannot get the probabilities using the non-FREQ mode, only the counts of FREQ one. Could you please helpe giving me some clue about that?
Regards,
Igor.
@ICarvalho Hi Igor,
When you increase LastStepsHoldOutForTesting, each series' length will reduce by that number when training. I wonder if you set a high value for that parameter while your each of input series is very short? For example, if each trajectory's length is 5, but you set LastStepsHoldOutForTesting=10, the algorithm essentially trains on empty inputs.
I suggest trying with some trivial inputs, such as 3 trajectories of lengths of 10, setting MaxOrder = 1 then slowly increase, and see if it works. Let me know!
Best, Jian
I'm trying to use the python code to generate some higher order network and have some prediction of the trajectory. However, it seems the generated network is missing first order nodes. For example, if the trajectory I put in is `0 244 1619 2824 2454 1296 3626 4949 9195 9195 15708 950 1481
1 23 207 349 153 20 341 93 418 2092 2092 103 4316 4098 4316 1586 4098 4098 4098 4098 4098 4098 4098 3057 4098 4098 3818 4122 4098 1586 4098 4098 3309 4122 5455 4122 4098 3309`
and if I set some minimum support larger than 1, the network is basically empty.
Of course, the real data has more than just two trajectories. But from my understanding of the algorithm, it should have the first order network no matter what, and rewire edges based on the higher order rules. However, this is not the case based on my observation. I'm wondering whether I'm understanding the algorithm incorrectly? If this should be the case, how to predict the trajectory with some nodes missing?
Thanks a lot :)