neolifer / LLM4POI


train / test sample size #3

Closed: fyzl233 closed this issue 4 months ago

fyzl233 commented 4 months ago

Your preprocessing code seems to be different from STHGCN. What is the sample size for your training and testing after preprocessing?

After running it, I get:

nyc: train set: 11022 trajectory, test set: 1447 trajectory
tky: train set: 51661 trajectory, test set: 7079 trajectory
ca: train set: 36374 trajectory, test set: 2864 trajectory

Is that right? Thanks for reading.

neolifer commented 4 months ago

Hi, thank you for your interest.

The preprocessing code for getting the samples is based on the code in STHGCN.

What we changed:

  1. We fixed some bugs.
  2. We removed the part that generates the hypergraph.
  3. We added the removed entries back to the test set. The original code keeps only the last entry in a trajectory, because the trajectory is already used to generate the hypergraph.
  4. We kept the first entry in every trajectory.

So what we get should be the complete data after filtering and splitting. Our sample sizes are the same as yours.
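
To make the difference in points 3 and 4 concrete, here is a minimal sketch on toy data. It is not the LLM4POI or STHGCN preprocessing code. The field names (`trajectory_id`, `poi_id`, `timestamp`) and the helper functions are hypothetical and chosen only for illustration. It contrasts keeping only the last entry of each trajectory with keeping every entry, including the first.

```python
from collections import defaultdict


def group_by_trajectory(rows):
    """Group check-in rows by trajectory id, preserving input order."""
    trajectories = defaultdict(list)
    for row in rows:
        trajectories[row["trajectory_id"]].append(row)
    return dict(trajectories)


def last_entry_only(trajectories):
    """Keep only the final check-in of each trajectory."""
    return {tid: entries[-1:] for tid, entries in trajectories.items()}


def full_trajectories(trajectories):
    """Keep every check-in of each trajectory, including the first one."""
    return {tid: list(entries) for tid, entries in trajectories.items()}


# Toy test-set rows (hypothetical field names, made-up values).
test_rows = [
    {"trajectory_id": "u1_0", "poi_id": 10, "timestamp": 1},
    {"trajectory_id": "u1_0", "poi_id": 11, "timestamp": 2},
    {"trajectory_id": "u1_0", "poi_id": 12, "timestamp": 3},
    {"trajectory_id": "u2_0", "poi_id": 20, "timestamp": 1},
    {"trajectory_id": "u2_0", "poi_id": 21, "timestamp": 2},
]

grouped = group_by_trajectory(test_rows)
trimmed = last_entry_only(grouped)
complete = full_trajectories(grouped)

# Number of trajectories under each policy.
print(len(trimmed), len(complete))  # 2 2

# Number of check-in entries under each policy.
print(
    sum(len(entries) for entries in trimmed.values()),
    sum(len(entries) for entries in complete.values()),
)  # 2 5
```

In this toy example the number of trajectories is the same under both policies, and only the number of entries inside each trajectory changes. Whether that holds for the real datasets depends on the actual filtering code, so the repository remains the reference.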