Thanks for your questions.
We built the benchmark such that the seed_times in the test table are always the test_timestamp (rel-f1 is the only exception, because otherwise there would be very few test samples). So, your hypothetical scenario where pruning the graph at test_timestamp hurts test performance does not typically arise. One way to think of this is that we train the model with information up to test_timestamp, and test it on future predictions just beyond test_timestamp. Since the predictions can be made for many entities, there are still many samples to test against.
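To make that invariant concrete, here is a minimal sketch, assuming the test table is a pandas DataFrame with a "timestamp" column holding the seed times (the table contents and column names are illustrative, not the actual benchmark API):

```python
import pandas as pd

# Hypothetical global cutoff and test table; values are made up.
test_timestamp = pd.Timestamp("2010-01-01")
test_df = pd.DataFrame({
    "driverId": [1, 2, 3],
    # Every seed time coincides with the global cutoff.
    "timestamp": [test_timestamp] * 3,
})

# Since all test seed times equal the cutoff, pruning the graph at
# test_timestamp cannot remove history that a test example depends on.
assert (test_df["timestamp"] == test_timestamp).all()
```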
Yes, under our current implementation this pruning is not necessary to prevent leakage. But the benchmark is more general and is intended to be used by people developing their own methods. We made the conscious choice to avoid showing the test rows to the model at all, to guard against bugs. Under the current setup, a bug would lead to worse test set performance and be detectable. Without pruning, a temporal leakage bug would lead to better test performance and would hinder fair comparison on the leaderboard, for example.
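As a rough illustration of that safeguard, a pruning step might look like the following sketch, assuming the raw tables are pandas DataFrames with a "timestamp" column (the function and column names are hypothetical):

```python
import pandas as pd

def prune_at(tables: dict[str, pd.DataFrame],
             cutoff: pd.Timestamp) -> dict[str, pd.DataFrame]:
    """Drop every row stamped after `cutoff`, so test-era rows never
    reach the model even if a downstream sampler has a temporal bug."""
    return {
        name: df[df["timestamp"] <= cutoff].copy()
        for name, df in tables.items()
    }
```

Because the graph is built only from the pruned tables, any remaining bug can only hide information from the model, which shows up as worse test performance rather than inflated scores.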
Hope this clarifies your doubts.
Thank you Rishabh!
Your answer does clarify all of my questions, and it is understandable that you want the public leaderboard to be protected from data snooping :)
Hello!
Thank you for building this cool tool :)
In tutorials/custom_dataset.ipynb it is mentioned that the entity graph is pruned up to dataset.test_timestamp. My understanding is that typically the seed_times for rows in the test_table will be greater than dataset.test_timestamp. (That belief comes from: 1) it seems restrictive to assume that all seed times in the test set are identical, and 2) it seems risky to have test examples occurring before the global test_timestamp. It follows that some seed times will exceed the test_timestamp.)
If this understanding is correct, masking the entire entity graph up to test_timestamp makes some test examples very difficult, or even impossible, to predict. For example, if the test_timestamp is 2010-01-01 and a test example occurs in 2020 with less than 10 years of historical data attached to it, pruning the graph at a single global time leaves this test case with an empty history.
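A tiny sketch of that failure mode, with made-up values (assuming a pandas DataFrame of per-entity history):

```python
import pandas as pd

cutoff = pd.Timestamp("2010-01-01")   # global test_timestamp

# Hypothetical history for one entity whose seed time is in 2020.
history = pd.DataFrame({
    "entity": ["a"] * 3,
    "timestamp": pd.to_datetime(["2014-05-01", "2017-08-01", "2019-11-01"]),
})

# Global pruning at the cutoff discards all of this entity's history.
pruned = history[history["timestamp"] <= cutoff]
print(len(pruned))  # 0 -> the 2020 test example has an empty history
```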
Moreover, I don't see why pruning the entity graph globally is necessary to prevent test leakage. Doesn't PyTorch Geometric's NeighborLoader prune away future data in the computation graph associated with each test example, thus preventing time leakage entirely? Its docs mention that, when time_attr is set, sampled neighbors are guaranteed to have a timestamp earlier than or equal to that of the seed node.
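For reference, a minimal sketch of that per-example temporal masking with NeighborLoader, using a homogeneous toy graph with a node-level time attribute (temporal sampling may additionally require the pyg-lib backend):

```python
import torch
from torch_geometric.data import Data
from torch_geometric.loader import NeighborLoader

# Toy graph: 6 nodes on a path, each with its own timestamp.
data = Data(
    x=torch.randn(6, 4),
    edge_index=torch.tensor([[0, 1, 2, 3, 4],
                             [1, 2, 3, 4, 5]]),
    time=torch.arange(6),  # per-node timestamps 0..5
)

loader = NeighborLoader(
    data,
    num_neighbors=[2, 2],
    input_nodes=torch.tensor([5]),
    time_attr="time",                # enables temporal sampling
    input_time=torch.tensor([5]),    # seed time for the input node
    batch_size=1,
)

# Each sampled subgraph only contains neighbors whose timestamp is
# earlier than or equal to the seed time, i.e. per-example masking.
batch = next(iter(loader))
```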
TL;DR: Can pruning the entire graph at test_timestamp hurt test performance, and is it really necessary to prevent leakage?
Best regards, Jacob