stanford-futuredata / InQuest

Accelerating Aggregation Queries on Unstructured Streams of Data

Replicating the proxy results for video datasets #5

Closed (elagaty closed this issue 2 weeks ago)

elagaty commented 2 months ago

Hi all,

Thank you for your code. I am interested in replicating the results as part of a lab at TU Darmstadt. I was able to run your TASTI code to replicate the proxy values for the video datasets, but I do not understand why you are normalising the y_pred values of the proxy in InQuest (as mentioned in the paper). What does the normalisation represent? Furthermore, I would like to know which hyperparameters you used in TASTI to get the values in the csv files used in InQuest, especially the number of buckets and the parameter k (how many clusters are considered for each frame).

elagaty commented 1 month ago

Update: I now understand the reason for normalising the proxy values, but I am still interested in the hyperparameters that were used for the proxy.

UelisonSantos commented 4 weeks ago

Hello Matthew @mdr223

Could you help us with the hyperparameters used for the proxy? We are running the code, but we are getting different results.

Thank you, Uélison

mdr223 commented 4 weeks ago

Hi there, sorry for the delay in responding. If I understand correctly, you are asking about the TASTI hyperparameters we used to generate the proxy csv files in the InQuest repository? Unfortunately, the TASTI code that I ran is not available on my current machine.

mdr223 commented 4 weeks ago

My second question would be why you need the proxy values to align perfectly? If you are comparing two systems (e.g. InQuest and System X) then I think it would be fine to compare them using the same set of proxies (whether it's the ones you generate or the ones in this repository). But if you're not interested in system comparison (and just trying to reproduce results for the purposes of a lab assignment) then I think as long as the proxies are reasonably good it should also be fine if they differ from the ones I generated.
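One way to check whether locally generated proxies are "reasonably good" relative to the ones in this repository is to compare the two score sequences directly. The following is only a minimal sketch, assuming both sets are available as equal-length lists of per-frame scores; the function names and the sample values are hypothetical and are not part of InQuest or TASTI.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length score sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant sequences")
    return cov / math.sqrt(var_x * var_y)

def mean_abs_diff(xs, ys):
    """Average absolute per-frame difference between two score sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty, equal-length sequences")
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical per-frame proxy scores (illustrative values, not real data).
repo_proxy = [0.10, 0.40, 0.35, 0.80, 0.05]
my_proxy = [0.12, 0.38, 0.30, 0.85, 0.07]

print("correlation:", pearson(repo_proxy, my_proxy))
print("mean abs diff:", mean_abs_diff(repo_proxy, my_proxy))
```

A high correlation with a small mean difference suggests the two proxies rank frames similarly, which is probably what matters most for a lab reproduction.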

elagaty commented 2 weeks ago

@mdr223 That makes sense. Our proxy values were close to the ones you got. How did you solve the issue of TASTI not being able to produce proxy values online/in real-time? If I understand correctly, TASTI starts forming the clusters (and thereby computing the proxy values) once it has seen all the data. In other words, how were you able to use TASTI in InQuest without prior learning?

mdr223 commented 2 weeks ago

Good question. I used the TASTI proxy values because, for the purpose of experimental evaluation, we just needed a set of reasonably good proxy values for the dataset(s) so that we could compare InQuest (and baselines) on a fixed set of values. Rather than training a new real-time object detection model for proxy value generation (and then spending time and money to run that model on our datasets), we simply elected to use TASTI's proxy values as a stand-in for what a real-time object detection model might produce.
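To illustrate the stand-in idea, precomputed proxy values can be replayed one frame at a time so that downstream code only ever sees the scores emitted so far, as it would with a real-time model. This is a rough sketch under stated assumptions, not the code used for the paper: the column name `proxy`, the function name, and the sample rows are all hypothetical.

```python
import csv
import io

def stream_proxy_scores(csv_file, score_column="proxy"):
    """Yield (frame_index, score) one row at a time from a precomputed CSV.

    Because this is a generator, a consumer cannot look ahead at future
    frames, which mimics a proxy model that scores frames as they arrive.
    """
    reader = csv.DictReader(csv_file)
    for frame_index, row in enumerate(reader):
        yield frame_index, float(row[score_column])

# Hypothetical CSV content; a real run would pass an opened file instead.
sample = io.StringIO("frame,proxy\n0,0.10\n1,0.40\n2,0.35\n")

for frame_index, score in stream_proxy_scores(sample):
    print(frame_index, score)
```

Any system under comparison can consume the same generator, which keeps the proxy values fixed across systems.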