Use the code you just wrote and the code above to generate (using tsmaker) a set of 1000 time series, each stored in its own file. You should have one script for this.
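A minimal sketch of such a generation script follows. The real `tsmaker` signature is not shown in this document, so `tsmaker_standin` is a hypothetical replacement (a Gaussian bump plus noise); the file naming scheme, the two-column text format, and the parameter ranges are likewise assumptions made for illustration.

```python
import math
import os
import random


def tsmaker_standin(m, s, j, n=100, rng=random):
    """Hypothetical stand-in for tsmaker: Gaussian bump (mean m, std s) plus noise of scale j."""
    times = [i / n for i in range(n)]
    values = [
        math.exp(-0.5 * ((t - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
        + j * rng.gauss(0.0, 1.0)
        for t in times
    ]
    return times, values


def generate(out_dir, count=1000, seed=0):
    """Write `count` time series to out_dir, one file each; return the file paths."""
    rng = random.Random(seed)
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i in range(count):
        # Parameter ranges are assumed, not taken from the assignment.
        s = rng.uniform(0.1, 0.5)
        j = rng.uniform(0.01, 0.1)
        times, values = tsmaker_standin(0.5, s, j, rng=rng)
        path = os.path.join(out_dir, f"ts_{i:04d}.txt")
        with open(path, "w") as fh:
            for t, v in zip(times, values):
                fh.write(f"{t} {v}\n")
        paths.append(path)
    return paths
```

Calling `generate("tsdata")` would produce `tsdata/ts_0000.txt` through `tsdata/ts_0999.txt`.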
We'll use the unbalanced binary search tree from lab10 (wrapped into a database as suggested in project 6), with one key-value tree index for each vantage point, to make really fast similarity searches. Randomly choose 20 vantage points, and create 20 database indexes. You should have another script for this.
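The indexing step might be sketched as below. This is only an illustration under stated assumptions: a sorted list of `(distance, id)` pairs stands in for each lab10 tree index, Euclidean distance stands in for the project's similarity measure, and `build_indexes` is a hypothetical name.

```python
import math
import random


def distance(a, b):
    """Euclidean distance; an assumed stand-in for the project's similarity measure."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_indexes(series, n_vantage=20, seed=0):
    """Pick n_vantage random vantage points and build one index per vantage point.

    series: dict mapping time series id -> list of values.
    Returns a dict mapping vantage id -> list of (distance to vantage point, id),
    sorted by distance. The sorted list plays the role of the tree index.
    """
    rng = random.Random(seed)
    vantage_ids = rng.sample(sorted(series), n_vantage)
    indexes = {}
    for vid in vantage_ids:
        entries = [(distance(series[vid], vals), sid) for sid, vals in series.items()]
        entries.sort()
        indexes[vid] = entries
    return indexes
```

In the actual project each sorted list would be replaced by a database file keyed on distance to the vantage point, with the time series id as the value.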
Write a command-line program which takes the name of a new data file as input, and returns the name of an existing data file whose time series is the most similar. Remember that the new time series' similarity against the vantage points needs to be calculated.
Thus the op you are supporting is: take an input light curve and compare it against your database. Find the top-n (say 10) similar light curves and return their ids.
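One way the search could work is sketched here, assuming the similarity measure is a true metric so the triangle inequality holds. If the closest vantage point is at distance d from the query, the nearest neighbor must lie within 2d of that vantage point, so only that slice of the index needs a full comparison. For n > 1 the 2d radius is a heuristic rather than a guarantee. The function name, the Euclidean distance, and the sorted-list index layout are assumptions.

```python
import bisect
import math


def distance(a, b):
    """Euclidean distance; an assumed stand-in for the project's similarity measure."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_n_similar(query, series, indexes, n=10):
    """Return up to n ids of stored series most similar to `query`.

    series:  dict id -> list of values.
    indexes: dict vantage id -> sorted list of (distance to vantage point, id).
    """
    # Distance from the new series to every vantage point.
    d_to_vp = {vid: distance(query, series[vid]) for vid in indexes}
    closest = min(d_to_vp, key=d_to_vp.get)
    radius = 2 * d_to_vp[closest]

    # Range query on the closest vantage point's index: keys <= radius.
    entries = indexes[closest]
    keys = [d for d, _ in entries]
    cut = bisect.bisect_right(keys, radius)
    candidates = [sid for _, sid in entries[:cut]]

    # Full comparison only against the candidates.
    ranked = sorted(candidates, key=lambda sid: distance(query, series[sid]))
    return ranked[:n]
```

A command-line wrapper would load the input file, call a function like this, and print the returned ids or file names.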
Remember, for milestone 2, this database needs only to work as a library, which carries out its work in a manner similar to sqlite: multiple processes may access the database at the same time. Remember that you are basing this off lab10, where simultaneous reads are allowed, but simultaneous writes are not. We won't worry about atomicity or isolation here, and there is no transaction manager; but multiple instances of the command-line program may be accessing this database at the same time.
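The "many readers, one writer" rule could be enforced with advisory file locks, sketched below. This is not necessarily how lab10 does it: the `locked` helper is hypothetical, and the sketch assumes a POSIX system where `fcntl.flock` is available.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def locked(path, write=False):
    """Open `path` under an advisory lock: shared for reads, exclusive for writes.

    Several processes can hold the shared lock at once; the exclusive lock
    blocks until no other process holds any lock on the file.
    """
    mode = "a+" if write else "r"  # "a+" creates the file if it is missing
    with open(path, mode) as fh:
        fcntl.flock(fh, fcntl.LOCK_EX if write else fcntl.LOCK_SH)
        try:
            yield fh
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
```

The index-building script would wrap its writes in `locked(path, write=True)`, while each instance of the search program would read under `locked(path)`.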
Finished, although currently in conversation with our partner team about little tweaks they want me to make (nothing they found was incorrect, they just want files in certain places and are picky!)