slac207 / cs207project

MIT License

P7- Write code to find most similar light curve #58

Closed cocochrane closed 7 years ago

cocochrane commented 7 years ago
  1. Use the code you just wrote and the code above to generate (using tsmaker) a set of 1000 time series, each stored in a file. You should have one script for these.

  2. We'll use the unbalanced binary search tree from lab10 (wrapped into a database as suggested in project 6), with one key-value tree index for each vantage point, to make really fast similarity searches. Randomly choose 20 vantage points, and create 20 database indexes. You should have another script for this.

  3. Write a command-line program which takes the name of a new data file as input, and returns the name of an existing data file whose time series is the most similar. Remember that the new time series's similarity against the vantage points needs to be calculated.

Thus the op you are supporting is: take an input light curve and compare it against your database. Find the top-n (say 10) similar light curves and return their ids.
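
For anyone picking this up, here is a rough sketch of the vantage-point search described in the steps above. It is not the project's code. `make_series` is a hypothetical stand-in for `tsmaker`, plain Euclidean distance is swapped in for whatever similarity measure the project uses, and a sorted list per vantage point stands in for the lab10 tree index. All names are made up for illustration, and it uses 200 in-memory series rather than 1000 files to keep it short.

```python
import bisect
import math
import random


def make_series(rng, length=50):
    """Hypothetical stand-in for tsmaker: a noisy Gaussian bump."""
    mean, width = rng.uniform(0.2, 0.8), rng.uniform(0.05, 0.3)
    return [
        math.exp(-((i / length - mean) ** 2) / (2 * width ** 2))
        + 0.05 * rng.gauss(0, 1)
        for i in range(length)
    ]


def dist(a, b):
    """Euclidean distance (an assumption): a true metric, so the triangle inequality holds."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_indexes(series, vantage_ids):
    """One index per vantage point: (distance to vantage point, id), sorted by distance."""
    return {
        v: sorted((dist(series[v], ts), sid) for sid, ts in series.items())
        for v in vantage_ids
    }


def most_similar(query, series, indexes, n=1):
    """Return up to n ids, closest first, looking only near the closest vantage point."""
    # The new series is compared against every vantage point first.
    d, v = min((dist(query, series[v]), v) for v in indexes)
    # Anything within d of the query lies within 2*d of vantage point v,
    # so the single nearest series is always among these candidates.
    index = indexes[v]
    cutoff = bisect.bisect_right([dv for dv, _ in index], 2 * d)
    candidates = [sid for _, sid in index[:cutoff]]
    candidates.sort(key=lambda sid: dist(query, series[sid]))
    return candidates[:n]


rng = random.Random(0)
series = {f"ts_{i}": make_series(rng) for i in range(200)}
vantage_ids = rng.sample(sorted(series), 20)
indexes = build_indexes(series, vantage_ids)

query = make_series(rng)
top = most_similar(query, series, indexes, n=10)
```

Under these assumptions the first id returned is the true nearest neighbour. The rest of the top-n comes only from the 2*d candidate region, so the list can be shorter than n or miss a series just outside that radius.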

Remember, for milestone 2, this database needs only to work as a library, which carries out its work in a manner similar to sqlite: multiple processes may access the database at the same time. Remember that you are basing this off lab10, where simultaneous reads are allowed, but simultaneous writes are not. We won't worry about atomicity or isolation here, and there is no transaction manager; but multiple instances of the command line program may be accessing this database at the same time.
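
One way to get the "many readers, one writer" behaviour is an advisory file lock. The sketch below is only an illustration and makes no claim about how lab10 does its locking. It assumes a POSIX system (`fcntl` is not available on Windows), and `LockedFile` and the `index.db` file name are hypothetical.

```python
import fcntl
import os
import tempfile


class LockedFile:
    """Hold a shared (read) or exclusive (write) advisory lock while the file is open."""

    def __init__(self, path, write=False):
        self.path = path
        self.write = write

    def __enter__(self):
        # "a+" creates the file if needed and appends without truncating it.
        self.f = open(self.path, "a+" if self.write else "r")
        # LOCK_SH can be held by many readers at once; LOCK_EX blocks until
        # no other process holds any lock on the file.
        fcntl.flock(self.f, fcntl.LOCK_EX if self.write else fcntl.LOCK_SH)
        return self.f

    def __exit__(self, *exc):
        self.f.flush()
        fcntl.flock(self.f, fcntl.LOCK_UN)
        self.f.close()
        return False


path = os.path.join(tempfile.mkdtemp(), "index.db")

# A single writer takes the exclusive lock.
with LockedFile(path, write=True) as f:
    f.write("ts_1\n")

# Two readers can hold the shared lock at the same time.
with LockedFile(path) as r1, LockedFile(path) as r2:
    lines = r1.read().splitlines() + r2.read().splitlines()
```

These locks are advisory, so they only help if every process that touches the database goes through the same wrapper.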

cocochrane commented 7 years ago

Finished, although currently in conversation with our partner team about little tweaks they want me to make (nothing they found was incorrect, they just want files in certain places and are picky!)