parejkoj / LSST-Zoo


Write-up of current progress. #6

Closed dougbrn closed 7 years ago

dougbrn commented 7 years ago

Write up the important steps so far as a project report.

dougbrn commented 7 years ago

Rough outline in chronological order:

  1. Built Basic Zooniverse Project at: https://www.zooniverse.org/projects/dougbrn/lsst-zoo-test
  2. Repurposed the Asteroid Tracklet code to generate template-science-difference images
    • Used "Laurie Allen Data", found on ObsDecam in the LSST supercomputing cluster
    • Took apart co-add generation and added in template-science-difference structure
    • Spent considerable effort choosing a scaling method; decided to include both an arcsinh and a z-scale representation of each cutout.
    • Cut data on existing flags to remove obvious bogus detections
    • Made "Mapper.csv" which maps cutout metadata to physical objects in the raw data.
  3. First trial run was a difference-imaging classifier on 1001 cutouts; ~5 people from the LSST DM group contributed classifications.
    • Retrieved ~1600 classifications on 1001 objects
    • Ran some basic statistics, which revealed that our categories are not independent of one another.
  4. Began Machine Learning application to trial run.
    • Built a functional Random Forest Classifier with imputer support.
    • An 80:20 training/testing split on the trial-run data returned a 75%-80% agreement rate with the human classifications for a first pass of ML. This is encouraging given the issue with contentious categories at the Zooniverse level.
    • Retrieved feature importances for the ~90 continuous features measured by the LSST stack.
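
For reference, step 4 can be sketched roughly as below. This is a minimal illustration assuming scikit-learn, not the project's actual code: the feature table and labels are synthetic stand-ins, and the choice of a median imputer, the tree count, and all variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Synthetic stand-in for the per-cutout feature table (the real one has ~90 features).
n_objects, n_features = 1000, 10
X = rng.normal(size=(n_objects, n_features))
# Hypothetical binary labels standing in for the consensus human classification.
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
# Blank out ~5% of the measurements to mimic features that failed to be measured.
X[rng.random(X.shape) < 0.05] = np.nan

# 80:20 training/testing split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# The imputer fills missing values before they reach the forest (median is an assumed choice).
model = make_pipeline(
    SimpleImputer(strategy="median"),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(X_train, y_train)

# Fraction of held-out objects where the model agrees with the labels.
agreement = model.score(X_test, y_test)

# Feature importances, ranked from most to least important.
importances = model.named_steps["randomforestclassifier"].feature_importances_
ranking = np.argsort(importances)[::-1]
```

Putting the imputer inside the pipeline means the fill values are learned from the training set only, so the held-out 20% does not leak into the fit.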

Future Work:

  1. Narrow down the feature set to a representative sample with little to no collinearity
  2. Work with Zooniverse to develop an API that allows the project to be developed flexibly through the API alone.
  3. Test other ML algorithms
  4. Develop autonomous ML loop, capable of running directly off Zooniverse classifications.
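
One possible approach to item 1, sketched under stated assumptions: walk the features in order of importance and keep a feature only if its absolute Pearson correlation with every already-kept feature stays below a threshold. The 0.9 threshold, the helper names `pearson` and `prune_collinear`, and the toy columns are all hypothetical and not taken from the project.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear relationship
    return cov / math.sqrt(var_x * var_y)


def prune_collinear(columns, ranked_names, threshold=0.9):
    """Greedily keep features, most important first, dropping near-duplicates."""
    kept = []
    for name in ranked_names:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy columns: "flux_scaled" is nearly a multiple of "flux", so it should be dropped.
columns = {
    "flux": [1.0, 2.0, 3.0, 4.0, 5.0],
    "flux_scaled": [2.1, 4.0, 6.2, 8.0, 9.9],
    "ellipticity": [0.3, 0.1, 0.4, 0.1, 0.5],
}
kept = prune_collinear(columns, ["flux", "flux_scaled", "ellipticity"])
```

Ordering the candidates by importance means that, within each correlated group, the feature the classifier relied on most is the one that survives.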

dougbrn commented 7 years ago

ProgressUpdate.pdf now available in repository.