trinker / textreduce


dimensionality reduction #1

Open trinker opened 7 years ago

trinker commented 7 years ago

https://www.usna.edu/Users/cs/lmcdowel/pubs/UnderhillIRI2007Final.pdf

trinker commented 7 years ago

DTM --> Reduction (5 types for now)

A reduce_dim function with a type argument corresponding to:

  1. Principal Components Analysis (Linear) - stats::prcomp as 'pca'
  2. Metric Multidimensional Scaling (Linear) - stats::cmdscale as 'mds'
  3. Isomap* (Non-Linear) - RDRToolbox::Isomap as 'iso'
  4. Lafon’s Diffusion Maps** (Non-Linear) - ? as 'ldm'
  5. Locally Linear Embedding* (Non-Linear) - RDRToolbox::LLE as 'lle'

Maybe also provide type-specific functions: reduce_dim_pca, reduce_dim_mds, reduce_dim_iso, reduce_dim_ldm, reduce_dim_lle

*Note: RDRToolbox is a Bioconductor package, not CRAN.

**Note: dimRed and diffusionMap cause a segfault, while the destiny package fails to install.
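
A minimal sketch of how the type dispatch could look, covering only the two linear methods because they ship with base R. The function body, the k argument, and the toy matrix are assumptions made for illustration; the non-linear types are left out since their backends are still unsettled above.

```r
# Hypothetical sketch: `reduce_dim`, `k`, and `type` follow the proposal above
# but are assumptions, not an existing API.
reduce_dim <- function(x, k = 2, type = c("pca", "mds")) {
    type <- match.arg(type)
    x <- as.matrix(x)
    switch(type,
        # PCA: keep the first k principal component scores
        pca = stats::prcomp(x, center = TRUE)$x[, seq_len(k), drop = FALSE],
        # Classical (metric) MDS on Euclidean distances between documents
        mds = stats::cmdscale(stats::dist(x), k = k)
    )
}

# Toy document-term matrix: 4 documents by 3 terms (made-up counts)
dtm <- matrix(
    c(2, 0, 1,
      0, 3, 1,
      1, 1, 0,
      0, 2, 2),
    nrow = 4, byrow = TRUE,
    dimnames = list(paste0("doc", 1:4), c("apple", "banana", "cherry"))
)

pca_out <- reduce_dim(dtm, k = 2, type = "pca")
mds_out <- reduce_dim(dtm, k = 2, type = "mds")
```

Both calls return a documents-by-k matrix. With Euclidean distances, classical MDS recovers the PCA scores up to the sign of each column, so the two results mainly differ once another distance measure is used.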

trinker commented 7 years ago

Make it operate on strings, a dtm, or a tdm via methods.

The input should be scaled (likely tf-idf weighted). If the input is raw text or a dtm/tdm that is not tf-idf weighted, the weighting will be applied first.
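
One way the method dispatch and weighting step might be sketched with S3 in base R. Every name here (tfidf, as_weighted_dtm, the weighted flag) is hypothetical, the tokenizer is deliberately naive, and the tf-idf variant (row-normalized term frequency times log inverse document frequency) is just one common choice. A tdm could be transposed with t() before being passed in.

```r
# Hypothetical helpers; names and the tf-idf variant are assumptions.
tfidf <- function(dtm) {
    dtm <- as.matrix(dtm)
    tf <- dtm / pmax(rowSums(dtm), 1)                   # term frequency per document
    idf <- log(nrow(dtm) / pmax(colSums(dtm > 0), 1))   # inverse document frequency
    sweep(tf, 2, idf, `*`)
}

as_weighted_dtm <- function(x, ...) UseMethod("as_weighted_dtm")

# Raw strings: tokenize, count terms, then weight
as_weighted_dtm.character <- function(x, ...) {
    tokens <- strsplit(tolower(x), "[^a-z]+")
    tokens <- lapply(tokens, function(tk) tk[nzchar(tk)])
    vocab <- sort(unique(unlist(tokens)))
    counts <- do.call(rbind, lapply(tokens, function(tk) {
        tabulate(match(tk, vocab), nbins = length(vocab))
    }))
    colnames(counts) <- vocab
    tfidf(counts)
}

# Count matrix (documents in rows): weight unless the caller says it is done
as_weighted_dtm.matrix <- function(x, weighted = FALSE, ...) {
    if (weighted) x else tfidf(x)
}

docs <- c("the cat sat", "the dog sat", "the cat ran fast")
w <- as_weighted_dtm(docs)
```

A term that occurs in every document (here "the") gets an idf of zero, so its column contributes nothing to the later reduction.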

trinker commented 7 years ago

See also:

https://cran.r-project.org/web/packages/dimRed/dimRed.pdf

http://www.cs.cmu.edu/~efros/courses/AP06/presentations/melchior_isomap_demo.pdf

https://www.slideshare.net/RowanPritchett/diffusionmapsreport

trinker commented 7 years ago

This could then feed into linear and knn models for categorization as seen in https://www.usna.edu/Users/cs/lmcdowel/pubs/UnderhillIRI2007Final.pdf
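
A rough sketch of that last step, assuming PCA as the reduction and class::knn as the classifier (class is a recommended package bundled with most R installations). The data are simulated and the train/test split is arbitrary; this is an illustration, not the paper's setup.

```r
set.seed(42)

# Simulated data: two groups of 10 "documents" in a 5-term space
x <- rbind(
    matrix(rnorm(50, mean = 0), ncol = 5),
    matrix(rnorm(50, mean = 4), ncol = 5)
)
labels <- factor(rep(c("a", "b"), each = 10))

train <- c(1:8, 11:18)
test <- c(9:10, 19:20)

# Fit the reduction on the training documents only, then project the rest
pca <- stats::prcomp(x[train, ], center = TRUE)
train_scores <- pca$x[, 1:2]
test_scores <- predict(pca, newdata = x[test, ])[, 1:2]

# k-nearest-neighbour classification in the reduced space
pred <- class::knn(train_scores, test_scores, cl = labels[train], k = 3)
```

Fitting the reduction on the training rows and projecting held-out rows with predict() keeps test documents from influencing the components.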