tyler-tomita / RandomerForest

Discriminant Projection Forest results, datasets, etc.

v2 of Fig 1 #21

Closed jovo closed 9 years ago

jovo commented 9 years ago

Fig 1:
- add Bayes error for all settings
- only plot Lhat vs. D along a single row
- add scatter plots

look at https://github.com/jovo/LOL/blob/master/FigScripts/Fig2_classification_accuracy.m

if you have any questions about my code, ask on github issues and i'll comment and fix

tyler-tomita commented 9 years ago

I ran into a problem when estimating Bayes error for parity and multimodal. For large d (> ~50), all of the likelihood values for a multivariate normal fall below 2.2251e-308, the smallest positive normalized double-precision value in MATLAB (`realmin`), so they underflow and it becomes impossible to tell which class has the largest likelihood. For parity, the covariance matrices are identical, which allows an easy workaround: find the Gaussian centered closest to a data point, then assign the point to whichever class that Gaussian belongs to. The estimated Bayes error using this method drops with increasing d. Unfortunately, this method doesn't work for multimodal, since there is a distribution on the covariance matrices.
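
The shared-covariance workaround described above can be sketched as follows. This is a minimal illustration, not the code used for the figure: the names `mus`, `labels`, and `X` are hypothetical, the toy centres are made up, and it assumes equal mixing weights and a shared isotropic covariance, so that Euclidean distance ranks components the same way their likelihoods would.

```matlab
% Nearest-centre classification sketch (hypothetical names and toy data).
% Assumes every Gaussian component has the same isotropic covariance and
% equal weight, so the closest centre is also the most likely component.
d = 100;                                      % dimension of the toy problem
mus = [ ones(1, d); -ones(1, d) ];            % one row per component centre
labels = [1; 2];                              % class of each component
X = [ 0.9 * ones(1, d); -1.1 * ones(1, d) ];  % two points to classify

n = size(X, 1);
yhat = zeros(n, 1);
for i = 1:n
    diffs = bsxfun(@minus, mus, X(i, :));     % centre minus point, row-wise
    dist2 = sum(diffs .^ 2, 2);               % squared Euclidean distances
    [~, k] = min(dist2);                      % index of the closest centre
    yhat(i) = labels(k);                      % inherit that component's class
end
disp(yhat')
```

Note that this picks the single most likely component rather than comparing the summed class likelihoods, so it matches the Bayes rule only approximately, most closely when components are well separated. With a shared but non-isotropic covariance, Mahalanobis distance would take the place of Euclidean distance.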

jovo commented 9 years ago

are you in log space? we always do stuff like this in log space to avoid numerical errors. ie, compute log likelihoods.
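
One way to do this, sketched under assumptions: evaluate each Gaussian's log-density from a Cholesky factor, compare the log-likelihoods directly, and, where a class is a mixture of components, combine them with the log-sum-exp trick. The helper names (`logmvn`, `logsumexp`) and the toy parameters are illustrative and not taken from the repository.

```matlab
% Log-space Gaussian likelihoods (illustrative names; not the repository's code).
d = 1000;
x = zeros(1, d);                              % point to classify
mu = {0.1 * ones(1, d), 0.5 * ones(1, d)};    % one mean per class
Sigma = {eye(d), 2 * eye(d)};                 % class covariances may differ

% log N(x | m, S) with S = R'*R, so log det(S) = 2*sum(log(diag(R)))
logmvn = @(x, m, R) -0.5 * sum(((x - m) / R) .^ 2) ...
                    - sum(log(diag(R))) - 0.5 * numel(x) * log(2 * pi);

loglik = zeros(1, 2);
for c = 1:2
    R = chol(Sigma{c});                       % upper-triangular factor
    loglik(c) = logmvn(x, mu{c}, R);
end
underflowed = all(exp(loglik) == 0);          % raw densities are not representable
[~, yhat] = max(loglik);                      % the log values still compare fine

% For a class that is a mixture, sum component densities stably in log space.
logsumexp = @(v) max(v) + log(sum(exp(v - max(v))));
logmix = logsumexp(loglik + log([0.5 0.5]));  % toy equal-weight mixture
```

Because the comparison never leaves log space, differing covariance matrices are handled through the `log(diag(R))` term, which is the case the nearest-centre shortcut cannot cover.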

the glass is all full: half water, half air. openconnecto.me, we're hiring! https://docs.google.com/document/d/14SApYAzxF0Ddqg2ZCEwjmz3ht2TDhDmxyZI2ZP82_0U/edit?usp=sharing , jovo.me, my calendar https://www.google.com/calendar/embed?src=joshuav%40gmail.com&ctz=America/New_York

tyler-tomita commented 9 years ago

Now it works :)

jovo commented 9 years ago

ok

tyler-tomita commented 9 years ago

I tried running Fig1_cigars.m. After running for over 15 min, I got an error message saying "Undefined function 'LDA_train_and_predict' for input arguments of type double." Then Matlab became unresponsive and the error sound kept repeating over and over until I force quit.

A similar thing happened when I tried to run Fig2_classification_accuracy, except the error message said "Unable to read file '/Users/Tyler/Data/Results/example_sims'. No such file or directory."

jovo commented 9 years ago

can you post 2 issues in the issues tab of the LOL repo for this? Fig1 should take about 5 seconds.

jovo commented 9 years ago

i think i fixed the issue for Fig1. i will work on Fig2 today. note that Fig0 requires some data that i haven't dealt with yet, so if you try running all the figs, you will probably hit some problems. each fig has the option of either running the experiment or loading saved results, so you might want to try both. obviously, i can do this too. but really, the only thing you need is LOL_loocv, which should just work?
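
The run-or-load option mentioned here might look roughly like the sketch below. The flag `runExperiment`, the file name, and the placeholder `results` struct are all hypothetical; the actual figure scripts may be organized differently.

```matlab
% Run-or-load sketch (hypothetical flag, file name, and placeholder experiment).
runExperiment = false;                        % set true to force a rerun
resultsFile = fullfile(tempdir, 'example_results_sketch.mat');

if runExperiment || ~exist(resultsFile, 'file')
    results = struct('Lhat', zeros(1, 5));    % placeholder for the real simulation
    save(resultsFile, 'results');             % cache the output for later runs
else
    S = load(resultsFile);                    % reuse the cached results
    results = S.results;
end
```

Checking for the file before loading avoids the "Unable to read file" failure when the saved results are not present on a given machine.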

tyler-tomita commented 9 years ago

It should just work, but there is one minor bug. The third argument parms is causing problems. Judging by the "if nargs ==2" statement, it looks like it's supposed to be optional? If so, default values for parms.types and parms.ks aren't specified. I saw an explanation of them in LOL.m, but still a little unclear. What would be good choices for Qing's data?

jovo commented 9 years ago

parms.types={'DENL';'NENL'}; parms.ks=1:10;

but please post these questions in the issues tab of the LOL repo...
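
If the third argument is meant to be optional, defaults could be filled in along these lines. This is a sketch of the general pattern, not LOL's actual code; the default values are simply the ones suggested in this thread.

```matlab
% Default-filling sketch for an optional options struct (not LOL's actual code).
parms = struct();                 % stands in for a missing third argument
if ~isfield(parms, 'types')
    parms.types = {'DENL'; 'NENL'};   % default suggested in this thread
end
if ~isfield(parms, 'ks')
    parms.ks = 1:10;                  % default suggested in this thread
end
```

Inside a function, the first line would typically sit behind a guard such as `if nargin < 3, parms = struct(); end`, so that callers can pass a partial struct and still get the remaining defaults.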
