linear_regression_data_generator.m

indra-ipd commented 5 years ago

Hello,

@hiroyuki-kasai Thank you for creating this project with a wide variety of algorithms.

I went through the code in linear_regression_data_generator.m, and it was not quite clear to me how the data is generated. Also, on running the code I find that all my rows have the same number. Can you explain how the dataset is generated for linear regression?

% set number of dimensions
d = 50;
% set number of samples
n = 7000;
% generate data
std = 0.25;
data = linear_regression_data_generator(n, d, std);

Attached below are the generated data (x_train) and labels (y_train).

data.xlsx label.xlsx

Thank you!

hiroyuki-kasai commented 5 years ago

Hi,

Thank you for your interest in my code.

As you can see in the code of linear_regression_data_generator.m, the weight vector w (= w_opt) to be solved for in the regression problem is set as

w_opt = 0.5 * ones(d+1, 1);

If d = 1, which corresponds to the 2-dimensional case, y = [w_1, w_2] * [x_1, x_2(=1)]', where w_1 is the slope of the line and w_2 is its intercept on the y-axis. With w_1 = w_2 = 1/2, this case is exactly

y = 1/2*x + 1/2.
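To make the d = 1 case concrete, here is a minimal standalone sketch that does not call the library. It assumes (following the formula above, not as a statement of how linear_regression_data_generator.m is actually implemented) that each sample is a random input x_1 with a constant 1 appended, and that Gaussian noise with standard deviation noise_std is added. All variable names are illustrative.

```matlab
% Standalone sketch of the d = 1 case (hypothetical names, not library code).
rng(0);                          % fix the seed so the run is reproducible
n = 1000;                        % number of samples
d = 1;                           % number of input dimensions
noise_std = 0.1;                 % assumed noise level

w_opt = 0.5 * ones(d+1, 1);      % [slope; intercept] = [0.5; 0.5]

x1 = randn(d, n);                % assumed: inputs drawn from a standard normal
X  = [x1; ones(1, n)];           % append the constant 1 that multiplies the intercept
y_clean = w_opt' * X;            % noiseless labels: y = 1/2*x + 1/2
y  = y_clean + noise_std * randn(1, n);   % noisy labels

% The noiseless labels lie on the line y = 1/2*x + 1/2.
max_dev = max(abs(y_clean - (0.5 * x1 + 0.5)));
fprintf('largest deviation from the line: %g\n', max_dev);
```

In this sketch the last row of X is all ones, which is the part of each sample that pairs with the intercept w_2.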

When d = 2, we get

z = 1/2*x + 1/2*y + 1/2.

You can check this case with the following script:

close all
clear
clc

n = 1000;
d = 2;
std = 0.1;
data = linear_regression_data_generator(n, d, std);

x = data.x_train(1,:);
y = data.x_train(2,:);
z = data.y_train;

figure
% plot z = 1/2x + 1/2y + 0.5
scatter3(x, y, z);
hold on

% plot the intersection point (0, 0, 0.5)
plot3(0, 0, 0.5, 'ro', 'MarkerSize', 20, 'MarkerFaceColor', 'red');
hold off

xlabel('x')
ylabel('y')
zlabel('z')

That is why all the rows are the same except the last one, which corresponds to the intercept.

This behavior comes from how w_opt is set. You can change the ideal value of w_opt as you like, and you will then get different datasets.
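As an illustration of that last point, the sketch below builds a dataset from a different, arbitrarily chosen w_opt and then recovers it by ordinary least squares. It is standalone and neither calls nor modifies linear_regression_data_generator.m. The data model (standard normal inputs with a constant 1 row appended, plus Gaussian noise) and all variable names are assumptions made for illustration.

```matlab
% Standalone sketch: a different w_opt gives a different dataset (not library code).
rng(0);                          % fix the seed so the run is reproducible
n = 1000;                        % number of samples
d = 2;                           % number of input dimensions
noise_std = 0.1;                 % assumed noise level

w_opt = [2; -1; 0.5];            % arbitrary example: z = 2*x - y + 0.5

X = [randn(d, n); ones(1, n)];   % assumed: random inputs plus a constant 1 row
z = w_opt' * X + noise_std * randn(1, n);

% Ordinary least squares: solve X' * w = z' in the least-squares sense.
w_hat = X' \ z';

fprintf('w_opt = [%g %g %g]\n', w_opt);
fprintf('w_hat = [%.3f %.3f %.3f]\n', w_hat);
```

With small noise, w_hat should land close to w_opt, and its last entry is the estimated intercept.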

I hope this helps.

Best regards,

Hiro

indra-ipd commented 5 years ago

Thank you very much for the explanation.

Regards, Indra

hiroyuki-kasai / SGDLibrary

linear_regression_data_generator.m #6
