indra-ipd opened this issue 5 years ago
Hi,
Thank you for your interest in my code.
As you can see in linear_regression_data_generator.m, the w (= w_opt) to be solved for in the regression problem is set as
w_opt = 0.5 * ones(d+1, 1);
If d = 1, which corresponds to the two-dimensional case, y = [w_1; w_2]' * [x_1; x_2] with x_2 = 1, i.e. y = w_1*x_1 + w_2, where w_1 is the slope of the line and w_2 is its intercept on the y-axis. Therefore, this case is exactly
y = 1/2*x + 1/2.
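To make the d = 1 case concrete, here is a minimal sketch that generates samples of y = 1/2*x + 1/2 and recovers w_opt by least squares. It is a hypothetical stand-in, not the body of linear_regression_data_generator.m: the standard-normal inputs, the additive Gaussian noise, and the names X and noise_std are assumptions made for illustration.

```matlab
% Hypothetical sketch (not the library's code): d = 1 data from y = 0.5*x + 0.5 + noise
rng(0);                          % fix the seed for reproducibility
n = 1000; d = 1; noise_std = 0.1;
w_opt = 0.5 * ones(d+1, 1);      % [slope; intercept], set the same way as in the generator
x = randn(d, n);                 % assumed: standard-normal inputs, one sample per column
X = [x; ones(1, n)];             % append the constant feature x_2 = 1
y = w_opt' * X + noise_std * randn(1, n);   % 1-by-n noisy targets

% Least-squares fit: X' is n-by-(d+1), y' is n-by-1
w_hat = X' \ y';
disp(w_hat')                     % both entries approximately 0.5
```

Since both entries of w_opt are 0.5, the fitted slope and intercept come out nearly equal; only the noise separates them.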
When d = 2, we get
z = 1/2*x + 1/2*y + 1/2.
You can check this case as follows:
```matlab
close all; clear; clc;

n = 1000; d = 2; std = 0.1;
data = linear_regression_data_generator(n, d, std);

x = data.x_train(1,:); y = data.x_train(2,:); z = data.y_train;

figure;
% plot the samples of z = 1/2*x + 1/2*y + 1/2
scatter3(x, y, z); hold on;
% plot the intercept point (0, 0, 0.5)
plot3(0, 0, 0.5, 'ro', 'MarkerSize', 20, 'MarkerFaceColor', 'red'); hold off;
```
That is why all the rows are the same, except the last one, which corresponds to the intercept.
This behavior comes from how w_opt is set. You can change how the ideal value of w_opt is set as you like, and then you will get different datasets.
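As a sketch of changing w_opt, the snippet below draws a random ground truth instead of 0.5 * ones(d+1, 1) and checks that least squares recovers it. The input and noise distributions and all variable names here are assumptions, not taken from linear_regression_data_generator.m.

```matlab
% Hypothetical sketch: use a non-constant ground truth instead of 0.5*ones(d+1,1)
rng(2);                                 % fix the seed for reproducibility
n = 1000; d = 3; noise_std = 0.1;
w_opt = randn(d+1, 1);                  % any ground-truth vector you like; last entry = intercept
X = [randn(d, n); ones(1, n)];          % assumed: standard-normal inputs plus a constant bias row
y = w_opt' * X + noise_std * randn(1, n);

% Least-squares estimate of the weights
w_hat = X' \ y';
disp([w_opt w_hat])                     % columns: true weights vs. fitted weights
```

With distinct entries in w_opt, each feature now contributes differently to y. The noise level is named noise_std here so that it does not shadow MATLAB's built-in std function.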
I hope this helps.
Best regards,
Hiro
Thank you very much for the explanation.
Regards, Indra
Hello,
@hiroyuki-kasai Thank you for creating this project with a wide variety of algorithms.
I went through the code in linear_regression_data_generator.m, and it was not quite clear to me how the data is generated. Also, when I run the code, I find that all my rows contain the same number. Can you explain how the dataset for linear regression is generated?
```matlab
% set number of dimensions
d = 50;
% set number of samples
n = 7000;
% generate data
std = 0.25;
data = linear_regression_data_generator(n, d, std);
```
Attached below are the generated data (x_train) and labels (y_train):
data.xlsx label.xlsx
Thank you!