hiroyuki-kasai / SGDLibrary

MATLAB/Octave library for stochastic optimization algorithms: Version 1.0.20
MIT License

Binary 1s and 0s #3

Open mwendamuriira opened 6 years ago

mwendamuriira commented 6 years ago

Greetings Hiroyuki,

I'm so excited to have come across such a rich library for SGD; thank you for this great project.

I'm currently learning Python, Matlab and Machine Learning for data training and prediction.

I have collected data as bitmap files (400 files each containing 1000 ~ 1s and 0s). I want to train and predict on a single file first, and later on several files (5 files, 10 files, 20 files, and so on). These bitmap files are in .txt (text file) format.

Kindly advise: how can I use these bitmap files as input?

Lawrence.
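
Editor's sketch (not part of the original thread): one way such a text file could be read into a list of 0/1 integers. The file layout is an assumption here, since the thread never shows it: the sketch keeps every `0` and `1` character and skips anything else (spaces, newlines, separators). The names `parse_bits` and `load_bitmap_file` are hypothetical.

```python
from pathlib import Path


def parse_bits(text):
    """Return every '0'/'1' character in `text` as an int, skipping all other characters."""
    return [int(ch) for ch in text if ch in "01"]


def load_bitmap_file(path):
    """Read one trace file and return its packet outcomes (1 = sent, 0 = lost)."""
    return parse_bits(Path(path).read_text())


# Example on an in-memory string instead of a real file.
bits = parse_bits("1101 0111\n01")
```

If the real files contain other digits (for example timestamps), this character filter would misread them, and a stricter parser would be needed.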

hiroyuki-kasai commented 6 years ago

Thank you for your interest in SGDLibrary.

By the way, I am afraid that I cannot understand what "each containing 1000 ~ 1s and 0s" means.

Regards,

Hiro

mwendamuriira commented 6 years ago

Thank you for your quick response.

They are data collected from link traces between wireless sensor nodes: 1 means a packet was sent successfully and 0 means a packet was lost. Thus, each file represents 1000 packets sent between sender nodes and receiver nodes, covering both received and lost packets.

I want to use the collected data as historical data (past network behavior) to predict future network behavior.

hiroyuki-kasai commented 6 years ago

Is that a supervised learning problem? Which problem (model) do you want to use to predict the future behavior? Linear regression, logistic regression, or something else?

mwendamuriira commented 6 years ago

I'm not sure about the problem (model), but I guess that my expected predictions are 1s and 0s. Thus it is a classification problem under supervised learning, and I believe logistic regression may be the solution.

My challenge is that there are no other input features, just the bitmap files.

hiroyuki-kasai commented 6 years ago

Thanks.

Then, in that case, we need label data. More concretely, we need X and y, where X is the feature data used to predict y, and y is the label data (e.g., 0/1, -1/+1, or others).

As for logistic regression, see:

https://en.wikipedia.org/wiki/Logistic_regression

Does the bitmap files include them to learn a model ??
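
Editor's sketch (not part of the original thread): a raw 0/1 trace carries no separate labels, but one common way to obtain X and y from a single sequence is a sliding window, where the previous `k` packet outcomes are the features and the next outcome is the label. This framing is an assumption, not something the thread settles on; `make_windows` and `k` are hypothetical names.

```python
def make_windows(bits, k):
    """Build (X, y): each row of X holds k consecutive outcomes, y holds the outcome that follows."""
    n = len(bits) - k  # number of complete (window, next outcome) pairs
    X = [bits[i:i + k] for i in range(n)]
    y = [bits[i + k] for i in range(n)]
    return X, y


bits = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
X, y = make_windows(bits, k=3)
```

With a 10-packet trace and `k=3`, this yields 7 training pairs; a 1000-packet file would yield 997.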

mwendamuriira commented 6 years ago

Okay, thank you.

I did understand the statement "Does the bitmap files include them to learn a model??"

I want to treat my bitmap files just as raw data for training.

hiroyuki-kasai commented 6 years ago

If you want to "train" something, you have to decide/select a (training) model, and prepare appropriate data. Then, you can convert your own datasets into the predefined data format for the model to be learned.

By the way, why do you need stochastic optimizations?

mwendamuriira commented 6 years ago

Thank you.

From a single bitmap file (containing the 1000 1s and 0s, e.g. 1101011101...) I have been able to extract some statistics, such as the total number of 1s (= 7 in that example), the transmission count of 1s, i.e. the number of runs of 1s (= 4; it doesn't matter whether the 1s are in a group or just a single 1), and the average number of 1s per transmission count (7/4 = 1.75, rounded down to 1). Likewise, I extracted the same statistics for the 0s. This is some basic information for determining the link quality between two communicating nodes. Could these statistics be appropriate for training? I'm still learning and very excited about the machine learning ecosystem.
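
Editor's sketch (not part of the original thread): the statistics described above can be computed by grouping the trace into runs of equal values. The function name `run_stats` and the returned keys are hypothetical; `itertools.groupby` is from the Python standard library.

```python
from itertools import groupby


def run_stats(bits, value):
    """Summarise the runs of `value` (0 or 1) in a packet trace."""
    # groupby yields one group per maximal run of equal consecutive values.
    runs = [len(list(group)) for v, group in groupby(bits) if v == value]
    total = sum(runs)   # total number of packets equal to `value`
    count = len(runs)   # number of runs (the "transmission count" in the post)
    mean = total / count if count else 0.0
    return {"total": total, "runs": count, "mean_run_length": mean}


bits = [int(ch) for ch in "1101011101"]
ones = run_stats(bits, 1)
zeros = run_stats(bits, 0)
```

For the example string this counts 7 ones spread over 4 runs, matching the totals given in the post.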

I am searching for an optimization algorithm to train a classifier such that the model can adapt to changing wireless network conditions. I am interested in stochastic optimization because, as I understand it, the model can dynamically adjust the learning speed based on the error gradient, and I believe SGD can also accelerate learning when the error is large, so the model can quickly adapt to variations in the underlying wireless sensor link quality.
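
Editor's sketch (not part of the original thread, and not SGDLibrary's API): a plain-Python logistic regression trained with stochastic gradient descent, to make the idea concrete. With a fixed step size, each update is proportional to the prediction error `p - y`, so larger errors produce larger steps. All names (`sgd_logistic`, `predict`, `step`, `epochs`) and the toy data are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sgd_logistic(X, y, step=0.1, epochs=50, seed=0):
    """Fit weights w and bias b by SGD on the logistic loss, one sample per update."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a fresh random order each epoch
        for i in order:
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            err = p - y[i]  # gradient of the logistic loss w.r.t. the score
            w = [wj - step * err * xj for wj, xj in zip(w, X[i])]
            b -= step * err
    return w, b


def predict(w, b, x):
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Toy data (illustrative only): the label simply equals the first feature.
X = [[0, 1], [1, 0], [0, 0], [1, 1]]
y = [0, 1, 0, 1]
w, b = sgd_logistic(X, y, step=0.5, epochs=200)
```

Because every update uses a single sample, the same loop could in principle keep running as new packets arrive, which is the kind of adaptation described above; whether it tracks real link variations well is an empirical question.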

From the extracted statistics I can easily tell the link quality from the share of 1s within a single bitmap file: 0% = no link; up to 10% = bad link; between 10% and 90% = intermediate link (can be used temporarily); 90% or more = good link; 100% = perfect link.
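
Editor's sketch (not part of the original thread): the link-quality categories above written as a function. It reads the thresholds as 0%, at most 10%, strictly between 10% and 90%, at least 90%, and exactly 100%, which is an interpretation of the post. The name `link_quality` and the label strings are hypothetical; integer comparisons are used so that the boundaries are exact.

```python
def link_quality(bits):
    """Map a packet trace (1 = sent, 0 = lost) to a coarse link-quality label."""
    if not bits:
        raise ValueError("empty trace")
    ones, n = sum(bits), len(bits)
    if ones == 0:
        return "no link"
    if ones == n:
        return "perfect"
    if ones * 10 <= n:       # at most 10% of packets got through
        return "bad"
    if ones * 10 < 9 * n:    # strictly between 10% and 90%
        return "intermediate"
    return "good"            # at least 90% but below 100%


label = link_quality([int(ch) for ch in "1101011101"])
```

Labels produced this way could serve as the y values for a per-file classifier, if file-level prediction is the goal.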

mwendamuriira commented 6 years ago

Greetings!

I'm still struggling with how to prepare my input data for training and how to determine an appropriate model for accurate prediction.

Could you kindly refer me to some tutorial(s) on machine learning that explain the various ways of preparing input data and selecting the right model(s)?

hiroyuki-kasai commented 6 years ago

Thanks. I am afraid that ML is a broad field, so it is, in general, quite hard to point to such materials. First, you can find some web pages and learn from them which techniques you need for your goal. Then, you can study those in more depth.

In my personal opinion, this library and this thread may not be appropriate for your goal. I am afraid this library is intended for stochastic optimization research.

Good luck!

Hiro

mwendamuriira commented 6 years ago

Thank you for your time and advice :)

Lawrence