pescap / EasyHPC

A practical introduction to High Performance Computing (HPC)
GNU General Public License v3.0

Gradient Descent parallelization #66

Closed ArtemioA closed 2 years ago

ArtemioA commented 2 years ago

I want to share some videos from Andrew Ng's courses that can help the team use 4 CPUs to calculate the gradient of the cost function and optimize the neural network (with or without the Adam algorithm).

Adam Optimization Algorithm — [ Andrew Ng ] https://www.youtube.com/watch?v=JXQT_vxqwIs

Map Reduce And Data Parallelism — [ Andrew Ng ] https://www.youtube.com/watch?v=TCA2VuHTHcM&list=PLLssT5z_DsK-h9vYZkQkYNWcItqhlRJLN&index=107

If I am not wrong, you can calculate the gradient of the cost function for a given theta matrix simply by splitting the cost function into 4 parts (this works because it is a summation; if it is a product instead, you can apply the log function, which is monotonically increasing, to turn it into a sum, as in maximum likelihood estimation with the probability mass function of a discrete random variable). Then add the gradients of those 4 partial functions, all evaluated at the same theta matrix, to obtain the steepest descent direction.
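
A minimal sketch of that decomposition (assuming a logistic-regression cost; the helper names `partial_gradient` and `full_gradient` are my own, not from the repo): because the cost is a sum over examples, the data can be split into 4 chunks and the chunk gradients summed to recover the full gradient.

```python
import numpy as np

# Hypothetical logistic-regression cost J(theta) = (1/m) * sum_i loss_i(theta).
# Since J is a sum over examples, grad J is the sum of the chunk gradients.

def partial_gradient(theta, X_part, y_part):
    """Unnormalized cross-entropy gradient over one data chunk."""
    preds = 1.0 / (1.0 + np.exp(-X_part @ theta))  # sigmoid predictions
    return X_part.T @ (preds - y_part)             # this chunk's contribution

def full_gradient(theta, X, y, n_parts=4):
    """Split the data into n_parts chunks and sum the partial gradients."""
    m = X.shape[0]
    chunks = np.array_split(np.arange(m), n_parts)
    grads = [partial_gradient(theta, X[idx], y[idx]) for idx in chunks]
    return sum(grads) / m  # identical to the gradient over the full data set
```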

If your algorithm has the data divided into n parts (mini-batch gradient descent), then for each epoch you divide each batch into 4 parts again, apply backprop to calculate the gradient vector of each part evaluated at the same theta matrix, and add these vectors to obtain the gradient (this is a theorem: the gradient of a sum is the sum of the gradients). See the sketch below.
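
Here is a hedged sketch of how the 4 partial gradients could be computed on 4 CPUs with Python's `multiprocessing.Pool` (the worker `_grad_chunk`, the logistic-regression loss, and the hyperparameters are assumptions for illustration, not the repo's actual implementation):

```python
from multiprocessing import Pool
import numpy as np

def _grad_chunk(args):
    """Gradient of one chunk's loss, evaluated at the shared theta."""
    theta, X_part, y_part = args
    preds = 1.0 / (1.0 + np.exp(-X_part @ theta))
    return X_part.T @ (preds - y_part)

def parallel_gradient(theta, X, y, n_workers=4):
    """Split the batch into n_workers chunks, compute each partial gradient
    in its own process, and sum the results (same theta everywhere)."""
    idx_chunks = np.array_split(np.arange(X.shape[0]), n_workers)
    tasks = [(theta, X[idx], y[idx]) for idx in idx_chunks]
    with Pool(n_workers) as pool:
        partial_grads = pool.map(_grad_chunk, tasks)
    return sum(partial_grads) / X.shape[0]

if __name__ == "__main__":
    # Toy data, just to show the update loop; plain gradient descent step.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1_000, 10))
    y = rng.integers(0, 2, size=1_000).astype(float)
    theta = np.zeros(10)
    for epoch in range(100):
        theta -= 0.1 * parallel_gradient(theta, X, y)
```

The same pattern extends to mini-batches: slice the batch instead of the full data set before handing the chunks to the pool.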

pescap commented 2 years ago

These are very interesting videos. You could add the links somewhere in the docs, with some short relevant notes.

ArtemioA commented 2 years ago

Professor, no problem.

pescap commented 2 years ago

@ArtemioA I am awaiting your commit.

github-actions[bot] commented 2 years ago

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] commented 2 years ago

This issue was closed because it has been inactive for 7 days since being marked as stale.