Closed ArtemioA closed 2 years ago
These are very interesting videos. You could add the links to the docs somewhere, with some short relevant notes.
Professor, no problem.
@ArtemioA I am awaiting your commit.
This issue is stale because it has been open for 7 days with no activity.
This issue was closed because it has been inactive for 7 days since being marked as stale.
I want to share some videos from Andrew Ng's courses that can help the team use 4 CPUs to compute the gradient of the cost function and optimize the neural network (with or without the Adam algorithm).
Adam Optimization Algorithm — [ Andrew Ng ] https://www.youtube.com/watch?v=JXQT_vxqwIs
Map Reduce And Data Parallelism — [ Andrew Ng ] https://www.youtube.com/watch?v=TCA2VuHTHcM&list=PLLssT5z_DsK-h9vYZkQkYNWcItqhlRJLN&index=107
If I am not wrong, you can compute the gradient of the cost function at a given theta matrix simply by splitting the cost function into 4 parts. This works because the cost is a summation over examples; if it is instead a product (as in maximum likelihood estimation with the probability mass function of a discrete random variable), apply the log function, which is monotonically increasing, to turn the product into a sum first. Then add the gradients of those 4 partial functions, all evaluated at the same theta matrix, to obtain the steepest-descent direction.
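A tiny sketch of the log trick mentioned above (my own illustration, not code from this repo): the log of a product of per-example likelihoods equals the sum of their logs, and a sum is exactly what splits cleanly across 4 workers.

```python
import numpy as np

# Hypothetical per-example likelihoods from a discrete probability mass function.
p = np.array([0.9, 0.8, 0.7, 0.95])

# log is monotonically increasing, so maximizing the product of likelihoods
# is equivalent to maximizing the sum of log-likelihoods.
log_of_product = np.log(np.prod(p))
sum_of_logs = np.sum(np.log(p))

print(np.isclose(log_of_product, sum_of_logs))  # the two are equal
```

Because `sum_of_logs` is a plain summation over examples, each of the 4 workers can evaluate its own slice and the results simply add up.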
If your algorithm already has the data divided into n parts (mini-batch gradient descent), then within each epoch you divide each batch into 4 parts again, apply backprop to each part to get a gradient vector evaluated at the same theta matrix, and add these vectors to obtain the full gradient (this follows from the theorem that the gradient of a sum is the sum of the gradients).
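The scheme above can be sketched as follows (a minimal illustration of the idea, not the project's actual code; the logistic-regression gradient and the 4-thread pool are my own assumptions). Because the unregularized gradient is a sum over examples, 4 workers can each compute the gradient of their own data chunk at the same theta, and the partial gradients are simply added:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def partial_grad(theta, X, y):
    """Unregularized logistic-regression gradient on one data chunk
    (a sum over that chunk's examples, so chunks add up)."""
    h = 1.0 / (1.0 + np.exp(-X @ theta))  # sigmoid hypothesis
    return X.T @ (h - y)

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = (rng.random(400) < 0.5).astype(float)
theta = rng.normal(size=3)

# Gradient on the full batch, for comparison.
full = partial_grad(theta, X, y)

# Split the batch into 4 parts and compute each partial gradient in parallel,
# all evaluated at the same theta.
chunks = list(zip(np.array_split(X, 4), np.array_split(y, 4)))
with ThreadPoolExecutor(max_workers=4) as pool:
    parts = list(pool.map(lambda c: partial_grad(theta, *c), chunks))

summed = sum(parts)
print(np.allclose(full, summed))  # True: gradient of a sum = sum of gradients
```

Note the gradient is written as a sum (not a mean) over examples, which is what makes the 4 partials directly addable; with a mean you would instead take a weighted average of the partials.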