mccorby / PhotoLabellerServer

Federated Learning: Parameter Server doing aggregation of updates to a model coming from clients participating in a Federated Learning setup. See also the Android application companion at https://github.com/mccorby/PhotoLabeller
MIT License

Model Training #2

Closed SawsanAbdulRahman closed 6 years ago

SawsanAbdulRahman commented 6 years ago

Hello,

When I run Main.kt, the zipped file "cifar_federated" is generated, containing the following:

However, I am getting the following error:

Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:67)
Caused by: java.lang.IllegalArgumentException: bound must be positive
    at java.util.Random.nextInt(Random.java:388)
    at org.nd4j.linalg.util.ArrayUtil.buildInterleavedVector(ArrayUtil.java:1679)
    at org.nd4j.linalg.cpu.nativecpu.CpuNDArrayFactory.shuffle(CpuNDArrayFactory.java:814)
    at org.nd4j.linalg.factory.Nd4j.shuffle(Nd4j.java:452)
    at org.nd4j.linalg.dataset.DataSet.shuffle(DataSet.java:619)
    at org.datavec.image.loader.CifarLoader.convertDataSet(CifarLoader.java:380)
    at org.datavec.image.loader.CifarLoader.next(CifarLoader.java:424)
    at org.datavec.image.loader.CifarLoader.next(CifarLoader.java:392)
    at org.deeplearning4j.datasets.iterator.impl.CifarDataSetIterator.next(CifarDataSetIterator.java:110)
    at com.mccorby.photolabeller.ml.trainer.CifarTrainer.eval(CifarTrainer.kt:97)
    at com.mccorby.photolabeller.ml.MainKt.main(Main.kt:39)
    ... 5 more

Could you please advise?

mccorby commented 6 years ago

Hi, this happened to me a few times. It looks as if the shuffle couldn't compute the range properly, probably because some of the files are not correct. When it happened, I ran the trainer again and it worked.
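For anyone landing here later, the following is a minimal sketch of the failure and of the "run it again" workaround, not code from this repository. `java.util.Random.nextInt(bound)` throws `IllegalArgumentException` whenever `bound` is zero or negative, which matches the top frame of the trace. The sketch assumes the shuffle was asked to work on an empty batch; that is a guess consistent with the explanation above, not something the trace proves. The names `RetryOnEmptyShuffle`, `pickIndex`, and `retry` are hypothetical and exist only for illustration.

```java
import java.util.Random;
import java.util.function.Supplier;

public class RetryOnEmptyShuffle {

    // Stand-in for the failing call: Random.nextInt(bound) rejects bound <= 0,
    // so an example count of 0 reproduces "bound must be positive".
    static int pickIndex(int exampleCount) {
        return new Random().nextInt(exampleCount);
    }

    // Hypothetical helper: runs the action again when it fails with
    // IllegalArgumentException, giving up after maxAttempts tries.
    static <T> T retry(int maxAttempts, Supplier<T> action) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IllegalArgumentException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (IllegalArgumentException e) {
                last = e;
                System.err.println("Attempt " + attempt + " failed: " + e.getMessage());
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        // Simulated batch sizes: the first attempt sees an empty batch,
        // the second sees 10 examples.
        int[] sizes = {0, 10};
        int[] calls = {0};
        int index = retry(3, () -> pickIndex(sizes[calls[0]++]));
        System.out.println("Picked index " + index + " after " + calls[0] + " attempts");
    }
}
```

In the real project the action passed to `retry` would wrap whatever call fails in `Main.kt`. A retry only hides the symptom: if the error keeps coming back, deleting and re-downloading the CIFAR files is probably the more reliable fix.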