Hi, thank you for sharing your issue. We've been busy lately doing other experiments, and I admit I have not left the repo in a tidy state.
FusionFlowNetLike is actually an old model class, which I don't use anymore, but it is indeed similar to the "binocular model" with which we reported an MDE of 19.8 cm in the paper. Dataset splits are the same as in the paper "Learning an event sequence embedding for dense event-based deep stereo" by Tulyakov et al. For instance, dataset split 1 uses indoor_flying2/3 for the training set, and indoor_flying1 for the validation and test sets.
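For readers who want to reproduce this split, a minimal sketch of how it could be expressed as a configuration is given below; the dictionary layout is purely illustrative and not the repo's actual code.

```python
# Hypothetical illustration of dataset split 1 on MVSEC's indoor_flying sequences,
# following the DDES (Tulyakov et al.) convention described above.
SPLIT1 = {
    "train": ["indoor_flying2", "indoor_flying3"],  # training sequences
    "valid": ["indoor_flying1"],                    # validation frames come from this sequence
    "test":  ["indoor_flying1"],                    # test frames come from the same sequence
}
```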
I will upload the current models and a proper training script with a list of hyperparameters within the week! I will notify you on this thread when the changes are done. Sorry for making you wait a bit more!
Hello again @wang-zixuan, and truly sorry for the delay!
Now the code should be clean and neat. I have updated the README's installation instructions, so getting started should be easier from now on.
I have provided a StereoSpike class in SNN_models.py, which is binocular and reaches high accuracy on MVSEC's indoor_flying subset. You will notice that it only has one encoder, and that left and right data are concatenated into a single 4-channel tensor as input to the model. This is because this architecture provided better results than the previous one (i.e., with two encoders).
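To make the input format concrete, here is a minimal sketch of stacking left and right event frames (two polarity channels each) into the single 4-channel tensor described above; the tensor names, batch size, and the use of random data are placeholders for illustration, not code from the repo.

```python
import torch

# Hypothetical event frames at MVSEC's DAVIS346 resolution: (batch, polarity, height, width).
left_events = torch.rand(1, 2, 260, 346)   # ON/OFF channels from the left camera
right_events = torch.rand(1, 2, 260, 346)  # ON/OFF channels from the right camera

# Single 4-channel input for the binocular network with its single shared encoder.
binocular_input = torch.cat([left_events, right_events], dim=1)  # shape: (1, 4, 260, 346)

# depth_potentials = net(binocular_input)  # 'net' would be an instance of StereoSpike
```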
You will also find some other model classes, which are actually the ones we used for the article and are equivalent, but not as clean as the StereoSpike one.
Speaking of which, we have submitted a new version of the paper to arXiv, and it should appear on Monday, November 29th (next Monday). It features this architectural update, some interesting ablation studies on intermediate prediction layers and skip connections, as well as more information on the influence of spike penalization on accuracy. Some results may vary a bit because we adopted a stricter methodology, but the main results stay the same. Be sure to check it out there! I have also added this version to the /sources folder, so that you can already take a look at it.
The set of hyperparameters we used is the same as in train.py. Feel free to ask if you have more questions. Now that I have finished working on this new version, I will have more time and will be able to answer you faster!
Thanks for your contribution! I've already tried StereoSpike and it got good results. Great work!
I have another quite detailed question, about SPLIT1_TEST_INDICES. In DDES, the validation set is generated by shuffling the dataset with random seed 0 and choosing the first 200 examples, while the remaining examples are treated as the test set. When reproducing DDES, the test indices I get are different from SPLIT1_TEST_INDICES. Could I ask how you generated these indices? Correct me if I'm wrong.
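For context, here is a minimal sketch of the DDES-style split just described (shuffle the frame indices with seed 0, take the first 200 as validation, and keep the rest as test); the total number of examples is a placeholder, and DDES itself may implement the shuffle differently.

```python
import random

num_examples = 1000  # placeholder: number of labelled frames in the split's test sequence
indices = list(range(num_examples))

random.seed(0)       # seed 0, as in the DDES protocol described above
random.shuffle(indices)

valid_indices = indices[:200]  # first 200 shuffled examples -> validation set
test_indices = indices[200:]   # remaining examples -> test set
```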
Glad to hear that you found the model helpful for your work!
Concerning your question: because there was no explicit list of indices in DDES, we indeed used a different seed than the one they used, and hard-coded the resulting indices in datasets/MVSEC/indices.py for easier reproducibility.
Therefore, our validation and test sets might differ a bit from those of Tulyakov et al., but results should be very similar, because the methodology is the same and validation indices are selected randomly with a uniform distribution.
However, I might add the DDES indices to the repo, for the sake of completeness. In the meantime, you can replace our indices with the ones you calculated. Thanks for your feedback!
Hi there, sorry for bothering you again. After training StereoSpike several times, my test result on Split 1 is always around 20 cm, which differs from the 18.5 cm reported in your paper.
Therefore, I'd like to make sure that the current configuration of the network in train.py in this repo (lr, loss, etc.) is the same as that of the network in the paper. If it is, the difference in results can be attributed to the device. The only difference I found between your paper and the code is that in the paper you trained the network for 30 epochs and divided the lr by 10 at epochs 10, 25, and 40, whereas in train.py the milestones are [8, 42, 60] (and this configuration gave better results in my reproduction).
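For readers following along, a schedule with those milestones would typically be set up with PyTorch's MultiStepLR, as sketched below; the network, base learning rate, and epoch count are placeholders rather than the exact values in train.py.

```python
import torch
import torch.nn as nn

# Placeholders standing in for the actual network and training length used in train.py.
net = nn.Conv2d(4, 1, kernel_size=3, padding=1)
num_epochs = 70

optimizer = torch.optim.Adam(net.parameters(), lr=2e-4)  # placeholder base learning rate

# Divide the learning rate by 10 (gamma=0.1) at the milestone epochs mentioned above.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[8, 42, 60], gamma=0.1)

for epoch in range(num_epochs):
    # ... one epoch of training and evaluation would go here ...
    scheduler.step()
```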
Thanks for your reply!
Hello again, and sorry for the delay!
I've also tried the code in the repo and indeed noticed this slight drop in accuracy. Thanks for reporting it!
I can confirm that the current hyperparameters in train.py are the ones we used to get the results reported in the article.
Therefore, I suspect that this small performance gap comes from changes I made when I cleaned up the code for release; I simplified a few things (especially dataloading and models). I am currently working on finding what causes this behaviour. I'll get back to you once it's done!
Hi @wang-zixuan, I've been able to trace the source of the variability between the repo and the article: it comes from slight differences that I left in the model code, and from a typo I introduced when cleaning up the training script.
More specifically, I have updated the repo so that models use the Sigmoid() surrogate function by default, instead of ATan(). Also, I noticed that the SEWResBlock class always used IFNode for all models, even for fromZero_feedforward_multiscale_tempo_Matt_SpikeFlowNetLike, which is the "historical" model that we used in the paper and that normally has ParametricLIFNode everywhere (except for the output). I have corrected that, and you can now choose either IF or PLIF in the bottleneck depending on your model (i.e., respectively the "simplified/cleaner" StereoSpike or the "historical/complete" fromZero_feedforward_multiscale_tempo_Matt_SpikeFlowNetLike).
Last but not least, I spotted a big typo in the training/testing loop! The binocular model was getting data from one camera only (the right one) instead of both! It is fixed now.
Now to recap, what you should do is:

- keep the hyperparameters in train.py as they are;
- instantiate the model as net = fromZero_feedforward_multiscale_tempo_Matt_SpikeFlowNetLike(tau=3., v_threshold=1.0, v_reset=0.0, use_plif=True, multiply_factor=10.), which, with the last changes I made, is now exactly what we used in the paper;
- use the seed '2021'; you should then obtain a validation MDE of ~19.2 cm at epoch 61.

To make sure that everything is fine from the beginning, you should get the following logs at Epoch 0:
Epoch: 0, Training Loss: 4.056456051570426, Training Mean Depth Error (m): 0.7004061341285706, Time: 244.70587253570557
Epoch: 0, Test Loss: 2.349595972299576, Test Mean Depth Error (m): 1.610573649406433, Time: 4.791503667831421
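To reproduce these numbers, a minimal sketch of seeding everything with '2021' is given below; the exact seeding code in train.py may differ.

```python
import random
import numpy as np
import torch

seed = 2021
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)

# Optional: trade some speed for deterministic cuDNN convolutions.
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
```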
One thing that I wanted to do, but haven't had the time for yet, is to add some good seeds to the welcome page. I'll do that when I can.
Finally, I will update these details accordingly in the third version of the article (perhaps not immediately though, as it could go with other updates). I believe research papers should be easily reproducible, but it is often difficult to do perfect beta testing on one's own! So thank you for reporting these issues! If you don't have any other questions on this topic, please let me know so that I can close the thread!
Thanks for your reply and hard work! I'll close this issue.
Hello, I've tried to reproduce the results of the SNN (binocular) and used the FusionFlowNetLike structure. However, the network didn't converge and the mean depth error was around 2 m. Therefore, I'd like to know the details of the training procedure of the stereo SNN model (hyperparameters, training dataset, etc.) and which SNN structure the paper used to get the MDE down to 19.8 cm on Split 1. Thanks!