mlpack / models

models built with mlpack
https://models.mlpack.org/docs
BSD 3-Clause "New" or "Revised" License

Addition of resnet module. #63

Closed Aakash-kaushik closed 3 years ago

Aakash-kaushik commented 3 years ago

Extension of #61; a PR for review of the addition of the ResNet module.

Aakash-kaushik commented 3 years ago

Um, so our pre-trained models won't deserialize until we shift Sequential from MoreTypes to LayerTypes in mlpack, so let me know what we should do in this regard.

Aakash-kaushik commented 3 years ago

Yes, the idea is to create a small dataset (refer to the VOC dataset on mlpack.org), download the weights, then load them into the model; the predictions should match PyTorch.

Hey, I was going to write tests, but isn't VOC for segmentation and detection? Maybe there is a smaller version of ImageNet we could use. I also see one in the converter repo, but I'm not sure how to load that up in torch or which classes the images relate to.

kartikdutt18 commented 3 years ago

Are you also adding tests for Darknet and YOLO?

Aakash-kaushik commented 3 years ago

Are you also adding tests for Darknet and YOLO?

No, I just adapted them to the new format. I added a convenience function and edited them the same way; there's less repeated code this way.

kartikdutt18 commented 3 years ago

Could you share the link to the change?

Aakash-kaushik commented 3 years ago

Could you share the link to the change?

Just tagged the changes in the review.

Aakash-kaushik commented 3 years ago

Also @kartikdutt18, would it be fine if, rather than using a dataset, we just pass a matrix of ones to PyTorch and to our models, and then compare the sums of the linear layer outputs? It seems like a reliable test to me.

kartikdutt18 commented 3 years ago

Sounds good. Just make sure there are tests for both single and multiple batch sizes.

Aakash-kaushik commented 3 years ago

Sounds good.

Awesome, I shall add them in a bit; I believe you can start your final review after that. :)

Aakash-kaushik commented 3 years ago

The passing tests should actually fail. I wrote size_t in place of double, so a float was being truncated and compared to 0, yet the preTrainedModel tests still show as passed. Either that is because I wrapped them in a function, or because they are run through ctest; I disregard the first point, because running them directly through the Catch executable results in them failing even when wrapped in the function, but not when run through ctest.

kartikdutt18 commented 3 years ago

I didn't really get this part. If possible, could you please elaborate a bit more?

Aakash-kaushik commented 3 years ago

I didn't really get this part. If possible, could you please elaborate a bit more?

Yeah, sorry, I realise that it was a bit vague.

So, before the last push I used size_t in the function that compares double values, so it was comparing a number like 0.0032323 to 0. This passed when run with ctest, but not when directly executing the Catch2 executable, models_test. The issue was highlighted when I ran the tests directly through the executable, as I said above; but when I looked at the PR build info, the tests were passing, and specifically this one. This leads me to believe that this way we can miss tests that fail but are still shown as passed.

@kartikdutt18, not sure, but does this make sense?

Aakash-kaushik commented 3 years ago

Also @zoq and @kartikdutt18 I believe the PR is ready for a review now.

Aakash-kaushik commented 3 years ago

@zoq, apart from the three things commented on above, I have applied all the suggestions.

Aakash-kaushik commented 3 years ago

So I believe I don't have any major work left on ResNet now; can I move on to MobileNet v1 and how I am going to implement that?

Aakash-kaushik commented 3 years ago

Left minor comments, will do a complete review tomorrow (evening-ish).

Sure, I will make the changes together.

Aakash-kaushik commented 3 years ago

The preTrained tests are counted as passed, but in reality they are failing. You can see it from their runtimes: just above them, an actual ResNet run takes a huge amount of time, and a pre-trained version should take about the same, but it doesn't; yet ctest still reports them as passed.

I have pasted the test results below:

 Start  4: ResNetModelTest
 4/21 Test  #4: ResNetModelTest ..........................   Passed   81.79 sec
      Start  5: ResNet101ModelTest
 5/21 Test  #5: ResNet101ModelTest .......................   Passed   79.93 sec
      Start  6: ResNet152ModelTest
 6/21 Test  #6: ResNet152ModelTest .......................   Passed  149.89 sec
      Start  7: PreTrainedResNetModelTest
 7/21 Test  #7: PreTrainedResNetModelTest ................   Passed    0.98 sec
      Start  8: PreTrainedResNet101ModelTest
 8/21 Test  #8: PreTrainedResNet101ModelTest .............   Passed    0.57 sec
      Start  9: PreTrainedResNetModel152Test
 9/21 Test  #9: PreTrainedResNetModel152Test .............   Passed    0.58 sec

And a local test run to show what's wrong:

UpdateCTestConfiguration  from :/home/aakash/models/build/tests/DartConfiguration.tcl
Parse Config file:/home/aakash/models/build/tests/DartConfiguration.tcl
UpdateCTestConfiguration  from :/home/aakash/models/build/tests/DartConfiguration.tcl
Parse Config file:/home/aakash/models/build/tests/DartConfiguration.tcl
Test project /home/aakash/models/build/tests
Constructing a list of tests
Done constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
test 8
    Start 8: PreTrainedResNet101ModelTest

8: Test command: /home/aakash/models/build/bin/models_test "PreTrainedResNet101ModelTest"
8: Test timeout computed to be: 1500
8: Filters: PreTrainedResNet101ModelTest
8: 
8: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
8: models_test is a Catch v2.13.3 host application.
8: Run with -? for options
8: 
8: -------------------------------------------------------------------------------
8: PreTrainedResNet101ModelTest
8: -------------------------------------------------------------------------------
8: /home/aakash/models/tests/ffn_model_tests.cpp:175
8: ...............................................................................
8: 
8: /home/aakash/models/tests/ffn_model_tests.cpp:175: FAILED:
8: due to unexpected exception with message:
8:   Invalid 'which' selector when deserializing boost::variant
8: 
8: ===============================================================================
8: test cases: 1 | 1 failed
8: assertions: 1 | 1 failed
8: 
1/1 Test #8: PreTrainedResNet101ModelTest .....   Passed    3.09 sec

The following tests passed:
    PreTrainedResNet101ModelTest

100% tests passed, 0 tests failed out of 1

Total Test time (real) =   3.10 sec 

The reason they fail is that mlpack is installed as a step in the build process, and that version doesn't have the Sequential type in LayerTypes, so deserialization fails.

Right now we have two issues:

  1. How to correctly identify failing tests. I found this, which could help, but I haven't been through it properly. Also, do you think this could be because our Catch2 main function (basically the file main.cpp in the tests dir) always returns 0? Check here for reference; this returns the number of failed tests.
  2. What to do when a user tries to use the pre-trained models; they are going to fail because of the reason mentioned above.

kartikdutt18 commented 3 years ago

Then lets open a PR in mlpack to move the required layer to LayerTypes. Should resolve the issue.

Aakash-kaushik commented 3 years ago

Then lets open a PR in mlpack to move the required layer to LayerTypes. Should resolve the issue.

Sure, I will create one. Also, about the requested changes: I see I have resisted a few of them, but that's just because keeping things this way allows the code to be simpler, with fewer numbers just lying around the code that won't make sense.

Aakash-kaushik commented 3 years ago

Then lets open a PR in mlpack to move the required layer to LayerTypes. Should resolve the issue.

@kartikdutt18, @zoq, which exact layers should I replace? I mean, I know we need to put the Sequential layer into LayerTypes, but which two layers should go to MoreTypes, so that it makes sense for mlpack as a whole and not just this PR?

zoq commented 3 years ago

You could move FlexibleReLU and SpatialDropout.

Aakash-kaushik commented 3 years ago

You could move FlexibleReLU and SpatialDropout.

Thanks. Can you also help me identify what causes these tests to pass even when they are failing?

Aakash-kaushik commented 3 years ago

@kartikdutt18, I have pushed most of the suggestions and left some comments. Can you take a look?

zoq commented 3 years ago

Looks like I forgot to click the submit review button.

Aakash-kaushik commented 3 years ago

Looks like I forgot to click the submit review button.

No worries. By the way, I pushed all the requested changes.

kartikdutt18 commented 3 years ago

@kartikdutt18, I have pushed most of the suggestions and left some comments. Can you take a look?

Sure, I will go through it in a bit.

Aakash-kaushik commented 3 years ago

I believe this wraps up the changes from my side; a final review would be awesome.

Aakash-kaushik commented 3 years ago

Kinda excited to see this getting merged. :smile: Thanks @kartikdutt18 and @zoq for bearing with me on this one. We have one more to go. :p