mlpack / models

models built with mlpack
https://models.mlpack.org/docs
BSD 3-Clause "New" or "Revised" License

Add DarkNet models, Ensmallen Callbacks and ChannelFirst Preprocessor. #20

Closed kartikdutt18 closed 4 years ago

kartikdutt18 commented 4 years ago

Hey everyone, this is a WIP of the DarkNet model for this repo.

TO DO:

Kindly let me know what you think. Thanks a lot.

kartikdutt18 commented 4 years ago

Let me know once you are able to isolate the problem.

Sure, will do. So far I've built the network up to 20 layers and the network still matches the output. I'll get more info once I get to the layer where the model output differs.

Output of layer 19: tensor(8483.1250, grad_fn=<SumBackward0>) | 8483.110115

saksham189 commented 4 years ago

the network still matches the output

Can you tell me what exactly you are doing? Are you training the network?

kartikdutt18 commented 4 years ago

Sure. I took a random tensor and saved it to a CSV. I'm building the network layer by layer to find the first layer where the model output differs, because the final output doesn't match. Also, I'm not training the network, just checking the output sum. Let me know if this makes sense.
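As a rough sketch of that procedure (the file name and the truncated reference model here are hypothetical, not from this PR):

import numpy as np
import torch
import torch.nn as nn

# Hypothetical stand-in for the PyTorch model truncated to its first few layers.
reference = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=1, padding=1),
    nn.BatchNorm2d(32, eps=1e-5),
    nn.LeakyReLU(0.1),
)
reference.eval()

# Save the random input so the mlpack side can load the exact same values.
x = torch.rand(1, 3, 224, 224)
np.savetxt("input.csv", x.numpy().ravel(), delimiter=",")

# Only the sum is compared; the mlpack program prints the matching number
# after loading input.csv and running a forward pass on its model.
with torch.no_grad():
    print("PyTorch output sum:", reference(x).sum().item())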

saksham189 commented 4 years ago

So you are not using the smaller model that you showed above?

kartikdutt18 commented 4 years ago

Ohh, I'm increasing the size of the smaller model layer by layer, since its output nearly matches. For example, the output of the following model:

FFN<mlpack::ann::CrossEntropyError<>> model;
// Input block: 3 -> 32 channels on the 224x224 input, then 2x2 max pooling.
model.Add<IdentityLayer<>>();
model.Add<Convolution<>>(3, 32, 3, 3, 1, 1, 1, 1, 224, 224);
model.Add<BatchNorm<>>(32, 1e-5, false);
model.Add<LeakyReLU<>>(0.1);
model.Add<MaxPooling<>>(2, 2, 2, 2);
// 32 -> 64 channels on the pooled 112x112 maps, then 2x2 max pooling.
model.Add<Convolution<>>(32, 64, 3, 3, 1, 1, 1, 1, 112, 112);
model.Add<BatchNorm<>>(64, 1e-5, false);
model.Add<LeakyReLU<>>(0.1);
model.Add<MaxPooling<>>(2, 2, 2, 2);
// Darknet-style bottleneck on 56x56 maps: 3x3 expand, 1x1 squeeze, 3x3 expand.
model.Add<Convolution<>>(64, 128, 3, 3, 1, 1, 1, 1, 56, 56);
model.Add<BatchNorm<>>(128, 1e-5, false);
model.Add<LeakyReLU<>>(0.1);
model.Add<Convolution<>>(128, 64, 1, 1, 1, 1, 0, 0, 56, 56);
model.Add<BatchNorm<>>(64, 1e-5, false);
model.Add<LeakyReLU<>>(0.1);
model.Add<Convolution<>>(64, 128, 3, 3, 1, 1, 1, 1, 56, 56);
model.Add<BatchNorm<>>(128, 1e-5, false);
model.Add<LeakyReLU<>>(0.1);
model.Add<MaxPooling<>>(2, 2, 2, 2);

matches the PyTorch model when the weights are transferred, in both eval and train mode. This represents less than half of Darknet-19.
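For comparison, a minimal PyTorch sketch of the same truncated stack (my reconstruction from the mlpack code above, not the converter's actual definition):

import torch.nn as nn

def conv_block(c_in, c_out, k):
    # Conv -> BatchNorm(eps=1e-5) -> LeakyReLU(0.1), mirroring the mlpack layers.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, stride=1, padding=k // 2),
        nn.BatchNorm2d(c_out, eps=1e-5),
        nn.LeakyReLU(0.1),
    )

truncated = nn.Sequential(
    conv_block(3, 32, 3), nn.MaxPool2d(2, 2),
    conv_block(32, 64, 3), nn.MaxPool2d(2, 2),
    conv_block(64, 128, 3),
    conv_block(128, 64, 1),
    conv_block(64, 128, 3), nn.MaxPool2d(2, 2),
)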

kartikdutt18 commented 4 years ago

As shown here, the output matched up to layer 8. I'm increasing the size to find where the output differs. So far I've built the model to 24 layers (current).

saksham189 commented 4 years ago

How many layers do we have?

kartikdutt18 commented 4 years ago

About 46.

saksham189 commented 4 years ago

Can you try something like binary search on the number of layers, if that is going to be faster than building one layer at a time? Or maybe use larger increments than 1 each time, so that we can track down the error faster. Let me know what you think.
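A sketch of that search, assuming a hypothetical helper outputs_match(n) that runs both models truncated to n layers and compares their output sums:

def first_divergent_layer(total_layers, outputs_match):
    # Binary search for the first layer count at which the two frameworks
    # disagree; assumes agreement is monotone in the layer count.
    lo, hi = 1, total_layers
    while lo < hi:
        mid = (lo + hi) // 2
        if outputs_match(mid):
            lo = mid + 1
        else:
            hi = mid
    return lo

With 46 layers that is about six comparisons instead of 46.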

kartikdutt18 commented 4 years ago

Yeah, I have started doing increments of 4 instead of 1, so I should be able to tell where the output differs tonight.

saksham189 commented 4 years ago

For example after the 10th layer the output for mlpack and PyTorch are 97,983.6707 | 56,498.9453 respectively. I'm looking into why the difference is there.

You said above that the output starts to differ after the 10th layer only?

Also, take a look into why you have to set deterministic manually for the layers. Is the deterministicSetVisitor not working correctly, or is there some other issue?

kartikdutt18 commented 4 years ago

You said above that the output starts to differ after the 10th layer only?

This comment resolved that.

Also, take a look into why you have to set deterministic manually for the layers. Is the deterministicSetVisitor not working correctly, or is there some other issue?

Yeah, I would need to look into that. I don't think it's working correctly.

saksham189 commented 4 years ago

Hmm... not sure why the change in the value of eps makes any difference. Are you sure that is required?

kartikdutt18 commented 4 years ago

You are right, the output is nearly the same. The output with 1e-8 is 8481.349613, versus 8483.110115 earlier. I think the main change came from deterministic being set.
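(For reference, batch norm computes y = gamma * (x - mu) / sqrt(sigma^2 + eps) + beta, so as long as the batch variance is much larger than eps, moving eps from 1e-5 to 1e-8 should only nudge the output, which fits the small 8483.11 vs 8481.35 gap. That is the standard formulation; I haven't re-derived mlpack's exact expression.)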

kartikdutt18 commented 4 years ago

So, the difference occurs between the 28th and the 44th layer. At the 28th layer, the output for PyTorch | mlpack is:

tensor(-6255.7974, grad_fn=<SumBackward0>) | -6255.796399

However, at the 44th layer, the output for PyTorch | mlpack is:

tensor(-1198.3817, grad_fn=<SumBackward0>) | -1621.399175

saksham189 commented 4 years ago

Alright, let's try to narrow down to the exact layer (maybe using binary search or whatever is easier).

kartikdutt18 commented 4 years ago

Yep, I can use binary search here. Will let you know once I find the layer.

kartikdutt18 commented 4 years ago

Yeah, the output starts to change after the 39th layer (BatchNorm). Outputs at the 38th layer from PyTorch and mlpack are tensor(-11858.4180, grad_fn=<SumBackward0>) and -11858.42358 respectively. At the 39th layer they are tensor(-21867.9277, grad_fn=<SumBackward0>) and -32445.34898.

kartikdutt18 commented 4 years ago

I think I got it working now; the final output also matches.

idx : tensor(347)   prob : tensor(0.0746, grad_fn=<SelectBackward>) | idx : 347 prob : 0.07456678472  

One layer in the PyTorch model used padding which isn't mentioned in the original DarkNet architecture, so I added that and it's working now. Tested it with two random tensors. Moving on to testing with images.

saksham189 commented 4 years ago

Alright sure. Let me know when you are able to get any results.

kartikdutt18 commented 4 years ago

Alright sure. Let me know when you are able to get any results.

Sure. The PyTorch and mlpack models produced the same result of 76% on 80 images from imagenette (a smaller subset of ImageNet). The prediction values and probabilities match. To get these results I made one more change: in PyTorch I used the following transform to convert to a tensor,

train_transforms = transforms.ToTensor()

which scales input images between 0 and 1. It doesn't seem to be a plain division by 255, so I fed the converted tensor into mlpack and got the same accuracy. I will identify which scaling it is, but I think the converter works, so we can now convert any model from PyTorch to mlpack. The other thing I will now look into is why we need to manually set deterministic.
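A small sketch of the comparison in question (test.jpg is a hypothetical local image):

import numpy as np
from PIL import Image
from torchvision import transforms

img = Image.open("test.jpg")

as_tensor = transforms.ToTensor()(img)               # (C, H, W), floats
manual = np.asarray(img).astype(np.float32) / 255.0  # (H, W, C)

# If ToTensor were a plain division by 255, these sums would agree exactly.
print(as_tensor.sum().item(), manual.sum())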

saksham189 commented 4 years ago

The other thing I will now look into is why we need to manually set deterministic.

Alright sure let me know once you figure this out.

saksham189 commented 4 years ago

I think this should be fairly easy. You can just try to add some print statements and see if deterministic is being set properly or not. Let me know what you think.

kartikdutt18 commented 4 years ago

Sure, I can try that.

KimSangYeon-DGU commented 4 years ago

This will help https://pytorch.org/docs/0.2.0/_modules/torchvision/transforms.html#ToTensor

kartikdutt18 commented 4 years ago

This will help https://pytorch.org/docs/0.2.0/_modules/torchvision/transforms.html#ToTensor

I tried that, but the output of dividing by 255 differs from the C++ result. I found a matching thread here, but that is also unanswered.

saksham189 commented 4 years ago

Hey @kartikdutt18, were you able to find the issue with deterministic?

kartikdutt18 commented 4 years ago

Hey @kartikdutt18, were you able to find the issue with deterministic?

Hey @saksham189, I have a solution that works. I have opened a PR, mlpack/mlpack#2552, for it with tests that pass locally. Also, deterministic was not being set; I checked it with print statements.

saksham189 commented 4 years ago

Alright, great. Is the work on this PR complete? Can you push the latest changes?

kartikdutt18 commented 4 years ago

Sure, I'll push it. I took a look at the ToTensor() part; I think it's related to compression in jpg images. I'll see if a different format helps. Even if I load the image, convert it to a numpy matrix and divide it by 255, the output sum is different. After this, it should be ready to go.

saksham189 commented 4 years ago

In PyTorch I used the following transform to convert to a tensor, train_transforms = transforms.ToTensor(), which scales input images between 0 and 1. It doesn't seem to be a plain division by 255, so I fed the converted tensor into mlpack and got the same accuracy. I will identify which scaling it is, but I think the converter works, so we can now convert any model from PyTorch to mlpack.

I am not sure if I completely understand the issue. Do you use this only in PyTorch, or do you use the output from this in mlpack as well?

kartikdutt18 commented 4 years ago

The issue is that in the PyTorch preprocessing I use ToTensor() to convert the image to a tensor. According to the documentation here, it divides by 255. However, for some reason, if I load the image in C++ and divide it by 255.0, the output matrix is a bit different, as mentioned in this thread here. I verified in Python as well that if I load an image, convert it to a numpy array and divide it by 255, the output matrix is a bit different. Let me know if this makes things a bit clearer.

If I don't use the PyTorch tensors as input in mlpack, accuracy drops to about 40.5%, whereas if I use the same input I get the same accuracy of 76% on the 80-image test set from imagenette.

saksham189 commented 4 years ago

Alright, makes sense. Have you tried using 256 instead of 255? Also, can you paste the output matrix from PyTorch after ToTensor and the original matrix? Maybe we can just try to find a pattern by trial and error.

kartikdutt18 commented 4 years ago

Alright, makes sense. Have you tried using 256 instead of 255?

Yeah, I tried that but it didn't work. I wrote a search that found a divisor giving a very close output (in terms of sum), about 249.8743, but it still isn't good enough to work on all images. I think the problem lies with using jpg images, because a quality setting is applied on loading, though I'm not sure. Converting the tensor back to an image gave the same image, so it's a reversible operation.

Maybe we can just try to find a pattern by trial and error.

Sure, but the input size is 224 x 224 x 3, so I just compare the sum and the first 10 values for simplicity.
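The divisor search mentioned above might have looked roughly like this (my reconstruction, not the actual script):

import numpy as np

def best_divisor(raw, target_sum, lo=200.0, hi=300.0, steps=100000):
    # raw: the loaded image as a float array; target_sum: sum of the
    # PyTorch tensor. Scan candidate divisors and keep the one whose
    # scaled sum is closest to the target.
    divisors = np.linspace(lo, hi, steps)
    errors = np.abs(raw.sum() / divisors - target_sum)
    return divisors[np.argmin(errors)]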

saksham189 commented 4 years ago

Is there any way we can refer to the implementation for this?

kartikdutt18 commented 4 years ago

I'm not sure. The implementation says to divide the array by 255, but that doesn't give the same values. I can first try to convert the numpy array into a tensor such that the values match, and then repeat that in C++; I think that would be easier.

saksham189 commented 4 years ago

I think we should revise our plans. Were we using the weight converter just for testing our model? If that is the case, I think we were able to successfully reproduce the results and can assume that the model is working as intended.

If we wanted the weight converter as a convenience for translating between PyTorch and mlpack models, then we could create a separate PR for it and merge the Darknet model here for now. Let me know your thoughts or if I missed anything.

kartikdutt18 commented 4 years ago

Hmm, in my mind I was working on it to translate models so that we can have downloadable weights. However, your point makes sense. I can open a separate PR for the converter and we can merge this DarkNet PR, if that makes more sense. We could first repeat this for all the models, merge them, and then figure out the preprocessing. The model is working correctly and we also have the weights for it; what we need is to figure out the preprocessing required to make a loaded image matrix the same as the PyTorch tensor after ToTensor() is called.

KimSangYeon-DGU commented 4 years ago

With these references:

https://discuss.pytorch.org/t/transforms-totensor-does-not-work-as-document/25495
https://www.kaggle.com/pinocookie/pytorch-dataset-and-dataloader#2.-Version-check

ToTensor() converts the data type to np.uint8. Have you checked that before by any chance?

saksham189 commented 4 years ago

I can open a separate PR for the converter and we can merge this DarkNet PR, if that makes more sense. We could first repeat this for all the models, merge them, and then figure out the preprocessing. The model is working correctly and we also have the weights for it; what we need is to figure out the preprocessing required to make a loaded image matrix the same as the PyTorch tensor after ToTensor() is called.

Yup, sounds good to me. Then you can open a new PR for the weight converter. Btw, have you tested both the Darknet 19 and 53 architectures?

kartikdutt18 commented 4 years ago

Btw, have you tested both the Darknet 19 and 53 architectures?

Just Darknet 19 so far. It won't take me long to test DarkNet 53 as well; I will get started with that, and I think it will also work. I will share the results by tomorrow and do the cleanup required for merging this PR.

kartikdutt18 commented 4 years ago

With these references

https://discuss.pytorch.org/t/transforms-totensor-does-not-work-as-document/25495
https://www.kaggle.com/pinocookie/pytorch-dataset-and-dataloader#2.-Version-check

ToTensor() converts the data type to np.uint8. Have you checked that before by any chance?

Ahh, thanks for finding these, I will try them out. For testing I was converting them to a FloatTensor; maybe this will work.

saksham189 commented 4 years ago

@kartikdutt18, were you able to try out SangYeon's suggestion? Also, how is the testing for Darknet 53 going?

kartikdutt18 commented 4 years ago

@kartikdutt18, were you able to try out SangYeon's suggestion?

I haven't tested it yet; I was working on finishing Darknet 53 first. Once that is done I'll try that as well. I'll share the results with you by tonight.

kartikdutt18 commented 4 years ago

ToTensor() converts the data type to np.uint8. Have you checked that before by any chance?

Yes, this works. I tested in Python whether I get the same values from numpy and ToTensor; they are nearly the same, and I get the same results from both. I will try it in C++ then, and maybe add another preprocessor for the imagenet / imagenette dataset.
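A sketch of that check, assuming the fix is to go through np.uint8 before scaling (test.jpg is a hypothetical test image):

import numpy as np
from PIL import Image
from torchvision import transforms

img = Image.open("test.jpg")

# Force uint8 first (the dtype ToTensor expects), then scale to [0, 1].
arr = np.asarray(img, dtype=np.uint8).astype(np.float32) / 255.0
arr = arr.transpose(2, 0, 1)               # HWC -> CHW, channel-first

ref = transforms.ToTensor()(img).numpy()   # (C, H, W)
print(np.abs(arr - ref).max())             # ~0 if the preprocessing matches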

kartikdutt18 commented 4 years ago

I went through the output of nearly all the layers; the output matches for all layers except the last linear layer.

Output at adaptive pooling (mlpack | PyTorch): 162.85 | 162.8498
and at the last layer (mlpack | PyTorch): 9.7774 | 6.4653

kartikdutt18 commented 4 years ago

Hey @KimSangYeon-DGU, @saksham189, I'm not sure there is anything wrong with the implementation. The output at the second-to-last layer matches PyTorch completely, i.e. sum (mlpack | PyTorch): 162.85 | 162.8498. Some values in the output (PyTorch):

tensor([[[0.0916]],

        [[0.0539]],

        [[0.0374]],

        [[0.0007]],

        [[0.0155]],

        [[0.3619]],

        [[0.0856]],

        [[0.0403]],

        [[0.0115]],

        [[0.0913]]])

Output values in mlpack:

   0.0916
   0.0539
   0.0374
   0.0007
   0.0155
   0.3619
   0.0856
   0.0403
   0.0115
   0.0913

After the adaptive pooling layer there is the following layer in PyTorch and mlpack respectively:

PyTorch:
self.fc = nn.Linear(1024, self.num_classes)

mlpack:
numClasses = 1000;
model.Add(new Linear<>(curChannels, numClasses));

I also checked the implementation of the linear layer and I think it's correct. I tested it with a simple test here and it gave the correct value.

I checked if the weights are being loaded into the linear layer correctly and they are. The values of the weights and biases also match.

Reference Darknet53 implementation. I have also updated the converter repo with sample code here. We know that the Darknet19 model is correct, and as far as I understand the Darknet53 implementation is also correct: the output differs only at the last (linear) layer, before that the output matches to very high precision, and there is only one way to declare the linear layer, so it should be correct. Could you kindly let me know if you think I missed something?
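One extra check worth doing (my suggestion; the file names are hypothetical CSV exports): compute the linear layer by hand from the transferred weights, since a wrong weight layout is a common cause of exactly this symptom.

import numpy as np

# Hypothetical CSV exports of the last layer's input and parameters.
x = np.loadtxt("pool_output.csv", delimiter=",")  # (1024,)
W = np.loadtxt("fc_weight.csv", delimiter=",")    # (1000, 1024), PyTorch layout
b = np.loadtxt("fc_bias.csv", delimiter=",")      # (1000,)

manual = W @ x + b  # what nn.Linear computes
# Whichever framework disagrees with `manual` has the bug; if mlpack matches
# only after reshaping or transposing W, the weight loading order is off.
print(manual.sum())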

saksham189 commented 4 years ago

Can you just check a few things below:

  1. The output size of the layer before is 1024.
  2. Each value of the output matches, not just the sum.
  3. The PyTorch implementation of the linear layer is the same as we have in mlpack (maybe they have some kind of regularisation or something that we are not including).
  4. The weight matrices of both match exactly. This is going to be 1024 x 1000 + 1000 values, which is quite large; make sure they match exactly.

If everything above is working as expected, I am not sure why we would be getting incorrect output from just one layer.

kartikdutt18 commented 4 years ago

Sure, I can generate a CSV and call a function similar to CheckMatrices on it, for both the weights and the output from the previous layer.
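A minimal numpy stand-in for that check (CheckMatrices itself lives in mlpack's test suite; this is just an analogous sketch):

import numpy as np

def check_matrices(a_csv, b_csv, tol=1e-5):
    # Load both CSV exports and report the largest elementwise difference.
    a = np.loadtxt(a_csv, delimiter=",")
    b = np.loadtxt(b_csv, delimiter=",")
    diff = np.abs(a - b).max()
    assert diff < tol, f"matrices differ by up to {diff}"
    return diff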

kartikdutt18 commented 4 years ago

The output size of the layer before is 1024. Each value of the output matches, not just the sum.

I verified these two points; the values are the same to a precision of 1e-6. Will check the weights as well now.