onnx / models

A collection of pre-trained, state-of-the-art models in the ONNX format
http://onnx.ai/models/
Apache License 2.0

arcface model is invalid #91

Closed: snnn closed this issue 5 years ago

snnn commented 5 years ago

I downloaded the model from: https://s3.amazonaws.com/onnx-model-zoo/arcface/resnet100/resnet100.onnx

If you open the model, take a look at the second OP: Sub. Its first input, A, is a float tensor, but its second input, B, is a double tensor.
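A minimal sketch of the rule being violated, for readers who want to script the check: in ONNX, the `Sub` inputs `A` and `B` share one type constraint, so they must have the same element type. The record layout below (a dict of input name to type name) is a hypothetical simplification for illustration, not the `onnx` package API.

```python
def mismatched_types(input_types):
    """Return the distinct element types if the inputs disagree, else an empty set.

    `input_types` maps an input name to its element type name. This layout is
    an illustrative assumption, not how the onnx package stores a graph.
    """
    distinct = set(input_types.values())
    return distinct if len(distinct) > 1 else set()


# The Sub node as described in the report: A is float, B is double.
reported_sub = {"A": "float", "B": "double"}
print(sorted(mismatched_types(reported_sub)))
```

An empty result means the node's inputs agree; anything else lists the conflicting types.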

ankkhedia commented 5 years ago

Hi @snnn, I tried viewing the above model using Netron and for me the second input B for Sub operator shows up as float tensor in Netron.

[Screenshot: Netron view, 2018-08-30]

snnn commented 5 years ago

Hi @ankkhedia

It's float64?

prasanthpul commented 5 years ago

@snnn is the problem that the type needs to be the same for both?

snnn commented 5 years ago

Yes.

prasanthpul commented 5 years ago

@ankkhedia can you fix the model?

ankkhedia commented 5 years ago

@prasanthpul I will take a look.

prasanthpul commented 5 years ago

@ankkhedia any update on this?

ankkhedia commented 5 years ago

Hi @prasanthpul, sorry for the delay; I got pulled into some other things. I will try to prioritise it this week.

ankkhedia commented 5 years ago

@prasanthpul @snnn It seems to be an error in the MXNet-ONNX converter. I have raised an issue with the team: https://github.com/apache/incubator-mxnet/issues/13044 I will convert the model again and put the new one back here when the issue gets fixed.

linkerzhang commented 5 years ago

This is not good. We should remove these models if they're invalid. We can add them back after fixing those issues.

@snnn have you seen issues with any other models? Thank you very much for bringing this up!

linkerzhang commented 5 years ago

@ankkhedia

snnn commented 5 years ago

In addition to Arcface, there are also problems in:

snnn commented 5 years ago

@ankkhedia Any update? Could you please confirm if these models have problems?

Thanks

ankkhedia commented 5 years ago

I will check the other models. However, the ArcFace issue has been fixed, and I will upload the new model.

ankkhedia commented 5 years ago

Hi @snnn, could you please point out the problems with the models you listed above so that I can take a look?

snnn commented 5 years ago

The inputs to the Gemm operator are not 2D tensors. They have more than 2 dimensions.
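A rough sketch of the constraint, under stated assumptions: the ONNX Gemm operator takes 2D matrices for `A` and `B`, and a `Flatten` with `axis=1` in front of it collapses an N-D tensor into 2D. The helper names and the example shape `(1, 512, 7, 7)` are hypothetical; they are not taken from the affected models.

```python
from math import prod


def is_valid_gemm_operand(shape):
    """Gemm expects A and B to be 2D matrices."""
    return len(shape) == 2


def flatten_shape(shape, axis=1):
    """Shape after collapsing dims [0, axis) and [axis, rank) into two dims,
    which is what an ONNX Flatten node produces."""
    return (prod(shape[:axis]), prod(shape[axis:]))


feature_map = (1, 512, 7, 7)  # hypothetical 4D activation feeding a Gemm
print(is_valid_gemm_operand(feature_map))
print(flatten_shape(feature_map))
print(is_valid_gemm_operand(flatten_shape(feature_map)))
```

Whether a converter should emit `Flatten` followed by `Gemm` or some other operator is a design choice for the converter; this only illustrates the shape requirement.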

ankkhedia commented 5 years ago

@snnn See https://github.com/onnx/models/issues/90 for the earlier discussion of this. I think there was no good support for GEMM in ONNX when these models were created. ONNX does have some missing operators, so operators from the source framework are usually mapped to the closest ONNX operator.

As far as I know, support for GEMM in ONNX-MXNet is either a work in progress or has been done. I will post a new model if the support has been added.

snnn commented 5 years ago

Hi @ankkhedia , do you have an estimated time of completion?

ankkhedia commented 5 years ago

@snnn I will have to check with the ONNX-MXNet converter team to be able to give a clear ETA, and I will update you once I have one. If the support has not been added, then it depends on their roadmap when the support will be complete. The team is working actively to get rigorous operator coverage.

prasanthpul commented 5 years ago

@ankkhedia I think your last comment is about the other models. Can you confirm whether the arcface model has been fixed? Will you be posting a 1.3 version as well?

snnn commented 5 years ago

This issue has been open for 3 months, and we still don't know when it can be fixed. From a user-experience perspective, ONNX users would think the ONNX model zoo is low quality. I suggest we either fix it quickly or delete the malformed models.

prasanthpul commented 5 years ago

@snnn let's create a separate issue for the other models. This issue is only for arcface. For the other models, I agree that if we cannot fix them they should be removed for now.

ankkhedia commented 5 years ago

@snnn @prasanthpul The model has been fixed and updated in S3. I checked the model structure with Netron and the float64 issue is not there anymore.

prasanthpul commented 5 years ago

Thanks @ankkhedia. It looks like only the 1.2 (opset7) version is posted. Will you be posting 1.3 as well?

snnn commented 5 years ago

Hi @ankkhedia, could you please verify it? I got the model from: 'https://s3.amazonaws.com/onnx-model-zoo/arcface/resnet100/resnet100.tar.gz'

It is still wrong.

ankkhedia commented 5 years ago

@snnn Sorry for the miss. I uploaded only the resnet100.onnx file. I will update the tar too.

ankkhedia commented 5 years ago

@snnn added the latest tar file.

ryanlai2 commented 5 years ago

Can we fix ArcFace's README.md so that the table to download the model is correct? The download link was changed to download an OpSet8 model.

Currently, there is only one download link for ArcFace and it's labeled as OpSet 7, v1.2.1. However, the link downloads an OpSet8 v1.3 version of the model. https://github.com/onnx/models/tree/master/models/face_recognition/ArcFace

ankkhedia commented 5 years ago

updated :)

snnn commented 5 years ago

Hi @ankkhedia, the old issue is fixed, but we have a new one.
For the "relu0" node, its inputs have shapes [1, 64, 112, 112] and [64]. No broadcast rule can be applied to them.
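To make the shape complaint concrete, here is a small sketch of a unidirectional broadcast check, assuming the usual rule that shapes are aligned from the trailing dimension and each source dimension must be 1 or equal to the target dimension. The function name is hypothetical; the shapes are the ones reported above.

```python
def can_broadcast_to(target, source):
    """True if `source` can be broadcast to `target` (one direction only)."""
    if len(source) > len(target):
        return False
    # Align from the last dimension; missing leading dims of source act as 1.
    for t, s in zip(reversed(target), reversed(source)):
        if s != 1 and s != t:
            return False
    return True


x_shape = (1, 64, 112, 112)
slope_shape = (64,)
# The trailing dimension of x is 112, but the slope's only dimension is 64.
print(can_broadcast_to(x_shape, slope_shape))
# A single-element slope would broadcast without trouble.
print(can_broadcast_to(x_shape, (1,)))
```

The first check fails because the slope's 64 is lined up against the width (112), not against the channel axis.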

snnn commented 5 years ago

Hi @ankkhedia, could you verify this issue?

Thanks.

Roshrini commented 5 years ago

Hi @snnn, I verified this issue on my end. We are actively working on both the Prelu and Gemm issues mentioned and will re-upload the models as early as we can. Thanks for reporting this, and sorry for the inconvenience it has caused.

ankkhedia commented 5 years ago

Hi @snnn, there are open PRs to fix the above issues with Prelu and GEMM. I have generated a model after including those fixes: https://s3.amazonaws.com/onnx-model-zoo/arcface/resnet100/resnet100_new.onnx Could you please let me know if this model looks good to you?

We will update the model once the PRs are merged.

snnn commented 5 years ago

Hi @ankkhedia, thank you for fixing it. I'm on vacation with a poor internet connection. I'll ask my colleague for help.

snnn commented 5 years ago

The problem is solved. Thanks!

snnn commented 5 years ago

Hi @ankkhedia , would you please put the new model in https://github.com/onnx/models/tree/master/models/face_recognition/ArcFace ?

ankkhedia commented 5 years ago

@snnn I have updated the model in https://s3.amazonaws.com/onnx-model-zoo/arcface/resnet100/resnet100.onnx

Could you please verify?

snnn commented 5 years ago

Hi @ankkhedia https://s3.amazonaws.com/onnx-model-zoo/arcface/resnet100/resnet100.tar.gz is not updated.

ankkhedia commented 5 years ago

My bad. Updating that as well.

snnn commented 5 years ago

And this https://s3.amazonaws.com/onnx-model-zoo/arcface/resnet100/resnet100-md5.txt ?

ankkhedia commented 5 years ago

@snnn uploaded resnet100.tar.gz and resnet100-md5.txt now.
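Since the thread keeps running into stale copies of the artifacts, a downloaded file can be compared against the published checksum. This is only a sketch: the layout of resnet100-md5.txt is not shown in this thread, so the code assumes (hypothetically) that the text simply contains the hex digest somewhere; the function names are also illustrative.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_text):
    """Assumption: the published checksum text contains the hex digest."""
    return md5_of_file(path) in published_text.lower()
```

For example, `matches_published("resnet100.tar.gz", open("resnet100-md5.txt").read())` would report whether a local download corresponds to the posted checksum, given that assumption about the file's contents.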

snnn commented 5 years ago

Perfect. Thanks!

XinyuDu commented 5 years ago

@ankkhedia Hi, how can I convert the arcface MXNet model to an ONNX model without the float64 error? Thanks!

luan1412167 commented 4 years ago

@snnn @ankkhedia I get an error that may be the same as yours, possibly the one in https://github.com/onnx/models/issues/91#issuecomment-439139857 Do you have any experience that could help me? Thanks.

2019-10-08 11:49:13.612837502 [E:onnxruntime:, sequential_executor.cc:165 Execute] Non-zero status code returned while running PRelu node. Name:'relu0' Status Message: /home/luandd/project_company/face_rec/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.h:329 void onnxruntime::BroadcastIterator::Init(int64_t, int64_t) axis == 1 || axis == largest was false. Attempting to broadcast an axis by a dimension other than 1. 64 by 112

luan1412167 commented 4 years ago

@snnn @ankkhedia do you have a correct model with spatial=1?

sky186 commented 4 years ago

@ankkhedia Hello, can the ArcFace MXNet-to-ONNX conversion be fixed? How do I convert to ONNX correctly? The PRelu output is not right. I want to convert to Caffe; the ONNX model can be exported, but it is not correct.

sky186 commented 4 years ago

@luan1412167 Hi, can you now convert the MXNet ArcFace model to ONNX correctly? I applied a fix, but the PRelu output of the exported model is not right; it does not equal the MXNet output. Could you tell me how to convert to ONNX correctly?

HoangTienDuc commented 4 years ago

Hi @ankkhedia, the old issue is fixed, but we have a new one. For the "relu0" node, its inputs have shapes [1, 64, 112, 112] and [64]. No broadcast rule can be applied to them.

Hi @ankkhedia @snnn, I also tried to convert the arcface LResNet100E-IR MXNet model to ONNX using convert_onnx.py. It seems that I get the same error as @snnn when I deploy my model.

onnx runtime error 1: Non-zero status code returned while running PRelu node. Name:'relu0' Status Message: relu0: right operand cannot broadcast on dim 0 LeftShape: {1,64,112,112}, RightShape: {64}

Can you guide me on how to fix it? Thank you all.
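One commonly used workaround, sketched here under assumptions rather than as the converter's actual fix: give the per-channel slope trailing singleton dimensions so that it lines up with the channel axis. The helper name is hypothetical, and the sketch assumes an NCHW layout with channels on axis 1.

```python
def per_channel_slope_shape(input_shape, channel_axis=1):
    """Slope shape that aligns with `channel_axis` under trailing-dim broadcasting.

    Assumes the slope holds one value per channel of the input.
    """
    trailing = len(input_shape) - channel_axis - 1
    return (input_shape[channel_axis],) + (1,) * trailing


# LeftShape from the error message above; the slope currently has shape (64,).
print(per_channel_slope_shape((1, 64, 112, 112)))  # (64, 1, 1)
```

With shape (64, 1, 1), the 64 is matched against the channel dimension and the two 1s broadcast over height and width.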

snnn commented 4 years ago

see https://github.com/apache/incubator-mxnet/pull/17711

@vinitra is fixing it.