Open ip2016 opened 2 months ago
@sovrasov Who would be appropriate to assign this to? (ARC GPU issue)
I'm not sure this is an ARC GPU-specific issue. I'm observing the same error with CPU training/validation/export.
You're right, it's ARC-specific. otx[xpu] installs a patched torch + IPEX, which sometimes messes up output types. Currently, the workaround is to run the export in a CPU or CUDA environment (i.e. use upstream torch).
Thanks. I'll try it out.
Update: I get a different error when trying to train on CPU with the otx[base] package:
RuntimeError: "nms_kernel" not implemented for 'BFloat16'
Training with upstream torch is not required: the checkpoint trained on ARC with IPEX should work in upstream torch as well.
I'm trying to train a yolox_tiny model on my image dataset with a single additional category. Training and testing complete successfully, but exporting fails with the error "Argument 1 and 2 element types must match." I'm using the otx[xpu] extension and an ARC 750 GPU for training.
Steps to Reproduce
Environment:
python -c "import torch; import intel_extension_for_pytorch as ipex; print(torch.__version__); print(ipex.__version__); [print(f'[{i}]: {torch.xpu.get_device_properties(i)}') for i in range(torch.xpu.device_count())];"
hwinfo --display
clinfo -l
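Since the suggested workaround depends on which torch build an environment has, a quick check of what is installed may help before exporting. This is a minimal sketch: `torch_env_report` is a hypothetical helper, and treating "no IPEX installed" as "upstream torch" is a heuristic assumption, not a guarantee that torch itself is unpatched.

```python
import importlib.util


def torch_env_report() -> dict:
    """Report which torch-related packages are installed, without importing them."""
    names = ("torch", "torchvision", "intel_extension_for_pytorch")
    # find_spec returns None for a top-level package that is not installed.
    found = {name: importlib.util.find_spec(name) is not None for name in names}
    # Assumption: torch present and IPEX absent is taken to mean an upstream-torch setup.
    found["upstream_only"] = found["torch"] and not found["intel_extension_for_pytorch"]
    return found


report = torch_env_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

Running it in both the otx[xpu] environment and the one used for export shows at a glance whether IPEX is present in each.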