DIT Text Detection Inference

I would like to run a simple inference with the DiT text detection model you provide, given a single input image.

The README of this component only explains how to do fine-tuning or evaluation. To get an inference working, I started from the evaluation example: I downloaded some of the models you provide and tried to run the evaluation like this:

$ python train_net.py --config-file configs/mask_rcnn_dit_base.yaml --eval-only --resume MODEL.WEIGHTS ../../.models/funsd_dit-b_mrcnn.zip OUTPUT_DIR output

The problem is that I don't know how to plug in the FUNSD dataset to make this work. Can you give me some advice? Is there a more straightforward way to run a simple inference on an arbitrary image?
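To make the question concrete, this is roughly what I have in mind: a minimal sketch that skips the dataset and feeds one image to detectron2's `DefaultPredictor`. It assumes that `add_vit_config` can be imported from the repo's `ditod` package (as `train_net.py` appears to do), that the script is run from the text detection directory, and that the downloaded weights load as a regular checkpoint. The helper names `build_overrides`, `load_predictor` and `detect` are my own and not part of the repo. I have not verified that this works with the released models.

```python
from typing import List


def build_overrides(weights_path: str, device: str = "cpu") -> List[str]:
    """Return detectron2-style KEY VALUE overrides (hypothetical helper)."""
    return ["MODEL.WEIGHTS", weights_path, "MODEL.DEVICE", device]


def load_predictor(config_file: str, weights_path: str, device: str = "cpu"):
    """Build a predictor from a DiT config file and a checkpoint."""
    # Heavy imports are deferred so build_overrides works without detectron2.
    from detectron2.config import get_cfg
    from detectron2.engine import DefaultPredictor
    from ditod import add_vit_config  # assumption: exported by the repo's ditod package

    cfg = get_cfg()
    add_vit_config(cfg)  # register the DiT/ViT-specific config keys
    cfg.merge_from_file(config_file)
    cfg.merge_from_list(build_overrides(weights_path, device))
    return DefaultPredictor(cfg)


def detect(predictor, image_path: str):
    """Run the predictor on one image and return (boxes, scores) as NumPy arrays."""
    import cv2

    image = cv2.imread(image_path)  # BGR, the default input format for DefaultPredictor
    if image is None:
        raise FileNotFoundError(image_path)
    instances = predictor(image)["instances"].to("cpu")
    return instances.pred_boxes.tensor.numpy(), instances.scores.numpy()
```

I would then call something like `detect(load_predictor("configs/mask_rcnn_dit_base.yaml", "../../.models/funsd_dit-b_mrcnn.zip"), "page.png")`, where `page.png` is a placeholder for my own image. Is this the intended way, or is a registered dataset required even for a single image?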

microsoft / unilm

DIT Text Detection Inference #1154