Closed pushpalatha1405 closed 3 years ago
For your first question, I'm sure it's a bug and we'll fix it soon. That said, the same functionality is available through ocr.py (see the docs), which you can also use to visualize your model's output.
It seems you have made your own script, ocr_kie_config_test.py. Did you specify pretrained somewhere in your script or config (even as None)? You need to remove it. It's deprecated; if you want to initialize your model this way, specify it in your model config instead, like this:
model = dict(
    ...
    init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet18'),
)
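To locate a leftover pretrained entry before removing it, a small recursive search over the config dictionary can help. This is only a sketch: find_key_paths and example_model are hypothetical names made up for illustration, and the config fragment is not the one from this issue.

```python
from typing import Any, Iterator, Tuple


def find_key_paths(cfg: Any, key: str,
                   path: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
    """Yield the path to every occurrence of `key` in nested dicts and lists."""
    if isinstance(cfg, dict):
        for k, v in cfg.items():
            here = path + (str(k),)
            if k == key:
                yield here
            yield from find_key_paths(v, key, here)
    elif isinstance(cfg, (list, tuple)):
        for i, v in enumerate(cfg):
            yield from find_key_paths(v, key, path + (f"[{i}]",))


# Hypothetical config fragment, for illustration only.
example_model = dict(
    type="SDMGR",
    pretrained=None,  # stale key: even None is passed on as a keyword argument
    backbone=dict(type="UNet", base_channels=16),
)

stale = list(find_key_paths(example_model, "pretrained"))
print(stale)
```

Any path it reports is a place where the key should be deleted, or replaced by init_cfg if that kind of initialization is wanted.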
Thanks Tong, for providing the solution, I will try to implement in the way u suggested.
Regards, Pushpalatha M
(Reply sent by email, quoting Tong Gao's message of Monday, September 6, 2021 9:28 PM. Subject: Re: [open-mmlab/mmocr] TypeError: 'DataContainer' object is not subscriptable/TypeError: SDMGR: init() got an unexpected keyword argument 'pretrained' (#481))
Thanks Tong for the solution. I will implement inference in the ways you suggested.
Hi Tong, I am restating the issue here as it is relevant to inference.
Thanks Tong, I could resolve the above error and I can now train on my custom dataset. (Please note that so far I have trained on it for only a few epochs.)
Please help me work out how to test/infer, as I am stuck here. I need to test now, so I tried the following:
a) I used the available MMOCR test.py script as below and saved the output in output.pkl. (Is there any way to infer from this output.pkl?)
python /disk2/mmocr/tools/test.py /disk2/mmocr/configs/kie/sdmgr/sdmgr_unet16_60e_wildreceipt_min_DS.py sdmgr/latest.pth --out /disk2/mmocr/output.pkl
(Other testing options lead to errors or don't work.)
b) I used ocr.py for inference as below:
python mmocr/utils/ocr.py /disk2/mmocr/tests/data/invoice_DS/imgs/test/invoice1.jpg --kie-config /disk2/mmocr/configs/sdmgr_unet16_60e_wildinvoice.py --kie-ckpt /disk2/mmocr/sdmgr_invoice/epoch5.pth --device cuda:0 --output /disk2/mmocr/sdmgr_invoice
I will send you the generated output image file. In it I could see only text detection and recognition. (If I train on more images, will it be possible to see the KIE model output?)
c) I modified the model dict section of the config file, adding the init_cfg key as below:
model = dict(
    type='SDMGR',
    backbone=dict(type='UNet', base_channels=16),
    bbox_head=dict(
        type='SDMGRHead', visual_dim=16, num_chars=92, num_classes=14),
    visual_modality=True,
    # init_cfg=dict(type='Pretrained', checkpoint='/disk2/mmocr/config/kie/sdmgr/sdmgr_unet16_60e_wildreceipt_20210520-7489e6de.pth'),
    init_cfg=dict(type='Pretrained', checkpoint='/disk2/mmocr/config/kie/sdmgr/epoch_5.pth'),
    train_cfg=None,
    test_cfg=None,
    class_list=f'{data_root}/class_list.txt')
But when I call mmocr.apis.init_detector(cfg, checkpoint1, device="cuda:0"), I still get the error below:
  File "/disk2/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/utils/registry.py", line 51, in build_from_cfg
    return obj_cls(**args)
TypeError: __init__() got an unexpected keyword argument 'pretrained'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "inference.py", line 8, in <module>
    model = init_detector(cfg, checkpoint1, device="cuda:0")
  File "/disk2/mmocr/mmocr/apis/inference.py", line 40, in init_detector
    model = build_detector(config.model, test_cfg=config.get('test_cfg'))
  File "/disk2/mmocr/mmocr/models/builder.py", line 140, in build_detector
    cfg, default_args=dict(train_cfg=train_cfg, test_cfg=test_cfg))
  File "/disk2/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/utils/registry.py", line 210, in build
    return self.build_func(*args, **kwargs, registry=self)
  File "/disk2/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/cnn/builder.py", line 26, in build_model_from_cfg
    return build_from_cfg(cfg, registry, default_args)
  File "/disk2/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/utils/registry.py", line 54, in build_from_cfg
    raise type(e)(f'{obj_cls.__name__}: {e}')
TypeError: SDMGR: __init__() got an unexpected keyword argument 'pretrained'
You can use mmcv.load() to load your results stored in .pkl format. However, they are stored in raw text format, and that's probably not what you need.
ocr.py cannot run KIE alone on your custom dataset. It only runs KIE on the OCR result generated by text detection and recognition models on your provided images. You still need to use test.py to test on your custom dataset, and get the visualization with --show-dir specified.
You did not specify --kie SDMGR, so --kie-config and --kie-ckpt were ignored. I think I should improve the weird logic here as well.
Thanks Tong for your reply and for outlining the solution.
So I will continue training the MMOCR SDMGR model on my custom invoice dataset. Probably by that time I will be able to use your test.py script for testing and inference on my custom dataset. Hoping for the best, and thanks for your support. I will follow, test, and review the changes you make to correct the code.
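On the earlier suggestion to read output.pkl with mmcv.load(): as far as I understand, for a .pkl file that amounts to unpickling it, so the standard library can serve as a rough stand-in. The sketch below writes and reads a dummy file; load_results and the fake_results contents are hypothetical, and the real structure of the results depends on the model.

```python
import pickle
import tempfile
from pathlib import Path


def load_results(pkl_path):
    """Unpickle a results file such as the one written by --out."""
    with open(pkl_path, "rb") as f:
        return pickle.load(f)


# Stand-in data only; the real output.pkl holds whatever the model returned.
fake_results = [{"scores": [0.1, 0.9]}, {"scores": [0.7, 0.3]}]
tmp_path = Path(tempfile.mkdtemp()) / "output.pkl"
with open(tmp_path, "wb") as f:
    pickle.dump(fake_results, f)

results = load_results(tmp_path)
print(type(results).__name__, len(results))
```

Printing the type and length first is a cheap way to see what the file contains before trying to interpret it. Only unpickle files you produced yourself.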
Hi Tong,
I tried using ocr.py for inference as below, specifying kie='SDMGR' along with kie_config and kie_ckpt, but I am getting an error saying that class_list is empty.
ocr = MMOCR(det='TextSnake', recog='SAR', kie_config='/disk2/mmocr/configs/kie/sdmgr/sdmgr_unet16_60e_wildinvoice.py',kie= 'SDMGR',kie_ckpt='/disk2/mmocr/sdmgr_invoice/epoch_5.pth',device='cuda:0')
ERROR:
Traceback (most recent call last):
File "ocr_invoice_inference.py", line 9, in
I have modified class_list.txt according to my custom invoice dataset and set the path in the model config as below:
data_root = '/disk2/mmocr/tests/data/invoice_DS'
class_list=f'{data_root}/class_list.txt')
Please help me resolve this. I tried tracing the error but could not resolve it.
You need to put class_list.txt, which can be found in WildReceipt, into your {data_root}.
Tong, but my data root is data_root = '/disk2/mmocr/tests/data/invoice_DS', and all the images, annotation files (train.txt, test.txt), dict.txt, and class_list.txt are in that data_root. I have set the config file accordingly.
I am able to access this data_root even during training, so what is the solution?
KIE cannot work with an empty class_list.txt, as it contains essential class information. You can check and use the pre-defined one in WildReceipt if you have no idea what it looks like.
You need to annotate your custom dataset according to the mapping in class_list.txt; otherwise KIE just outputs class numbers that make no sense.
Tong, I am sending the class_list.txt that I created (it is not empty) and placed in data_root = '/disk2/mmocr/tests/data/invoice_DS'. Is there any problem with the structure in which I created class_list.txt? I have ignored the key values in the model config file. Please let me know.
Yes, I have annotated the custom invoice dataset according to the mapping in class_list.txt,
for example 1 Date_value; 2 Totalamt_value; 3 InvoiceNum_value, etc. To convert the annotations to the SDMGR format I used the script below.
import json
with open("invoice_set1.json","r") as f : # from label-studio json-min dump
    ls = json.loads(f.read())
print(type(ls))
global_tags = ['date', 'totalamt', 'invoiceno','address','telephoneno','acctno'] #annotated fields key name
annotations = []
for dl in ls :
    filename = "./annotate_dl/"+dl["ocr"].split('/')[-1]
    #print(filename,type(filename))
    labels = dl['label']
    transcriptions = dl['transcription']
    #print(labels)
    #print(transcriptions)
    for label,text in zip(labels,transcriptions):
        #print(label)
        #print(text)
        #enumerating extracted labels to integers
        if label['labels'][0] == global_tags[0]:
            tag = 1#extract labels
        elif label['labels'][0] == global_tags[1]:
            tag = 2
        elif label['labels'][0] == global_tags[2]:
            tag = 3
        elif label['labels'][0] == global_tags[3]:
            tag = 4
        elif label['labels'][0] == global_tags[4]:
            tag = 5
        elif label['labels'][0] == global_tags[5]:
            tag = 6
        else:
            tag = 7
How do I solve the error I posted, which says class_list receives zero arguments?
Also, class_list.txt contains, for example, 1 Date_value, but I annotated and named the key as date. Is this a mistake? Sorry for troubling you; I just need to understand where I am going wrong.
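As a side note on the conversion script above, the if/elif chain that turns a label name into an integer can be written as a lookup table. This is a sketch of the same mapping only; TAG_IDS, OTHER_TAG and tag_for are hypothetical names, and it does not address whether the integers agree with class_list.txt.

```python
# Same field names, in the same order, as global_tags in the script above.
GLOBAL_TAGS = ['date', 'totalamt', 'invoiceno', 'address', 'telephoneno', 'acctno']

# 'date' -> 1, 'totalamt' -> 2, ... matching the if/elif chain.
TAG_IDS = {name: index for index, name in enumerate(GLOBAL_TAGS, start=1)}

# Every other label falls into the final `else` branch (tag 7).
OTHER_TAG = len(GLOBAL_TAGS) + 1


def tag_for(label_name):
    """Return the integer tag for a label name, or OTHER_TAG if unknown."""
    return TAG_IDS.get(label_name, OTHER_TAG)


print(tag_for('date'), tag_for('acctno'), tag_for('something_else'))
```

Adding a field then means appending one name to GLOBAL_TAGS rather than adding another branch.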
I think you just need to remove the empty lines at the end of class_list.txt.
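Removing the empty lines can be done in any editor; for many files, a short script is an option. The sketch below is an assumption-laden illustration: strip_blank_lines is a hypothetical helper, the sample entries only imitate the "index name" layout mentioned in this thread, and it runs on a temporary file rather than a real class_list.txt.

```python
import tempfile
from pathlib import Path


def strip_blank_lines(path):
    """Rewrite the file without empty lines and return the lines that remain."""
    p = Path(path)
    stripped = (ln.strip() for ln in p.read_text(encoding="utf-8").splitlines())
    kept = [ln for ln in stripped if ln]
    # No trailing newline, so a reader that splits on "\n" sees no empty entry.
    p.write_text("\n".join(kept), encoding="utf-8")
    return kept


# Illustrative content with two empty lines at the end.
demo = Path(tempfile.mkdtemp()) / "class_list.txt"
demo.write_text("0 Ignore\n1 Date_value\n2 Totalamt_value\n\n\n", encoding="utf-8")

kept = strip_blank_lines(demo)
print(kept)
```

Back up the original file before running something like this on real data.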
Thanks Tong, I was able to test using ocr.py. Hopefully I can soon use test.py as well (as of now it has a bug); it would be great to test my custom dataset with the MMOCR SDMGR model (a great model).
You can check the PR for the change and apply it locally before it gets fully reviewed. I'm pretty sure the patch works, though it might not be an elegant solution.
Hi Tong,
I checked the PR for changes and locally changed the files mmocr/apis/inference.py, configs/kie/sdmgr/sdmgr_unet16_60e_wildreceipt.py and mmocr/tool/test.py.
Then I ran the command below, which runs without any error, but nothing is stored in --show-dir /disk2/mmocr/invoice_output:
python /disk2/mmocr/tools/test.py /disk2/mmocr/configs/kie/sdmgr/sdmgr_unet16_60e_wildinvoice.py /disk2/mmocr/sdmgr_invoice/epoch_20.pth --show-dir /disk2/mmocr/invoice_output
Use load_from_local loader
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 5/5, 3.8 task/s, elapsed: 1s, ETA:
The folder /disk2/mmocr/invoice_output is empty; no images are saved.
Please help me. Am I going wrong anywhere?
I also used the script below for inference. When I run python inference.py, I get the error below at the model_inference call. Sorry, I could not locate the pipelines. If you can give me any hint on this, I will correct it.
inference.py:
from mmocr.apis import init_detector, model_inference
img = '/disk2/mmocr/tests/data/invoice_DS/imgs/test/invoice1.jpg'
# checkpoint = "/disk2/mmocr/configs/kie/sdmgr/epoch_5.pth"
checkpoint1 = "/disk2/mmocr/invoice_output/epoch_20.pth"
out_file = '/disk2/mmocr/sdmgr_invoice/invoice1.jpg'
cfg = '/disk2/mmocr/configs/kie/sdmgr/sdmgr_unet16_60e_wildinvoice.py'
model = init_detector(cfg, checkpoint1, device="cuda:0")
print(model)
result = model_inference(model, img)
print(f'result: {result}')
#img = model.show_result(
#img, result, out_file=out_file, show=False)
#mmcv.imwrite(img, out_file)
ERROR:
Traceback (most recent call last):
File "inference.py", line 12, in <module>
result = model_inference(model, img)
File "/disk2/mmocr/mmocr/apis/inference.py", line 127, in model_inference
test_pipeline = Compose(cfg.data.test.pipeline)
File "/disk2/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmdet/datasets/pipelines/compose.py", line 22, in __init__
transform = build_from_cfg(transform, PIPELINES)
File "/disk2/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/utils/registry.py", line 46, in build_from_cfg
f'{obj_type} is not in the {registry.name} registry')
KeyError: 'KIEFormatBundle is not in the pipeline registry'
I'm not sure. Have you tested your model with the --eval macro_f1 option to see if it produces a reasonable score? (You should also use print to check whether your model has produced any result on your input image.)
For your second question, you just need to add a line at the beginning of your file:
import mmocr.datasets.pipelines
So that all the pipelines will be registered in the registry. It was implicitly done in other scripts.
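Why an import fixes a KeyError can be seen with a toy registry. This is a simplified sketch of the general pattern, not MMCV's actual Registry class; the KIEFormatBundle defined here is an empty stand-in, not MMOCR's transform.

```python
class Registry:
    """Toy name-to-class registry, loosely modelled on the pattern in the traceback."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self, cls):
        # Runs when the class definition is executed, i.e. at import time.
        self._modules[cls.__name__] = cls
        return cls

    def build(self, cfg):
        obj_type = cfg["type"]
        if obj_type not in self._modules:
            raise KeyError(f"{obj_type} is not in the {self.name} registry")
        kwargs = {k: v for k, v in cfg.items() if k != "type"}
        return self._modules[obj_type](**kwargs)


PIPELINES = Registry("pipeline")
cfg = dict(type="KIEFormatBundle")

# Before the class is defined (imported), the lookup fails.
try:
    PIPELINES.build(cfg)
    failed_before = False
except KeyError:
    failed_before = True


# Executing the definition registers the class, as importing its module would.
@PIPELINES.register_module
class KIEFormatBundle:
    pass


built = PIPELINES.build(cfg)
print(failed_before, type(built).__name__)
```

The config only names the class as a string, so nothing forces its module to load unless some script imports it.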
Also, next time please use fenced code block syntax to improve the readability of your code snippet. I've edited your response otherwise it would be quite messy.
Hi Tong, I am sorry for the messy paste. I will definitely use fenced code block syntax or upload a file from now on.
a) I am able to test my custom dataset with the --eval macro_f1 option, and it produced a reasonable score. But I am still not able to use the --show-dir option: it does not produce any output files, and the created directory remains empty.
b) You suggested using print to check whether the model has produced a result for the image. Could you please guide me on how to do this?
c) I was able to solve the pipeline error, but I now have a different error, which I am posting below along with the script I am using to train and infer. Please help me with a solution. Once I can view an output image, I need to use MMOCR SDMGR on our large custom dataset, which is pending.
Sorry, I tried to correct the error, but I do not know the ann dict format that model_inference() needs as input.
ERROR:
Traceback (most recent call last):
File "inference1.py", line 51, in <module>
result = model_inference(model,img)
File "/disk2/mmocr/mmocr/apis/inference.py", line 144, in model_inference
data = test_pipeline(data)
File "/disk2/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmdet/data_
data = t(data)
File "/disk2/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmdet/data__
results = self._load_bboxes(results)
File "/disk2/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmdet/databboxes
results['gt_bboxes'] = ann_info['bboxes'].copy()
TypeError: 'NoneType' object is not subscriptable
I'm not sure what's happening and I'm not able to reproduce it locally. You can add a print statement before https://github.com/open-mmlab/mmocr/blob/37a41637f1f309101f35efdcebcc153b1779d56f/tools/test.py#L142 and check the values in result and gt_bboxes. You may also use pdb for debugging.
For c), as I've said before, the SDMGR model can never run without annotated text boxes provided by either gold annotations or upstream OCR models. If you want to test it on images without gold text annotations, please use ocr.py, which combines text detection, recognition and KIE models in a pipeline. But note that mistakes made by upstream models would also affect the final output.
Thanks Tong, I will print and check the values in result and gt_bboxes.
For c), sorry, I did not understand your answer. I am annotating the custom dataset, converting it to the annotation format accepted by the SDMGR model, and performing training and testing. So why can't I infer using the model_inference() API? If I use ocr.py, the pretrained text detection model identifies every text region, including ones I am not interested in. Should I instead train a text detection model on my custom dataset (converting the annotations to suit the text detection model's format), use the pretrained SAR text recognition model as it is, and, since I have already trained the SDMGR model on my custom dataset, then run ocr.py to get appropriate results? What is your suggestion on this?
Sorry, a small spelling mistake: I mean train a text detection model on my custom dataset using the annotations I create.
A brief summary: If you want SDMGR to infer on images + gold text annotations, use test.py. If you only have images without gold text annotations, use ocr.py.
So it seems that you don't need to use ocr.py, because you already have the gold annotations for the texts. And the fact that you are creating inference.py, whose functionality is a subset of test.py, actually confused me. I believe everything you need is inside test.py. Even if you just want to re-implement it with some customized utilities, you should still refer to test.py's implementation, which should answer your question in c).
Thanks for the solution, Tong, and for all your continuous help.
This is the sample image inference I have printed. I am attaching the picture file.
Would continuing training on a larger set of custom images refine the bboxes during image inference? I see a lot of clumsy bboxes.
Is it the result of running SDMGR in ocr.py or with your own annotation? I think it's unusual for pretrained text detection models to generate these clumsy bboxes. If you were running ocr.py, try another detection model and see if this issue persists.
Thanks very much, Tong, for all your support.
Pushpalatha M
Checklist
Describe the bug
I have trained the SDMGR model on 10 images and am now testing it to visualize the predictions, but I am getting the error TypeError: 'DataContainer' object is not subscriptable. Please help me resolve it.
Reproduction
I ran the following command :
python /disk2/mmocr/tools/test.py /disk2/mmocr/configs/kie/sdmgr/sdmgr_unet16_60e_wildreceipt_min_DS.py /disk2/mmocr/sdmgr/latest.pth --show-dir /disk2/mmocr/mmocr_kie_output
Use load_from_local loader
[ ] 0/1, elapsed: 0s, ETA:
Traceback (most recent call last):
  File "/disk2/mmocr/tools/test.py", line 243, in <module>
    main()
  File "/disk2/mmocr/tools/test.py", line 213, in main
    args.show_score_thr)
  File "/disk2/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmdet/apis/test.py", line 31, in single_gpu_test
    if batch_size == 1 and isinstance(data['img'][0], torch.Tensor):
TypeError: 'DataContainer' object is not subscriptable
When I run my own script, I also get this error:
TypeError: SDMGR: __init__() got an unexpected keyword argument 'pretrained'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "ocr_kie_config_test.py", line 13, in <module>
    model = init_detector(cfg, checkpoint, device="cuda:0")
  File "/disk2/mmocr/mmocr/apis/inference.py", line 40, in init_detector
    model = build_detector(config.model, test_cfg=config.get('test_cfg'))
  File "/disk2/mmocr/mmocr/models/builder.py", line 140, in build_detector
    cfg, default_args=dict(train_cfg=train_cfg, test_cfg=test_cfg))
  File "/disk2/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/utils/registry.py", line 210, in build
    return self.build_func(*args, **kwargs, registry=self)
  File "/disk2/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/cnn/builder.py", line 26, in build_model_from_cfg
    return build_from_cfg(cfg, registry, default_args)
  File "/disk2/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/utils/registry.py", line 54, in build_from_cfg
    raise type(e)(f'{obj_cls.__name__}: {e}')
TypeError: SDMGR: __init__() got an unexpected keyword argument 'pretrained'
Environment
Output of python mmocr/utils/collect_env.py:
sys.platform: linux
Python: 3.7.11 (default, Jul 27 2021, 14:32:16) [GCC 7.5.0]
CUDA available: True
GPU 0: Tesla K80
CUDA_HOME: /usr/local/cuda-10.1
NVCC: Cuda compilation tools, release 10.1, V10.1.243
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.6.0
PyTorch compiling details: PyTorch built with:
TorchVision: 0.7.0
OpenCV: 4.2.0
MMCV: 1.3.8
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.1
MMOCR: 0.3.0+76c9570