microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License
20.29k stars 2.56k forks

TextDiffuser: question about Mario-LAION annotations #1179

Open Question406 opened 1 year ago

Question406 commented 1 year ago

Hi,

I found some incorrect annotations with misplaced bounding boxes in Mario-LAION after downloading the images listed in url.txt.

Here are several examples:

Are these the incorrectly annotated examples mentioned in your paper, or do my downloaded images differ from the ones you used for annotation? I want to check whether my data processing is correct. Could you also share more about the filtering? Do you use another OCR model to check whether each bounding box contains the annotated text?

Thanks.

JingyeChen commented 1 year ago

Thanks for your attention to our work! Could you send us the indices (xxxxx_xxxxxxxxx) of the samples that contain incorrect annotations? I will check those samples. The filtering rules are described in Appendix E; no additional OCR models are used.

Question406 commented 1 year ago

Thanks for your response.

The examples above are randomly sampled; I forgot to keep the indices :( But could you check these examples?

00000_000003058
00000_000004343

The images were downloaded following your README with this command: img2dataset --url_list=mario-laion-url.txt --output_folder=laion_ocr --thread_count=64 --image_size=512

JingyeChen commented 1 year ago

Thanks for your feedback. It is a mistake and the command should be:

img2dataset --url_list=url.txt --output_folder=laion_ocr --thread_count=64  --resize_mode=no

We will fix it in the readme file. Thanks!

Question406 commented 1 year ago

I see. Thanks for your response. Additionally, could you kindly let me know when Mario-10M will be released?

JingyeChen commented 1 year ago

It is hard to say and may take some time. Please stay tuned ;D

crj1998 commented 1 year ago

It is hard to say and may take some time. Please stay tuned ;D

Hello, could you provide the training code for LayoutTransformer? I found that the text bbox has 4 points, i.e., [(x0, y0), (x1, y1), (x2, y2), (x3, y3)]; however, the released LayoutTransformer uses [x, y, x, y]. Could you provide more details on training LayoutTransformer?
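In case it helps: one way to reconcile the two formats (a sketch of a common workaround, not necessarily how the authors trained the released LayoutTransformer) is to collapse each 4-point text polygon into an axis-aligned [x, y, x, y] box by taking the min/max over the corners:

```python
# Hypothetical helper, not from the repo: convert a 4-point text polygon
# [(x0, y0), (x1, y1), (x2, y2), (x3, y3)] into an axis-aligned
# [x_min, y_min, x_max, y_max] box.
def quad_to_xyxy(quad):
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return [min(xs), min(ys), max(xs), max(ys)]

# A slightly rotated quadrilateral collapses to its bounding rectangle.
print(quad_to_xyxy([(10, 20), (110, 22), (108, 60), (9, 58)]))  # [9, 20, 110, 60]
```

Note that this loses the rotation information in the original polygon, so it only makes sense if the layout model expects axis-aligned boxes.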

other-ones commented 1 year ago

Thanks for your feedback. It is a mistake and the command should be:

img2dataset --url_list=url.txt --output_folder=laion_ocr --thread_count=64  --resize_mode=no

We will fix it in the readme file. Thanks!

Hi,

So, are the provided OCR labels measured at the original resolution of the image?

RanJason-Code commented 1 year ago

Hi! Thanks for your great work! If I use this command: "img2dataset --url_list=url.txt --output_folder=laion_ocr --thread_count=64 --resize_mode=no", where should I resize the original images? In train.py? (I found that the size of the original images should be (512, 512).) @JingyeChen

JingyeChen commented 1 year ago

We noticed that a few samples have mismatched annotations caused by a resize operation applied while releasing the dataset. We will fix it within one week. If you need to use the dataset urgently, please use the provided character-level segmenter to check whether its result matches the provided segmentation results.

JingyeChen commented 1 year ago

Hi! Thanks for your great work! If I use this command: "img2dataset --url_list=url.txt --output_folder=laion_ocr --thread_count=64 --resize_mode=no", where should I resize the original images? In train.py? (I found that the size of the original images should be (512, 512).) @JingyeChen

The problem is fixed. Please re-download the dataset using the link in README.md. After downloading, please resize every image to 512x512. Thanks!

Question406 commented 1 year ago

@JingyeChen Hi, are the image URLs also updated, or just the metadata?

koow-eat commented 1 year ago

@JingyeChen Hi, are the image URLs also updated, or just the metadata?

Hi, I'm also curious about this. I noticed that some coordinates in the OCR labels exceed 512, so I suspect the labels are not measured at 512x512 resolution. Are the meta files also updated? Thanks.

JingyeChen commented 1 year ago

@JingyeChen Hi, are the image URLs also updated, or just the metadata?

Hi, I'm also curious about this. I noticed that some coordinates in the OCR labels exceed 512, so I suspect the labels are not measured at 512x512 resolution. Are the meta files also updated? Thanks.

The metadata, including detection and segmentation results, has been updated. rec/det/seg are conducted at 512x512 resolution. You can use np.clip(value, 0, 512) to clip out-of-range values.
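As a minimal sketch of the clipping suggestion above (the array values are illustrative, not the dataset's actual annotations):

```python
import numpy as np

# Illustrative detection boxes in [x0, y0, x1, y1] form; a few values
# fall slightly outside the 512x512 canvas.
boxes = np.array([[100, -3, 515, 200],
                  [0, 0, 512, 512]])

# Clip every coordinate into the valid range [0, 512], as suggested above.
clipped = np.clip(boxes, 0, 512)
print(clipped.tolist())  # [[100, 0, 512, 200], [0, 0, 512, 512]]
```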

koow-eat commented 1 year ago

The metadata, including detection and segmentation results, has been updated. rec/det/seg are conducted at 512x512 resolution. You can use np.clip(value, 0, 512) to clip out-of-range values.

@JingyeChen Hi, are the image URLs also updated, or just the metadata?

Hi, I'm also curious about this. I noticed that some coordinates in the OCR labels exceed 512, so I suspect the labels are not measured at 512x512 resolution. Are the meta files also updated? Thanks.

Can I use the previously downloaded files with just clipping, or should I re-download them?

JingyeChen commented 1 year ago

It is recommended to re-download it. Thanks!

RanJason-Code commented 1 year ago

Hi! Thanks for your great work! If I use this command: "img2dataset --url_list=url.txt --output_folder=laion_ocr --thread_count=64 --resize_mode=no", where should I resize the original images? In train.py? (I found that the size of the original images should be (512, 512).) @JingyeChen

The problem is fixed. Please re-download the dataset using the link in README.md. After downloading, please resize every image to 512x512. Thanks!

Thank you ! i'll give a try!

RanJason-Code commented 1 year ago

Hi! I'd like to confirm something. After I re-download the metadata, should I put the mario-laion images into the corresponding directories, using their URLs as the keys, and resize them to 512x512? Furthermore, does the resize operation belong in the preprocess_train function in train.py? @JingyeChen

JingyeChen commented 1 year ago
After I re-download the metadata, should I put the mario-laion images into the corresponding directories, using their URLs as the keys, and resize them to 512x512?

Yes. You need to resize each image to 512x512 while putting the mario-laion images into the corresponding directories, or alternatively add image = image.resize((512, 512)) in train.py.

RanJason-Code commented 1 year ago

I get it! Thanks a lot, Senior Chen!

nkjulia commented 1 year ago

It is hard to say and may take some time. Please stay tuned ;D

Hello, could you provide the training code for LayoutTransformer? I found that the text bbox has 4 points, i.e., [(x0, y0), (x1, y1), (x2, y2), (x3, y3)]; however, the released LayoutTransformer uses [x, y, x, y]. Could you provide more details on training LayoutTransformer?

Have you gotten the training code for LayoutTransformer?

Ruby-He commented 6 months ago

Thanks for your feedback. It is a mistake and the command should be:

img2dataset --url_list=url.txt --output_folder=laion_ocr --thread_count=64  --resize_mode=no

We will fix it in the readme file. Thanks!

Why can't we resize directly with --image_size=512, instead of the operation you mentioned: "after downloading, you need to resize each image to 512x512. Please follow mario-laion-index-url.txt to move each image to the corresponding folders"?

By the way, could you please provide a script that resizes each image to 512x512 and moves each image to its corresponding folder?