Question406 opened 1 year ago

Hi,
I found some annotations with misplaced bounding boxes in Mario-LAION after downloading the images from url.txt.
Here are several examples:
Are these examples of the incorrect annotations mentioned in your paper, or do my downloaded images differ from the ones you used for annotation? I want to check whether my data processing is correct. Could you also share more about the filtering? Are you using another OCR model to check whether each bounding box contains the annotated text?
Thanks.
Thanks for your attention to our work! Could you send us the indices (xxxxx_xxxxxxxxx) of the samples that contain incorrect annotations? I will check those samples. The filtering rules are described in Appendix E, and no additional OCR models are used.
Thanks for your response.
The examples above were randomly sampled; I forgot to keep the indices :( But could you check these examples?
00000_000003058 00000_000004343
Following your README, the images were downloaded with this command:
img2dataset --url_list=mario-laion-url.txt --output_folder=laion_ocr --thread_count=64 --image_size=512
Thanks for your feedback. That was a mistake; the command should be:
img2dataset --url_list=url.txt --output_folder=laion_ocr --thread_count=64 --resize_mode=no
We will fix it in the README. Thanks!
I see. Thanks for your response. Additionally, could you kindly let me know when Mario-10M will be released?
It is hard to say and may take some time. Please stay tuned ;D
Hello, can you provide the training code for LayoutTransformer? I found that the text bbox has 4 points, i.e. [(x0, y0), (x1, y1), (x2, y2), (x3, y3)]; however, the released LayoutTransformer uses [x, y, x, y]. Can you provide more details on training LayoutTransformer?
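(Not from the authors, just a stopgap while waiting for the training code.) A minimal sketch of one plausible conversion from the 4-point polygon annotations to an axis-aligned [x, y, x, y] box; whether this matches the convention actually used to train LayoutTransformer is an assumption:

```python
def quad_to_xyxy(quad):
    """Collapse a 4-point OCR polygon [(x0, y0), ..., (x3, y3)] into an
    axis-aligned [x_min, y_min, x_max, y_max] box. Assumed convention,
    not confirmed by the authors."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return [min(xs), min(ys), max(xs), max(ys)]

# Example: a slightly rotated text box
print(quad_to_xyxy([(10, 20), (110, 25), (108, 60), (8, 55)]))
# -> [8, 20, 110, 60]
```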
Hi,
So, with --resize_mode=no, are the provided OCR labels measured at the original resolution of each image?
Hi! Thanks for your great work! If I use this command: img2dataset --url_list=url.txt --output_folder=laion_ocr --thread_count=64 --resize_mode=no, where should I resize the original images? In train.py? (I found that the size of the original image should be (512, 512).) @JingyeChen
We have noticed that a few samples have mismatched annotations, caused by a resize operation performed while releasing the dataset. We will fix it within one week. If you need the dataset urgently, please use the provided character-level segmenter to check whether its output matches the released segmentation results.
The problem is fixed. Please re-download the dataset using the link in README.md. After downloading, please resize every image to 512x512. Thanks!
@JingyeChen, hi, are the image URLs also updated, or just the metadata?
Hi, I'm also curious about this issue. I noticed that some of the OCR label coordinates exceed 512, so I suspect the labels are not measured at 512x512 resolution. Are the meta files also updated? Thanks.
The metadata, including the detection and segmentation results, have been updated. Recognition/detection/segmentation (rec/det/seg) were conducted at 512x512. You can use np.clip(value, 0, 512) to clip out-of-range values.
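For reference, the suggested clipping might look like this; the per-row [x0, y0, x1, y1] layout here is illustrative, so adapt it to the actual metadata format:

```python
import numpy as np

# Clip OCR box coordinates into the 512x512 canvas, since a few released
# values fall slightly outside it. The row layout is an assumed example.
boxes = np.array([[  5,  -3, 130,  48],
                  [400, 480, 530, 515]])  # assumed [x0, y0, x1, y1] per row
boxes = np.clip(boxes, 0, 512)
print(boxes)
# [[  5   0 130  48]
#  [400 480 512 512]]
```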
Can I use the previously downloaded files with just the clipping, or should I re-download them?
It is recommended to re-download it. Thanks!
Thank you! I'll give it a try!
Hi! I have something to confirm. After I re-download the metadata, should I put the mario-laion images into the corresponding directories with their URLs as the keys and resize them to 512x512? Furthermore, is the resize operation in the preprocess_train function in train.py? @JingyeChen
Yes, and you need to resize each image to 512x512 while putting the mario-laion images into the corresponding directories, or perhaps add image = image.resize((512, 512)) in train.py.
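For anyone following along, the suggested one-liner would sit in the image-loading step of the training preprocessing, roughly like this (a simplified sketch; the real preprocess_train in train.py does more than shown here, and the "image_path" key is assumed):

```python
from PIL import Image

def preprocess_train(example):
    # Simplified sketch of the image-loading step: load the original-resolution
    # image and resize it to the 512x512 canvas that the detection and
    # segmentation labels are measured on.
    image = Image.open(example["image_path"]).convert("RGB")
    image = image.resize((512, 512))  # the fix suggested above
    example["pixel_values"] = image
    return example
```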
I get it! Thank you a lot, Senior Chen!
Have you got the training code for LayoutTransformer?
Why can't we use img2dataset's built-in resizing (e.g. --image_size=512) directly, rather than the operation you mentioned: "after downloading, you need to resize each image to 512x512. Please follow mario-laion-index-url.txt to move each image to the corresponding folders"?
By the way, could you please offer a script that resizes each image to 512x512 and moves each image to the corresponding folders?
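Until an official script is offered, here is a rough sketch of what it could look like. It assumes mario-laion-index-url.txt has one "<index> <url>" pair per line, that the target layout is <out_dir>/<index prefix>/<index>.jpg, and that you supply a find_downloaded_image helper that maps a URL back to the file img2dataset saved for it (e.g. via the shard .json metadata); all of these assumptions should be checked against the actual release.

```python
import os
from PIL import Image

def resize_and_place(index_url_file, out_dir, find_downloaded_image):
    """Rough sketch, not the official script: resize each downloaded image to
    512x512 and place it under a folder named by its index prefix."""
    with open(index_url_file) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 2:
                continue  # skip malformed lines (assumed "<index> <url>" format)
            index, url = parts
            src = find_downloaded_image(url)  # hypothetical helper you supply
            if src is None:
                continue  # the image failed to download
            dst_folder = os.path.join(out_dir, index.split("_")[0])
            os.makedirs(dst_folder, exist_ok=True)
            img = Image.open(src).convert("RGB")
            img = img.resize((512, 512))  # labels are measured on a 512x512 canvas
            img.save(os.path.join(dst_folder, index + ".jpg"))
```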