ivelin / donut_ui_refexp

Fine tuning Donut transformers for UI Referring Expressions task
Apache License 2.0

Image Processing Verification #2

Open morganmcg1 opened 1 year ago

morganmcg1 commented 1 year ago

Here is my notebook to log the image processing pipeline

https://github.com/morganmcg1/control-ui/blob/main/verify_image_pipeline.ipynb

It'll log a table with all 15k images to this wandb run page, with the original image, the processed image, and the processed image without normalisation (just to view the impact of the other transforms): https://wandb.ai/ui-control/ui-control/runs/5la94xgq
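As a rough illustration of why the un-normalised variant is useful to look at, here is a minimal Python sketch. It assumes per-channel mean/std normalisation with ImageNet statistics, which is an assumption for illustration only: the notebook's actual values and transforms are not shown in this thread. `process_pixel` and `is_displayable` are hypothetical helpers, not functions from the notebook.

```python
# Hypothetical per-channel statistics (ImageNet values, assumed for illustration).
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)

def process_pixel(rgb, normalize=True):
    """Scale an 8-bit RGB pixel to [0, 1] and optionally normalise it."""
    scaled = [c / 255.0 for c in rgb]
    if not normalize:
        return scaled
    return [(c - m) / s for c, m, s in zip(scaled, MEAN, STD)]

def is_displayable(pixel):
    """True if every channel lies in the [0, 1] range an image viewer expects."""
    return all(0.0 <= c <= 1.0 for c in pixel)

white = (255, 255, 255)
viewable = process_pixel(white, normalize=False)   # stays inside [0, 1]
normalized = process_pixel(white, normalize=True)  # leaves the [0, 1] range
```

Under these assumed statistics the normalised pixel falls outside the displayable range, so a viewer would clip or rescale it. Skipping normalisation keeps the effect of the other transforms visible.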

ivelin commented 1 year ago

That's very cool. Thank you for sharing. Really curious to see the difference in performance with your image processing pipeline.

BTW, looks like the ui-control wandb url is private. I get a 404.

morganmcg1 commented 1 year ago

Ah sorry, made it public, popped the table into a Report too

https://wandb.ai/ui-control/ui-control/reports/Image-Processing-Verification--VmlldzozNjUzNzc4

ivelin commented 1 year ago

That's a really cool view! Thank you for putting in the effort and publishing it. Hope it helps get a few more eyes on this task.

A few follow-up questions:

  1. Do you think augmentations that change colors may hurt training? Some of the referring expressions call out components by their color.
  2. What do you think about shape augmentations? Resizing, stretching, slight rotations? Could these help improve spatial reasoning and reduce dependency on the Android screen format and UI patterns? There is a fair amount of research on augmentation for natural images, OCR and scene text recognition. I experimented a little bit but don't have conclusive results yet.
  3. Have you been able to fine-tune the model further and compare it against the current version?
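One practical detail for question 2: if the training target is a bounding box, any shape augmentation applied to the screenshot has to be applied to the box as well. A pure resize or stretch of the whole image leaves a box unchanged when its coordinates are normalised by width and height, but a rotation does not. Below is a minimal sketch, assuming the target is an axis-aligned `(xmin, ymin, xmax, ymax)` box in pixel coordinates. `rotate_bbox` is a hypothetical helper, not part of this repo.

```python
import math

def rotate_bbox(bbox, angle_deg, width, height):
    """Rotate a pixel-space (xmin, ymin, xmax, ymax) box about the image centre.

    Returns the axis-aligned box enclosing the rotated corners, clipped to the
    image. With the y axis pointing down, positive angles rotate clockwise on
    screen; match the sign to the convention of the image library in use.
    """
    xmin, ymin, xmax, ymax = bbox
    cx, cy = width / 2.0, height / 2.0
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    xs, ys = [], []
    for x, y in [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]:
        dx, dy = x - cx, y - cy
        xs.append(cx + dx * cos_t - dy * sin_t)
        ys.append(cy + dx * sin_t + dy * cos_t)
    # Clip to the image so the box stays a valid target.
    clip_x = lambda v: min(float(width), max(0.0, v))
    clip_y = lambda v: min(float(height), max(0.0, v))
    return (clip_x(min(xs)), clip_y(min(ys)), clip_x(max(xs)), clip_y(max(ys)))

# A slight rotation of a box on a 1080x1920 screenshot.
rotated = rotate_bbox((100, 200, 300, 260), 5.0, 1080, 1920)
```

The enclosing box grows slightly with the rotation angle, so large rotations make the target looser. That is one reason to keep rotations small for this task.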

morganmcg1 commented 1 year ago

Hey, getting back to this after a week.

  1. It's a good point, worth testing experimentally, but yep, maybe lighter color augmentations might be the way to go
  2. Yep 100% worth trying, agreed that the model would benefit from non-mobile screen aspect ratios
  3. No, not yet. Hoping to do some basic verification of the training pipeline (over-fitting on a tiny train set etc.) to ensure it's all training well before starting some larger runs
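The over-fitting check in point 3 can be sketched in a few lines of Python. This is a toy stand-in (logistic regression on four hand-made samples), not the Donut training loop; `train_tiny` and `tiny_set` are hypothetical names. The idea carries over: if the pipeline is wired correctly, the loss on a tiny fixed train set should drop close to zero.

```python
import math

def train_tiny(samples, steps=2000, lr=0.5):
    """Fit logistic regression by full-batch gradient descent.

    Returns the mean training loss recorded at every step.
    """
    w, b = [0.0, 0.0], 0.0
    n = len(samples)
    losses = []
    for _ in range(steps):
        grad_w, grad_b, loss = [0.0, 0.0], 0.0, 0.0
        for x, y in samples:
            z = w[0] * x[0] + w[1] * x[1] + b
            p = 1.0 / (1.0 + math.exp(-z))
            # Binary cross-entropy, with a small epsilon for numerical safety.
            loss -= y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12)
            err = p - y
            grad_w[0] += err * x[0]
            grad_w[1] += err * x[1]
            grad_b += err
        w = [w[0] - lr * grad_w[0] / n, w[1] - lr * grad_w[1] / n]
        b -= lr * grad_b / n
        losses.append(loss / n)
    return losses

# Four separable samples: the label equals the first feature.
tiny_set = [((0.0, 0.0), 0), ((0.0, 1.0), 0), ((1.0, 0.0), 1), ((1.0, 1.0), 1)]
losses = train_tiny(tiny_set)

# Sanity check: the model should be able to memorise the tiny set.
overfits = losses[-1] < 0.05 and losses[-1] < losses[0]
```

If the loss plateaus well above zero on a handful of samples, the problem is more likely in the data pipeline, labels, or optimiser setup than in model capacity.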

ivelin commented 1 year ago

Awesome. Thank you for the updates!