Open geroldmeisinger opened 1 month ago
CogVLM2-4bit, 512 tokens, no sampling
notable mentions:

- landscape_noise: "The image has a grainy texture, which could be due to the quality of the camera used or the resolution of the image. There are no visible artifacts in the image that would indicate corruption or damage to the photo."
- landscape_reflect: "The symmetry created by the reflection adds a dynamic element to the photograph, making it visually appealing." and "The image depicts a group of individuals sitting on a sandy beach. The photo is taken from a slightly elevated angle, capturing the group's reflection in the water below. The reflection in both the sand's surface and the calm water below is clear and undisturbed, indicating a lack of wind or other disturbances at the time the photo was taken. The symmetry of the reflection adds a sense of balance and harmony to the composition."
- landscape_replicate: "focusing on the subjects in the foreground while softly blurring the background elements." (does it mean the depth-of-field or the replication?)
- portrait_noise: "5. Artifacts: There is a noticeable artifact in the form of a grainy texture overlaying the image. This could be a result of the image being compressed or saved at a lower resolution, which can affect the clarity and detail of the photograph." and "9. Details: Despite the grainy artifact,"
- portrait_reflect: "This image is a mirrored, symmetrical representation of a crowd of people in an urban setting. The image appears to be a mirrored composite of two different photographs, creating a mirrored effect across the center..."
- portrait_replicate: "The image has been heavily distorted with colorful, blurred, and overlapping stripes, which appear to be digital artifacts. These artifacts obscure the details of the image and make it difficult to discern the original content."
=> don't use noise, reflect and replicate. Plain colors seem to be ignored, which is good.
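For readers unfamiliar with the fill modes compared above, here is a rough one-dimensional sketch of what each one puts into the padded area. It is not the code from this PR: `pad_row` is a hypothetical helper, and the "reflect" variant assumes the convention where the edge pixel is mirrored too.

```python
import random


def pad_row(row, left, right, mode, value=128, seed=0):
    """Pad a 1-D row of pixel values on both sides (hypothetical helper)."""
    if mode == "constant":
        # plain color: every padded pixel has the same value
        before, after = [value] * left, [value] * right
    elif mode == "replicate":
        # repeat the outermost pixel, which smears edges into stripes
        before, after = [row[0]] * left, [row[-1]] * right
    elif mode == "reflect":
        # mirror the content at the edge; assumes left, right <= len(row)
        before = row[:left][::-1]
        after = row[::-1][:right]
    elif mode == "noise":
        # random values, seeded so the result is reproducible
        rng = random.Random(seed)
        before = [rng.randrange(256) for _ in range(left)]
        after = [rng.randrange(256) for _ in range(right)]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return before + row + after


row = [10, 20, 30, 40]
for mode in ("constant", "replicate", "reflect", "noise"):
    print(mode, pad_row(row, 2, 2, mode))
```

Only the constant mode adds no new structure, which may be why the model ignores it while it describes the other three as grain, mirroring or stripes.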
extra prompt on black:
prompt: "is there a border in the image and what does it look like? what color is it?"

response: The image does not have a traditional border that frames the photo. Instead, it appears to be a part of a larger image or a screenshot from a digital platform, as indicated by the visible interface elements at the top and bottom of the image. The top shows icons that resemble social media sharing options and a timestamp, while the bottom has a navigation bar with icons for going back, home, and accessing options or settings. The color of these interface elements is primarily white, with some black and gray accents.
hmm...
extra prompt and stretch_and_squish:
prompt: "what is the shape of their heads? does the shape of their faces look natural?"

landscape: The heads of the individuals in the image are mostly rounded with a slight forward-facing curve, which is typical for natural human heads. The shape of the faces appears natural, with no noticeable elongation or distortion.
portrait: The heads in the image are mostly rounded with a variety of hair lengths and styles. The faces appear natural, with a mix of different facial structures and features, including a range of nose shapes, mouth sizes, and chin lengths. There doesn't seem to be any noticeable alteration or manipulation to the facial features, maintaining a realistic appearance.
CogVLM2 seems to be trained on squished heads?
landscape_extra_squishy:
The shapes of the faces and the bodies of the individuals in the image are quite regular and natural. There are no noticeable abnormalities or unusual features on their bodies or faces. They appear to be young adults, and their features are typical of that age group.
portrait_extra_squishy:
The shapes of the faces and the bodies of the individuals in the image are quite varied. Some faces are more angular, while others are more rounded. The body types range from slim to more robust. There are no immediately noticeable strange features on the faces or bodies; however, the image is quite crowded, and individual features can be obscured due to the overlapping nature of the people.
added preprocessing functions to settings but only those which don't change content:
@jhc13 I'm done. I only integrated it for CogVLM2 because it's the only model I use and know that uses square images. I cannot integrate it for other models.
I have to admit it is rather difficult for me to come up with real adversarial examples:
I'm so sorry children but it's for science!
from https://commons.wikimedia.org/wiki/File:EPK_komplexes_Beispiel.png
I just noticed some models are using preprocessor_config.json
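A minimal sketch of how such a file could be inspected to see which preparation steps a model expects. The keys `do_resize`, `size`, `do_center_crop` and `crop_size` are common in Hugging Face image processor configs, but the sample values below are made up for illustration, and `describe_preprocessing` is a hypothetical helper, not something that exists in taggui.

```python
import json

# Hypothetical file content; real files differ per model.
SAMPLE_CONFIG = """
{
  "do_resize": true,
  "size": {"shortest_edge": 224},
  "do_center_crop": true,
  "crop_size": {"height": 224, "width": 224}
}
"""


def describe_preprocessing(config_text):
    """Return a list of human-readable steps found in the config text."""
    cfg = json.loads(config_text)
    steps = []
    if cfg.get("do_resize"):
        steps.append(f"resize to {cfg.get('size')}")
    if cfg.get("do_center_crop"):
        steps.append(f"center crop to {cfg.get('crop_size')}")
    return steps


for step in describe_preprocessing(SAMPLE_CONFIG):
    print(step)
```

If a model ships this file, its own resize and crop settings would presumably take precedence over any preparation done beforehand.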
168
- implement different functions (`prepare_img_stretch_and_squish`, `prepare_img_scale_and_centercrop`, `prepare_img_scale_and_fill`) to prepare square images for the model. functions are not yet integrated in UI.
- added unit tests for prepare_img functions. call with `python taggui/run_tests.py`, which runs the functions on `images/people_landscape.webp` and `images/people_portrait.webp`. images are licensed CC, added `attributions.txt`. I chose images with an aspect ratio of 2 and people as content to make any strange preparations more obvious ("vertically flipped faces?", "oval faces?", "double face reflections?", "extended feet?")
- added `opencv-contrib-python` as requirement (used in `scale_and_fill`). sooner or later we will need it anyway. I strongly argue for using opencv-contrib (contrib + non-headless) as there is a known issue with opencv packages being incompatible with any other opencv package previously installed in parallel, and this version is the superset of all opencv modules.
- TODO: add examples with abstract objects (like diagram)
Landscape example
original:
stretch and squish:![people_landscape_stretch_and_squish](https://github.com/jhc13/taggui/assets/112266044/079cd2eb-49ba-4bc2-a5e6-5bd218120050)
center crop:![people_landscape_scale_and_centercrop](https://github.com/jhc13/taggui/assets/112266044/f47ee82b-fdb7-482b-bea0-b65caa74c2cf)
fill gray:![people_landscape_gray](https://github.com/jhc13/taggui/assets/112266044/8c5c35e4-2ec7-45b7-9681-76a77638faa8)
fill noise:![people_landscape_noise](https://github.com/jhc13/taggui/assets/112266044/3278db8f-05d0-4db8-bf65-553b014fd840)
fill reflect:![people_landscape_reflect](https://github.com/jhc13/taggui/assets/112266044/a5885b9d-6c8d-4c36-a448-a1074c7949be)
fill replicate:![people_landscape_replicate](https://github.com/jhc13/taggui/assets/112266044/9b54cf66-5572-4a54-84a2-d233391622de)
Portrait example
original:![people_portrait](https://github.com/jhc13/taggui/assets/112266044/4487345a-7b27-4fb5-a5ba-8a9d6c7bff56)
stretch and squish:![people_portrait_stretch_and_squish](https://github.com/jhc13/taggui/assets/112266044/095301ee-3e03-4eb6-aa92-2c4a0aa84ae6)
center crop:![people_portrait_scale_and_centercrop](https://github.com/jhc13/taggui/assets/112266044/b1871a57-4854-4989-9c53-8840438c9da0)
fill gray:![people_portrait_gray](https://github.com/jhc13/taggui/assets/112266044/6732e57c-7ae4-4f77-a0ee-c1d3356947da)
fill noise:![people_portrait_noise](https://github.com/jhc13/taggui/assets/112266044/9f73a581-46bf-425f-9611-4efbc24cb15d)
fill reflect:![people_portrait_reflect](https://github.com/jhc13/taggui/assets/112266044/dcb8f33b-1f55-4fcc-ad75-5e3f1519d1fc)
fill replicate:![people_portrait_replicate](https://github.com/jhc13/taggui/assets/112266044/6b73ca01-8071-436f-b2fe-fc940a078217)