r2dt-bio / R2DT

Visualise RNA secondary structure in consistent, reproducible and recognisable layouts
https://r2dt.bio
Apache License 2.0

Initial version of image comparison instead of file comparison #106

Closed · anayden closed this 1 year ago

anayden commented 1 year ago

Some images are indeed different, so not 100% of the tests pass, but this is expected.

AntonPetrov commented 1 year ago

@anayden 👍 Very cool stuff!

To try the new tests I checked out the match_svg_via_images branch, modified the Dockerfile to use FROM rnacentral/r2dt-base:match_svg_via_images, built the image for linux/amd64, and ran the tests with r2dt.py test on the 1.4 library:

docker run --platform=linux/amd64 -it -v ./:/rna/r2dt/ -v path/to/1.4/:/rna/r2dt/data/cms rnacentral/r2dt:match_svg_via_images
root@bab12df6e66b:/rna# r2dt.py test
...
Ran 22 tests in 1105.495s
FAILED (failures=10)

Hopefully this was the right thing to do 🤞

Lots of tests failed, which is expected, but to fully understand the new approach I wanted to use the file TestSingleEntry_0.html as an example and confirm whether it is supposed to be there or not. Or does its presence have something to do with the requirement that the file sizes be the same?

I would've thought that these 2 files should be considered similar enough ™️ not to fail the test:

[Screenshot from 2023-09-03 at 00:58:52 comparing the two images]

I am curious to see an example that used to fail but no longer does. To check this, I am going to re-run the tests on develop and compare the number of HTML files to see which images stop failing on this PR branch. Alternatively, I could change tests.py to run both filecmp.cmp(new_file, reference_file) and self._are_identical(reference_file, new_file) and report the differences, but I figured you probably did something similar already?
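For reference, a minimal sketch of that dual comparison, assuming it lives as a method on the existing test class in tests.py; `_are_identical` is the new helper from this PR, and everything else here (the method name, the reporting) is purely illustrative:

```python
import filecmp

# Hypothetical method on the existing test class in tests.py.
# self._are_identical is the new image comparison from this PR.
def _compare_both_ways(self, reference_file, new_file):
    byte_equal = filecmp.cmp(new_file, reference_file, shallow=False)
    image_equal = self._are_identical(reference_file, new_file)
    if byte_equal != image_equal:
        # The interesting cases: the files differ byte-wise but the
        # rendered images are considered the same, or vice versa.
        print(
            f"Disagreement for {new_file}: "
            f"filecmp={byte_equal}, images identical={image_equal}"
        )
    return image_equal
```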

Once again, thanks for your titanic efforts! 💪💻

anayden commented 1 year ago

@AntonPetrov thanks for the comments and review!

You've done the testing correctly process-wise.

> I would've thought that these 2 files should be considered similar enough ™️ not to fail the test:

Somehow I didn't see this exact pair of images in my local testing. Could you share the filename? Either way, if 2 images have different sizes in pixels (for any reason), we can't compare them in a meaningful automated way. The only option would be to leverage computer vision and ML, but that could easily make the tests slower than the app itself, and it's definitely way beyond my knowledge of image comparison algorithms. Basically, if the sizes are different → we assume the images are different.
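For illustration, the size gate can be this simple (a minimal sketch using Pillow on the rasterised images; the function name and paths are hypothetical):

```python
from PIL import Image

def same_pixel_size(path_a: str, path_b: str) -> bool:
    """Images with different pixel dimensions are assumed to differ."""
    with Image.open(path_a) as a, Image.open(path_b) as b:
        return a.size == b.size
```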


To simplify visual testing and comparison, I've made a few changes to the process.

  1. From now on, HTML files will be generated for all tests, not just the failing ones, so one can compare all cases.
  2. You can now overlay one image on top of the other (open the page in a browser and use the checkbox plus the 2 sliders).
  3. All HTML pages now print the comparison results: the sizes, the bbox (not used in the current algorithm, just for reference; it shows the extent of the differences), and the similarity index (only present if the image sizes are the same). The comparison algorithm is as follows: if the sizes are different → the images are different; if the sizes are the same → the similarity index should be > 0.99. Once again, the bbox is not used and is there just for reference. A sketch of this logic follows below.
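A minimal sketch of that decision logic, assuming the SVGs have already been rasterised and using SSIM from scikit-image as the similarity index (the actual metric used in the PR may differ; the function name is hypothetical):

```python
import numpy as np
from PIL import Image
from skimage.metrics import structural_similarity

SIMILARITY_THRESHOLD = 0.99  # the cut-off described above; adjust if needed

def images_match(reference_path: str, new_path: str) -> bool:
    with Image.open(reference_path) as ref, Image.open(new_path) as new:
        # Different pixel sizes -> the images are considered different.
        if ref.size != new.size:
            return False
        ref_gray = np.asarray(ref.convert("L"), dtype=np.float64)
        new_gray = np.asarray(new.convert("L"), dtype=np.float64)
    # Same size -> require the similarity index to exceed the threshold.
    score = structural_similarity(ref_gray, new_gray, data_range=255.0)
    return score > SIMILARITY_THRESHOLD
```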

Using this information, feel free to adjust the similarity threshold if needed.

Also, I'm looking forward to more feedback regarding test correctness.