isogr / register-system-transition

Covers the GR system transition from the 2013 Java-based version to the 2023 static-site-based version
Test data post-migration #30

Open strogonoff opened 5 months ago

strogonoff commented 5 months ago

The goals are:

cc @stefanomunarini, @ribose-jeffreylau if anyone has ideas.

I’ve been researching various snapshot testing (+ optional visual comparison) approaches (e.g., with playwright), but I wonder if that’s overkill.

ribose-jeffreylau commented 5 months ago

Currently, PDF files downloaded from the Java-based system contain retrieval timestamps. Just a small detail to be aware of.

ribose-jeffreylau commented 5 months ago

Questions regarding snapshot (which I'll call screenshot from now on, to avoid potential confusion) testing:

Currently, the item class index pages you see in the Java-based (and hence also in the static snapshot) website are all rendered from a JSON data file behind the scenes. In the snapshot directory, you will find various JSON files called *containedItem. These files may help in determining differences in the number of items in each item class.

For individual item pages, we may need to extract just the text from the relevant elements and go from there. If we want more semantics from the pages, though, I believe we can use the fact that most fields are enclosed in <p id="...fieldName..."> tags, which would help with initial data cleaning.

E.g., <input disabled="disabled" id="domainOfValidity.geographicBoundingBoxes0.northBoundLatitude" name="domainOfValidity.geographicBoundingBoxes[0].northBoundLatitude" value="62...
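A minimal sketch of the extraction idea above, assuming that field values sit either in the text of a <p id="..."> element or in the value attribute of an <input id="..."> element. The class and function names are hypothetical, and it uses only Python's standard html.parser, so it has not been checked against the real snapshot pages.

```python
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Collect {element id: text or value} from <p id=...> and <input id=...>."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current_id = None  # id of the <p> currently open, if it has one

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and "id" in attrs:
            # Form-style fields carry their data in the value attribute.
            self.fields[attrs["id"]] = attrs.get("value") or ""
        elif tag == "p":
            # Track only paragraphs that have an id; others are ignored.
            self._current_id = attrs.get("id")
            if self._current_id is not None:
                self.fields[self._current_id] = ""

    def handle_endtag(self, tag):
        if tag == "p":
            self._current_id = None

    def handle_data(self, data):
        if self._current_id is not None:
            self.fields[self._current_id] += data


def extract_fields(html):
    """Return a dict of field id -> whitespace-normalized text or value."""
    parser = FieldExtractor()
    parser.feed(html)
    parser.close()
    return {key: " ".join(value.split()) for key, value in parser.fields.items()}
```

Running extract_fields() on the old and the new version of the same item page gives two dicts that can be compared key by key, which sidesteps layout differences between the two sites.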

stefanomunarini commented 5 months ago

I’ve been researching various snapshot testing (+ optional visual comparison) approaches (e.g., with playwright), but I wonder if that’s overkill.

How were you thinking of achieving this, @strogonoff? Would this include exporting a PDF from Paneron and comparing the two? (I cannot see whether this functionality is already in place, because at the moment I cannot access https://geodetic-v2.isotc211.org)

stefanomunarini commented 5 months ago

Questions regarding snapshot (which I'll call screenshot from now on, to avoid potential confusion) testing:

  • Is a pixel-perfect match necessary for screenshot testing to work?

Currently, the item class index pages you see in the Java-based (and hence also in the static snapshot) website are all rendered from a JSON data file behind the scenes. In the snapshot directory, you will find various JSON files called *containedItem. These files may help in determining differences in the number of items in each item class.

This is great, @ribose-jeffreylau. We could use part of this data (e.g., management proposal info) to remove some overhead when retrieving data for individual item pages using HTML selectors.

ribose-jeffreylau commented 5 months ago

I have neutralized the PDF retrieval timestamps to 1970-01-01T00:00:00, which may ease PDF comparison...

stefanomunarini commented 5 months ago

I’ve been researching various snapshot testing (+ optional visual comparison) approaches (e.g., with playwright), but I wonder if that’s overkill.

Are the PDFs in this folder generated by the new system? If so, the solution you proposed above might work for checking data correctness, @strogonoff.

However, we could also implement a solution using Python and PyMuPDF, which would involve extracting the text from both documents and comparing them.
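A rough sketch of that idea, under a few assumptions: PyMuPDF is installed and imported under its fitz name, blank lines and surrounding whitespace are treated as layout noise, and all function names are hypothetical. It is an illustration, not something that has been run against the actual PDFs.

```python
import difflib


def extract_pdf_text(path):
    """Return the plain text of every page in a PDF, joined with newlines."""
    import fitz  # PyMuPDF; imported lazily so diff_texts() works without it

    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)


def normalize(text):
    """Strip surrounding whitespace and drop blank lines to ignore layout noise."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def diff_texts(old_text, new_text):
    """Return unified-diff lines between two texts; an empty list means no difference."""
    return list(
        difflib.unified_diff(
            normalize(old_text),
            normalize(new_text),
            fromfile="old",
            tofile="new",
            lineterm="",
        )
    )
```

Usage would look like diff_texts(extract_pdf_text("old.pdf"), extract_pdf_text("new.pdf")), where both file names are placeholders. Any field that legitimately differs between exports, such as a retrieval timestamp, would need to be neutralized first or it will show up in every diff.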

Sorry, I should do my research before asking questions. My previous question still stands, though: do we have a PDF export feature in the new system? For some reason I keep getting an error when accessing https://geodetic-v2.isotc211.org/

ribose-jeffreylau commented 5 months ago

@stefanomunarini There is not yet a PDF export feature in the new system.

BTW, what errors are you getting when accessing the v2 site?

stefanomunarini commented 5 months ago

BTW, what errors are you getting when accessing the v2 site?

[Image: Screenshot 2024-02-06 at 10 54 23]

ribose-jeffreylau commented 5 months ago

@stefanomunarini Do you have Safari v16? It should work in v17. Or try other browsers.

stefanomunarini commented 5 months ago

@stefanomunarini Do you have Safari v16? It should work in v17. Or try other browsers.

Thanks, @ribose-jeffreylau. It works in Safari v16.6.

strogonoff commented 5 months ago

@stefanomunarini Do you have Safari v16? It should work in v17. Or try other browsers.

Yes, it also works with year-old Firefox and Chrome. Only Safari has to be new?