Closed by jknndy 4 weeks ago
Hi @jknndy
What is the thinking behind the order you've proposed? Is it ordered by keys for mandatory tests, then keys for optional tests?
Is this something we can use a pre-commit hook to enforce? The current hook we have for formatting JSON has `--no-sort-keys` set, but nothing to enforce a particular order.
As a code reviewer, I'd find reliable ordering of the JSON files useful; there tend to be certain fields that I check more rigorously than others.
As a developer, it would be nice if the templated JSON produced by `generate.py` used the same key ordering, so that I wouldn't have to know/remember to reformat the JSON file after entering the expected test results.
> What is the thinking behind the order you've proposed? Is it ordered by keys for mandatory tests, then keys for optional tests?
No particular order; I structured it in a way that seemed most consumable: grouping related outputs and moving the fields that return the most relevant info towards the start.
> Is this something we can use a pre-commit hook to enforce? The current hook we have for formatting JSON has `--no-sort-keys` set, but nothing to enforce a particular order.
I would think so. I didn't test anything here, but I can't see why it couldn't be implemented.
> As a code reviewer, I'd find reliable ordering of the JSON files useful; there tend to be certain fields that I check more rigorously than others.
Exactly my thought; it was getting harder and harder to quickly spot certain fields as previously missed coverage was appended to the bottom.
> As a developer, it would be nice if the templated JSON produced by `generate.py` used the same key ordering, so that I wouldn't have to know/remember to reformat the JSON file after entering the expected test results.
I am planning to correct this once an order is finalized.
@jknndy I found it difficult to write down a recommended ordering for the field list until first categorizing the fields into a few groups.
I also found it difficult to determine whether fields are mandatory in the sense of scraper-should-implement-them, as compared to my understanding of a legal definition of a recipe (which doesn't include images or yields, for example, although in practice I think these are valuable for many users).
Here's what I've got so far:
**Implementation Details**

* host (required)

**Recipe Publication Details**

* canonical_url (required)
* site_name (required)
* author (required)
* language (required)

**Recipe Contents**

* title (required)
* ingredients (required)
  * ingredient_groups
* instructions (required)
* total_time (strongly-recommended)
  * prep_time
  * cook_time
* yields (strongly-recommended)

**Recipe Metadata** (all optional)

* image (strongly-recommended)
* description
* nutrients
* cuisine
* category
* keywords
* equipment
* cooking_method
* ratings
  * ratings_count

Nested fields indicate that the top-level field is required when the nested field is implemented. I've omitted ingredients_list because I think of it as an implementation quirk/detail.
This looks good to me. I forgot to include dietary_restriction in my original list, so I'll add it into yours above keywords.
Are we good to move forward with this list?
No objections from me!
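For reference, the agreed list above (with dietary_restriction slotted in above keywords, per the follow-up comment) could be captured as a single constant shared by `generate.py` and any ordering check; the constant name here is hypothetical:

```python
# Agreed field ordering from the discussion above; "dietary_restriction"
# sits above "keywords" per the follow-up comment. The constant name is
# illustrative, not existing project code.
FIELD_ORDER = (
    # Implementation Details
    "host",
    # Recipe Publication Details
    "canonical_url", "site_name", "author", "language",
    # Recipe Contents
    "title", "ingredients", "ingredient_groups", "instructions",
    "total_time", "prep_time", "cook_time", "yields",
    # Recipe Metadata (all optional)
    "image", "description", "nutrients", "cuisine", "category",
    "dietary_restriction", "keywords", "equipment", "cooking_method",
    "ratings", "ratings_count",
)
```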
@jayaddison / @strangetom, thoughts on pre-populating all possible fields in the JSON files now and letting the developer remove the coverages that aren't available?
I'd be in favour of that. It makes it clearer what the possible fields are, and if the developer accidentally left in a field that the scraper didn't implement, there would be an exception when the tests are run.
Mostly agree; one possible counter-argument: it could confuse some developers by making them think that they have to enter data and then implement all of the JSON fields. Would we have a way to make it clear that some can be removed?
An option would be to catch the exception raised when a scraper doesn't implement one of the JSON keys, and output a message saying that the entry should be removed from the .testjson file if the scraper doesn't implement that function.
```python
for key in OPTIONAL_TESTS:
    with self.subTest(key):
        scraper_func = getattr(actual, key)
        if key in expect:
            try:
                self.assertEqual(
                    expect[key],
                    scraper_func(),
                    msg=f"The actual value for .{key}() did not match the expected value.",
                )
            except Exception:
                print(f"If the scraper does not implement .{key}(), then remove the entry from the .testjson file.")
                raise
```
Do we have a way to check and/or auto-reorder newly added JSON test data? (it's OK if not to begin with, although I expect we might want those at some point if we don't already)
Opened as a result of the recently opened & merged #1123.
Reorders all test JSON files to fit the following order. The reordering was completely automated, so I'm open to suggestions on the order!
Will update the JSON auto-creation and any docs (if applicable) once an order is finalized.
cc @mlduff, related to past discussion.