Add testing infrastructure for checking HTML

JasonGrace2282 commented 4 months ago

Pretty self-explanatory, see #41

Todo

[X] Example test
[X] Doctests
[X] Tag matching
[X] Partial attribute matching
[X] Text search
[ ] href search
[ ] Cleanup

JasonGrace2282 commented 4 months ago

> Are you basing this code off of an existing implementation? If not, is there another implementation of HTML testing that you're aware of and hopefully took a look at?

I am not basing my code off an existing implementation, and I could not find another good example of HTML testing (Beautiful Soup and Django were the two examples I found, but both deal with parsing HTML, not testing it).

> Also, is there a package that would do this for you? What's the justification for doing it yourself?

This is mostly because I want a higher level of abstraction: instead of looking for <a href=...>..., I would much rather just say 'look for a button with this text' and have that done behind the scenes. That being said, I'm considering a library like bs4 to handle parsing and searching for HTML elements. Even if we were to use a library (and I haven't found one that I really like yet), I would have to build on top of it to produce something like what I want.
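To make the 'look for a button with this text' idea concrete, here is a minimal sketch built only on the stdlib `html.parser`. The names `_ClickableTextCollector` and `has_button` are hypothetical and not taken from this PR, and treating both `<a>` and `<button>` as "buttons" is an assumption.

```python
from html.parser import HTMLParser


class _ClickableTextCollector(HTMLParser):
    """Collect the visible text of every <a> and <button> element.

    Hypothetical helper for illustration; not the code in this PR.
    """

    CLICKABLE = {"a", "button"}  # assumption: both count as "buttons"

    def __init__(self):
        super().__init__()
        self._open = []   # (tag, attrs, text chunks) for clickable tags still open
        self.found = []   # (tag, attrs, normalized text) for closed clickable tags

    def handle_starttag(self, tag, attrs):
        if tag in self.CLICKABLE:
            self._open.append((tag, dict(attrs), []))

    def handle_data(self, data):
        # Text belongs to every clickable element that is currently open.
        for _, _, chunks in self._open:
            chunks.append(data)

    def handle_endtag(self, tag):
        if self._open and self._open[-1][0] == tag:
            tag, attrs, chunks = self._open.pop()
            # Collapse runs of whitespace so formatting in the HTML does not matter.
            text = " ".join("".join(chunks).split())
            self.found.append((tag, attrs, text))


def has_button(html, text):
    """Return True if any <a> or <button> has exactly this visible text."""
    collector = _ClickableTextCollector()
    collector.feed(html)
    collector.close()
    return any(found == text for _, _, found in collector.found)
```

A test could then read `assert has_button(response_html, "Submit work")` without mentioning tags or attributes at all, which is the level of abstraction described above.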

> This code doesn't look too efficient - do you have any time/efficiency concerns, and are there any potential improvements you're looking at?

Disclaimer: I have not benchmarked anything yet. Everything I say below is based on informal testing.

I don't think performance is an issue: the parsing is done by the Python stdlib, and the average HTML file will probably have around 200-500 elements. Searching through a list that small is still fast in Python, around 2e-2 seconds in my informal testing. That being said, I would assume any library that we end up using, if any, would also provide some sort of caching mechanism that's less primitive than the one currently in this PR (caching can't happen because everything is in a flat list).
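For reference, the flat-list approach described here can be sketched roughly as follows. This is an illustration under assumptions, not the PR's actual code: `Element`, `FlatCollector`, `parse_flat`, and `find_all` are hypothetical names, and only start tags and their attributes are recorded.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class Element:
    """One start tag and its attributes (hypothetical structure)."""
    tag: str
    attrs: dict = field(default_factory=dict)


class FlatCollector(HTMLParser):
    """Record every start tag in document order, ignoring nesting."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <br/>.
        self.elements.append(Element(tag, dict(attrs)))


def parse_flat(html):
    """Parse an HTML string into a flat list of Element objects."""
    collector = FlatCollector()
    collector.feed(html)
    collector.close()
    return collector.elements


def find_all(elements, tag=None, attrs=None):
    """Linear scan with tag matching and partial attribute matching.

    An element matches if its tag equals `tag` (when given) and every
    key/value pair in `attrs` is present on it; extra attributes are ignored.
    """
    wanted = attrs or {}
    return [
        el for el in elements
        if (tag is None or el.tag == tag)
        and all(el.attrs.get(key) == value for key, value in wanted.items())
    ]
```

Every call to `find_all` walks the whole list, so the cost grows linearly with the number of elements. Timing a call with `timeit` on a representative page would be one way to replace the informal figure above with a measured one.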

JasonGrace2282 commented 3 months ago

I think this might be easier to implement from scratch; I'm less convinced now that the current solution is the best one.

tjcsl / tin

Add testing infrastructure for checking HTML #54
