Closed mpadge closed 1 year ago
New code for me, but as I understand, there are two kinds of tables for feature tags: 1) with <div class="taglist"
, and 2) with <table class="wikitable"
.
The tests get a sample of each kind of data to generate the mock call response, dropping the header and all surrounding html code. This MR selects a sample of the rows for the type 2 tables to reduce even more the size of the mock files. Then it tests for buildind
, a type 2) table. I would add a test for a feature in the type 1) tables (eg. aerialway
).
For the row's reduction code, perhaps selecting all nodes but the 10 rows of building data + headers + a sample of type 1) tables, and then using xml2::xml_remove
could work, but I don't get how that work and should be tried. I'll try later
Thanks! And yes, indeed ...
For the row's reduction code, perhaps selecting all nodes but the 10 rows of building data + headers + a sample of type 1) tables, and then using xml2::xml_remove could work, but I don't get how that work and should be tried. I'll try later
That is what should work, but i can only get xml_remove()
to remove everything or nothing, not a sub-set.
Check #324 to remove the hacky part of this PR. Once merged, I think we can merge this one
@jmaspons Could you please help out here by reviewing this PR? It updates a test function i added last year to reduce the sizes of mocked files for the
available_tags()
function.The problem
The function scapes this OSM wiki page, and extracts all data from the tables. The problem is there are a large number of large tables, and the resultant mock file is too big. The old version of the function worked, but the web page has changed structure in the meantime (see #322 ), and so this size-reduction function also needs updating.
What is needed
A way to use
rvest
orxml2
to reduce a HTML document down to a sub-set of all nodes. I don't know how to do that?What I've done
A really hacky modification of the document-as-text-file, by identifying table rows and removing most of them.
What you can do
If you know anything about document manipulation with
rvest
/xml2
, please just have a look at the code to see if you can think of a better approach. If not, that's fine, and we can just merge anyway, because it does actually work in current form. It's just an embarassing approach to systematic manipulation of HTML docs. Thanks!