Open Dimitar5555 opened 2 years ago
I've thought about spending time on this, but I really think it would be hard to implement, and the gains would be smaller than you might think. Because we fetch these files from a CDN, the files are smaller and the downloads a lot faster than people realize:
👇 here's the numbers I see in the Chrome dev tools today:
Put another way, these requests finish faster even than we can fetch a screen of imagery tiles, and actually a lot faster than we fetch a tile of OSM data.
> Also, are there any other data consumers that will be affected? If there are, it would be nice to leave both "systems" to coexist and add a folder in `dist` where the new per country method is used.
Tagging @bryceco too, as we've had some conversations about this. I'm definitely willing to produce data in other formats (like protobuf or SQLite, or chopped up into world regions) if it helps.
I did write a script to dump the contents into SQLite, but there wasn’t any savings on total size. I’ll see if I can find the script if you want to double-check my approach.
> I've thought about spending time on this, but I really think it would be hard to implement, and the gains would be smaller than you might think. Because we fetch these files from a CDN, the files are smaller and the downloads a lot faster than people realize:
>
> * it's actually gzipped, so the size is significantly smaller than the original JSON
> * the CDN supports HTTP/3, so it can pipeline these connections
> * it's available everywhere worldwide pretty fast
That's great but how are we on the validation front? How long does it take to validate 100 objects for branding when all brands are loaded and when only country specific brands are loaded?
SQLite script is here: https://github.com/bryceco/GoMap/blob/master/src/presets/nsiSqlite.py
> That's great but how are we on the validation front? How long does it take to validate 100 objects for branding when all brands are loaded and when only country specific brands are loaded?
I'm not sure how to test this, but all of the data goes into the match index and validation is nearly instant once that index is ready.
There is a roughly 1 second stutter when building the index, this occurs after the files have been fetched and the locations have been resolved. When I tested it just now, it happened about 7 seconds into my editing session. (per #5385, I'd like to find a way to move this work into the background, but it's not too bad currently).
I've thought about having smaller files and less redundant data downloaded to users for a while, as from my perspective I only ever really edit OSM within the UK, so any non-UK based brands aren't going to get used when I edit.
But in this day and age, I don't think 8 MB is a huge amount of data as-is, and as Bryan has said, the use of gzip, CDNs and HTTP/3 makes the file transfer a pretty quick download, as the transferred file itself is only around 1 MB.
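The gap between the raw size and the transferred size can be checked locally. The following is a minimal sketch using synthetic, repetitive records as a stand-in for a preset file; the record layout and the helper name `raw_and_gzipped_size` are assumptions for illustration, not real NSI data or tooling.

```python
import gzip
import json


def raw_and_gzipped_size(obj):
    """Return (raw_bytes, gzipped_bytes) for obj serialized as compact JSON."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return len(raw), len(gzip.compress(raw))


# Synthetic stand-in for a preset file (hypothetical layout, not real NSI data).
sample = {
    f"amenity/fast_food/brand-{i}": {
        "name": f"Brand {i}",
        "locationSet": {"include": ["001"]},
        "tags": {"amenity": "fast_food", "brand": f"Brand {i}"},
    }
    for i in range(2000)
}

raw_size, gz_size = raw_and_gzipped_size(sample)
print(f"raw: {raw_size} bytes, gzipped: {gz_size} bytes")
```

Running the same helper on the real distribution file would show the actual ratio; the synthetic data here is more repetitive than real presets, so its ratio should not be read as representative.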
However, having noticed the delay in being able to use the NSI data within iD (#5669 & #5790) I do think maybe splitting the files might make the initialisation process within iD faster, but then would that saving be lost by the extra processing needed to determine which file should be downloaded in the first place - such as adding user preferences within iD.
I don't really know how iD processes the NSI data and integrates it, but if iD is doing a lot of work after downloading the NSI data, is that processing something we could do on the NSI side first, so iD doesn't have to build a feature set or anything like that?
Another thing I am curious about, is the NSI data in iD one big array / object? So if I type "M" will every entry within the NSI be scanned through to see if the letter "M" matches, and if I then typed "Mc" again would every entry within the NSI be scanned for a match?
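To make the question concrete: a full scan on every keystroke is not the only option. The sketch below shows one common alternative, a sorted list searched with binary search so that each prefix lookup only touches the matching entries. This is an illustration of the general technique under assumed names (`PrefixIndex`, `search`); it is not a description of how iD actually searches the NSI data.

```python
import bisect


class PrefixIndex:
    """Case-insensitive prefix lookup over a fixed list of names."""

    def __init__(self, names):
        # Sort (casefolded key, original name) pairs once up front.
        self._entries = sorted((name.casefold(), name) for name in names)

    def search(self, prefix):
        key = prefix.casefold()
        # Binary search for the first entry whose key is >= the prefix.
        i = bisect.bisect_left(self._entries, (key,))
        matches = []
        # Matching keys are contiguous in sorted order; stop at the first miss.
        while i < len(self._entries) and self._entries[i][0].startswith(key):
            matches.append(self._entries[i][1])
            i += 1
        return matches


index = PrefixIndex(["McDonald's", "Marks & Spencer", "Morrisons", "Costa", "Subway"])
print(index.search("M"))
print(index.search("Mc"))
```

With this layout, typing "Mc" after "M" does not rescan every entry: the lookup jumps straight to the first candidate and stops as soon as the prefix no longer matches.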
> However, having noticed the delay in being able to use the NSI data within iD (#5669 & #5790) I do think maybe splitting the files might make the initialisation process within iD faster, but then would that saving be lost by the extra processing needed to determine which file should be downloaded in the first place - such as adding user preferences within iD.
That's why I started this issue (I should've clarified it in the beginning but better late than never). The problem I'm facing is that when I select a node on osm.org that lacks `brand` tagging and then click Edit (to open it in iD), it takes a good 30-40 seconds or more for the validator to detect that the element needs upgrading.
(Copied over from duplicate ticket #6146)
The presets.json in the distribution is quickly getting too large. Just this year, it doubled in size compared to the beginning of 2021.
I noticed some "out of memory" issues now just parsing this file on old Android devices (StreetComplete uses the presets.json from name-suggestion-index). At the rate this file is growing, it may soon no longer be worthwhile to include it at all.
I can think of the following solutions:
On my suggestions for solutions, it seems point 1 can be safely disregarded at this point. As Bryan mentioned, zipped, it's a lot smaller and in the end, this data will reside in memory because of the various indexes that make searching through it fast.
On point 2, I think I remember once seeing some kind of automatically generated list of names per country of candidates for being included in the index. E.g. if there are 100 places with the same name and same main tag, it is likely this is a brand. A similar thing could be done to find already added presets that in reality only exist in 1 to few countries but haven't been marked so. This may greatly increase the number of presets that could be separated from the main presets.json.
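The counting heuristic described in point 2 can be sketched as follows. The input format (dicts with `country`, `name` and `main_tag` keys), the function name `brand_candidates` and the sample records are all assumptions made for illustration; the threshold of 100 comes from the example in the text.

```python
from collections import Counter


def brand_candidates(places, min_count=100):
    """Return (country, name, main_tag) triples seen at least min_count times."""
    counts = Counter((p["country"], p["name"], p["main_tag"]) for p in places)
    return sorted(key for key, n in counts.items() if n >= min_count)


# Hypothetical sample: one name repeated often in one country, one local name.
places = (
    [{"country": "gb", "name": "Acme Coffee", "main_tag": "amenity=cafe"}] * 120
    + [{"country": "gb", "name": "Corner Cafe", "main_tag": "amenity=cafe"}] * 3
)

print(brand_candidates(places))
```

The same counts, grouped by name and tag across countries, could also flag existing presets that only ever appear in one or a few countries.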
I created a small script to pull the presets.json apart (a preset is moved out only if its locationSet contains nothing but `include` rules; all others remain in the base file).
File | Size |
---|---|
presets-us.json | 996 KB |
presets-de.json | 591 KB |
presets.json | 570 KB |
presets-jp.json | 416 KB |
presets-gb.json | 308 KB |
presets-fx.json | 275 KB |
presets-fr.json | 260 KB |
presets-ca.json | 256 KB |
presets-cn.json | 207 KB |
presets-es.json | 182 KB |
presets-nl.json | 182 KB |
presets-ru.json | 161 KB |
presets-br.json | 157 KB |
presets-it.json | 152 KB |
presets-tw.json | 149 KB |
presets-at.json | 145 KB |
presets-ch.json | 136 KB |
presets-au.json | 136 KB |
presets-pl.json | 134 KB |
presets-be.json | 114 KB |
presets-in.json | 97 KB |
presets-no.json | 90 KB |
presets-us-or.geojson.json | 88 KB |
presets-se.json | 78 KB |
presets-cz.json | 77 KB |
presets-ua.json | 73 KB |
presets-sk.json | 72 KB |
presets-ie.json | 71 KB |
presets-nz.json | 70 KB |
presets-us-ca.geojson.json | 66 KB |
presets-by.json | 62 KB |
presets-mx.json | 62 KB |
presets-lu.json | 60 KB |
presets-pt.json | 58 KB |
presets-ph.json | 57 KB |
presets-my.json | 57 KB |
presets-us-tx.geojson.json | 57 KB |
presets-sa.json | 56 KB |
presets-fi.json | 51 KB |
presets-tr.json | 49 KB |
presets-ar.json | 49 KB |
presets-th.json | 48 KB |
presets-dk.json | 48 KB |
presets-ae.json | 47 KB |
presets-id.json | 46 KB |
presets-cl.json | 45 KB |
presets-sg.json | 44 KB |
presets-us-wa.geojson.json | 44 KB |
presets-bg.json | 43 KB |
presets-kr.json | 43 KB |
presets-hu.json | 42 KB |
presets-ro.json | 40 KB |
presets-gb-eng.json | 39 KB |
presets-hk.json | 38 KB |
presets-us-il.geojson.json | 36 KB |
presets-il.json | 36 KB |
presets-co.json | 35 KB |
presets-ca-bc.geojson.json | 34 KB |
presets-pe.json | 34 KB |
presets-ca-on.geojson.json | 32 KB |
presets-gr.json | 30 KB |
presets-ir.json | 29 KB |
presets-us-ny.geojson.json | 27 KB |
presets-us-fl.geojson.json | 27 KB |
presets-ca-qc.geojson.json | 27 KB |
presets-de-nw.geojson.json | 26 KB |
presets-za.json | 25 KB |
presets-us-va.geojson.json | 25 KB |
presets-hr.json | 25 KB |
presets-vn.json | 24 KB |
presets-de-by.geojson.json | 24 KB |
presets-kz.json | 23 KB |
presets-us-oh.geojson.json | 23 KB |
presets-us-pa.geojson.json | 22 KB |
presets-de-bw.geojson.json | 22 KB |
presets-ma.json | 21 KB |
presets-ec.json | 20 KB |
presets-us-mi.geojson.json | 20 KB |
presets-bh.json | 19 KB |
presets-bo.json | 19 KB |
presets-us-az.geojson.json | 18 KB |
presets-kw.json | 18 KB |
presets-rs.json | 18 KB |
presets-pk.json | 18 KB |
presets-tn.json | 17 KB |
presets-qa.json | 17 KB |
presets-lt.json | 16 KB |
presets-us-md.geojson.json | 16 KB |
presets-lv.json | 16 KB |
presets-gb-lon.geojson.json | 16 KB |
presets-ee.json | 16 KB |
presets-dz.json | 15 KB |
presets-us-wi.geojson.json | 15 KB |
presets-au-nsw.geojson.json | 15 KB |
presets-us-ct.geojson.json | 15 KB |
presets-gt.json | 15 KB |
presets-pa.json | 15 KB |
presets-ve.json | 14 KB |
presets-ca-ab.geojson.json | 14 KB |
presets-si.json | 14 KB |
presets-us-nj.geojson.json | 14 KB |
presets-us-ga.geojson.json | 14 KB |
presets-bd.json | 14 KB |
presets-cr.json | 13 KB |
presets-gb-sct.json | 13 KB |
presets-gb-nir.json | 13 KB |
presets-us-ma.geojson.json | 13 KB |
presets-us-co.geojson.json | 12 KB |
presets-om.json | 12 KB |
presets-lk.json | 12 KB |
presets-eg.json | 12 KB |
presets-uk.json | 12 KB |
presets-ci.json | 12 KB |
presets-us-mo.geojson.json | 12 KB |
presets-us-in.geojson.json | 12 KB |
presets-de-he.geojson.json | 12 KB |
presets-gb-wls.json | 11 KB |
presets-us-nv.geojson.json | 11 KB |
presets-de-rp.geojson.json | 11 KB |
presets-gh.json | 11 KB |
presets-us-mn.geojson.json | 11 KB |
presets-us-ar.geojson.json | 11 KB |
presets-sv.json | 10 KB |
presets-de-ni.geojson.json | 10 KB |
presets-ng.json | 10 KB |
presets-ba.json | 10 KB |
presets-us-hi.json | 10 KB |
presets-hn.json | 10 KB |
presets-us-ky.geojson.json | 10 KB |
presets-gb-east-midlands.geojson.json | 9 KB |
presets-us-de.geojson.json | 9 KB |
presets-us-ia.geojson.json | 9 KB |
presets-uy.json | 9 KB |
presets-cy.json | 9 KB |
presets-us-al.geojson.json | 9 KB |
presets-de-sn.geojson.json | 9 KB |
presets-pr.json | 8 KB |
presets-us-ok.geojson.json | 8 KB |
presets-ml.json | 8 KB |
presets-mo.json | 8 KB |
presets-new_york_city.geojson.json | 8 KB |
presets-au-vic.geojson.json | 8 KB |
presets-us-nh.geojson.json | 8 KB |
presets-md.json | 8 KB |
presets-bw.json | 8 KB |
presets-gb-som.geojson.json | 8 KB |
presets-na.json | 8 KB |
presets-is.json | 8 KB |
presets-us-wv.geojson.json | 8 KB |
presets-sn.json | 8 KB |
presets-py.json | 8 KB |
presets-de-bb.geojson.json | 8 KB |
presets-us-nc.geojson.json | 8 KB |
presets-ke.json | 8 KB |
presets-mm.json | 8 KB |
presets-ao.json | 7 KB |
presets-baltimore_and_dc.geojson.json | 7 KB |
presets-gb-south-west.geojson.json | 7 KB |
presets-150.json | 7 KB |
presets-de-sh.geojson.json | 7 KB |
presets-gb-east-england.geojson.json | 7 KB |
presets-gg.json | 7 KB |
presets-gb-south-east-coast.geojson.json | 7 KB |
presets-tz.json | 7 KB |
presets-zm.json | 7 KB |
presets-us-tn.geojson.json | 7 KB |
presets-do.json | 7 KB |
presets-jo.json | 7 KB |
presets-de-be.geojson.json | 7 KB |
presets-je.json | 7 KB |
presets-us-sc.geojson.json | 7 KB |
presets-us-ne.geojson.json | 7 KB |
presets-us-dc.geojson.json | 6 KB |
presets-us-id.geojson.json | 6 KB |
presets-cu.json | 6 KB |
presets-bj.json | 6 KB |
presets-de-mv.geojson.json | 6 KB |
presets-cd.json | 6 KB |
presets-ug.json | 6 KB |
presets-ca-nb.geojson.json | 6 KB |
presets-me.json | 6 KB |
presets-bf.json | 6 KB |
presets-au-tas.json | 6 KB |
presets-mz.json | 6 KB |
presets-ca-sk.geojson.json | 6 KB |
presets-us-ks.geojson.json | 6 KB |
presets-mk.json | 6 KB |
presets-us-nm.geojson.json | 5 KB |
presets-de-hh.geojson.json | 5 KB |
presets-cm.json | 5 KB |
presets-au-qld.geojson.json | 5 KB |
presets-tg.json | 5 KB |
presets-et.json | 5 KB |
presets-al.json | 5 KB |
presets-de-st.geojson.json | 5 KB |
presets-ca-mb.geojson.json | 5 KB |
presets-gb-west-midlands.geojson.json | 5 KB |
presets-kh.json | 5 KB |
presets-rw.json | 5 KB |
presets-ni.json | 5 KB |
presets-mt.json | 5 KB |
presets-am.json | 5 KB |
presets-ad.json | 5 KB |
presets-us-la.geojson.json | 5 KB |
presets-bn.json | 5 KB |
presets-li.json | 5 KB |
presets-us-ms.geojson.json | 5 KB |
presets-de-hb.geojson.json | 5 KB |
presets-gb-yorkshire.geojson.json | 5 KB |
presets-gb-north-west.geojson.json | 5 KB |
presets-ca-ns.geojson.json | 4 KB |
presets-us-ak.json | 4 KB |
presets-ye.json | 4 KB |
presets-mn.json | 4 KB |
presets-ge.json | 4 KB |
presets-tt.json | 4 KB |
presets-kg.json | 4 KB |
presets-gb-dor.geojson.json | 4 KB |
presets-nz-can.geojson.json | 4 KB |
presets-sl.json | 4 KB |
presets-lb.json | 4 KB |
presets-uz.json | 4 KB |
presets-im.json | 4 KB |
presets-gb-south-central.geojson.json | 4 KB |
presets-cg.json | 4 KB |
presets-gb-north-east.geojson.json | 4 KB |
presets-mu.json | 4 KB |
presets-de-th.geojson.json | 4 KB |
presets-us-nd.geojson.json | 4 KB |
presets-lr.json | 4 KB |
presets-ga.json | 4 KB |
presets-us-me.geojson.json | 4 KB |
presets-re.json | 3 KB |
presets-us-ri.geojson.json | 3 KB |
presets-us-ut.geojson.json | 3 KB |
presets-au-sa.geojson.json | 3 KB |
presets-ca-nt.geojson.json | 3 KB |
presets-ca-yt.geojson.json | 3 KB |
presets-np.json | 3 KB |
presets-gm.json | 3 KB |
presets-us-wy.geojson.json | 3 KB |
presets-eu.json | 3 KB |
presets-au-wa.geojson.json | 3 KB |
presets-us-mt.geojson.json | 3 KB |
presets-sz.json | 3 KB |
presets-gb-con.geojson.json | 3 KB |
presets-mv.json | 3 KB |
presets-bb.json | 3 KB |
presets-ne.json | 3 KB |
presets-gb-greater-manchester.geojson.json | 3 KB |
presets-gb-iow.geojson.json | 3 KB |
presets-iq.json | 3 KB |
presets-us-sd.geojson.json | 3 KB |
presets-mc.json | 3 KB |
presets-bm.json | 3 KB |
presets-td.json | 3 KB |
presets-ls.json | 3 KB |
presets-sd.json | 2 KB |
presets-az.json | 2 KB |
presets-gn.json | 2 KB |
presets-mg.json | 2 KB |
presets-peoples_united_bank_ct.geojson.json | 2 KB |
presets-ss.json | 2 KB |
presets-us-vt.geojson.json | 2 KB |
presets-nc.json | 2 KB |
presets-ly.json | 2 KB |
presets-pf.json | 2 KB |
presets-af.json | 2 KB |
presets-zw.json | 2 KB |
presets-mr.json | 2 KB |
presets-dj.json | 2 KB |
presets-gb-dev.geojson.json | 2 KB |
presets-bi.json | 2 KB |
presets-la.json | 2 KB |
presets-gi.json | 2 KB |
presets-aw.json | 2 KB |
presets-gy.json | 2 KB |
presets-mq.json | 2 KB |
presets-gp.json | 2 KB |
presets-southern_nevada.geojson.json | 2 KB |
presets-khm.json | 2 KB |
presets-washoe_county.geojson.json | 2 KB |
presets-sx.json | 2 KB |
presets-greater_dayton_regional_transit_authority.geojson.json | 2 KB |
presets-san_luis_obispo_county.geojson.json | 2 KB |
presets-cuyahoga_county.geojson.json | 2 KB |
presets-lausd_los_angeles.geojson.json | 2 KB |
presets-mp.json | 2 KB |
presets-ie-d.geojson.json | 2 KB |
presets-cat_hood_river.geojson.json | 2 KB |
presets-bz.json | 2 KB |
presets-crimea.json | 2 KB |
presets-de-sl.geojson.json | 2 KB |
presets-us-ak.geojson.json | 1 KB |
presets-mw.json | 1 KB |
presets-first_state_bank_ne_west.geojson.json | 1 KB |
presets-bs.json | 1 KB |
presets-first_bank_carolinas.geojson.json | 1 KB |
presets-sy.json | 1 KB |
presets-first_bank_western_us.geojson.json | 1 KB |
presets-metro_rta.geojson.json | 1 KB |
presets-gu.json | 1 KB |
presets-nz-ota.geojson.json | 1 KB |
presets-830.json | 1 KB |
presets-au-nt.geojson.json | 1 KB |
presets-nz-tas.geojson.json | 1 KB |
presets-tj.json | 1 KB |
presets-tucson.geojson.json | 1 KB |
presets-gd.json | 1 KB |
presets-nz-wgn.geojson.json | 1 KB |
presets-bq.json | 1 KB |
presets-cw.json | 1 KB |
presets-de-bw.json | 1 KB |
presets-kn.json | 1 KB |
presets-sc.json | 1 KB |
presets-nz-auk.geojson.json | 1 KB |
presets-first_state_bank_ne_east.geojson.json | 1 KB |
presets-first_state_bank_il.geojson.json | 1 KB |
presets-first_state_bank_mi.geojson.json | 1 KB |
presets-first_state_bank_tx.geojson.json | 1 KB |
presets-xk.json | 1 KB |
presets-first_state_bank_oh.geojson.json | 1 KB |
presets-us-ca-sanfrancisco.geojson.json | 1 KB |
presets-us-ca-sanjose.geojson.json | 1 KB |
presets-ps.json | 1 KB |
presets-us-ca-eastbay.geojson.json | 1 KB |
presets-florida_keys.geojson.json | 1 KB |
presets-gb-mik.geojson.json | 1 KB |
presets-151.json | 1 KB |
presets-london-cycles.geojson.json | 1 KB |
presets-ms.json | 1 KB |
presets-miltonkeynes-cycles.geojson.json | 1 KB |
presets-tk.json | 1 KB |
presets-id-jw.json | 1 KB |
presets-pi.json | 1 KB |
presets-stadtmobil-rhein-neckar.geojson.json | 1 KB |
presets-jm.json | 1 KB |
presets-stadtmobil-stuttgart.geojson.json | 1 KB |
presets-stadtmobil-karlsruhe.geojson.json | 1 KB |
presets-pg.json | 1 KB |
presets-stadtmobil-suedbaden.geojson.json | 1 KB |
presets-stadtmobil-rhein-main.geojson.json | 1 KB |
presets-stadtmobil-rhein-ruhr.geojson.json | 1 KB |
presets-fj.json | 1 KB |
presets-ag.json | 1 KB |
presets-lc.json | 1 KB |
presets-stadtmobil-hannover.geojson.json | 1 KB |
presets-ca-nl.geojson.json | 1 KB |
presets-ca-nu.geojson.json | 1 KB |
presets-ca-pe.geojson.json | 1 KB |
presets-stadtmobil-berlin.geojson.json | 1 KB |
presets-er.json | 1 KB |
presets-gf.json | 1 KB |
presets-mi.json | 1 KB |
presets-stadtmobil-trier.geojson.json | 1 KB |
presets-q3336843.json | 1 KB |
presets-gb-devon-cornwall.geojson.json | 1 KB |
presets-cv.json | 1 KB |
presets-gw.json | 1 KB |
presets-gb-abd.geojson.json | 1 KB |
presets-ra.json | 1 KB |
presets-gb-bir.geojson.json | 1 KB |
presets-gb-abe.geojson.json | 1 KB |
presets-vi.json | 1 KB |
presets-deu.json | 1 KB |
presets-fra.json | 1 KB |
presets-idn.json | 1 KB |
presets-aut.json | 1 KB |
presets-esp.json | 1 KB |
presets-konsum-leipzig.geojson.json | 1 KB |
presets-tl.json | 1 KB |
presets-konsum-dresden.geojson.json | 1 KB |
presets-ky.json | 1 KB |
presets-ja.json | 1 KB |
presets-ai.json | 1 KB |
presets-foodland_eastern_us.geojson.json | 1 KB |
presets-039.json | 1 KB |
presets-155.json | 1 KB |
presets-ic.json | 1 KB |
presets-029.json | 1 KB |
presets-ms,.json | 1 KB |
presets-so.json | 1 KB |
presets-fo.json | 1 KB |
presets-sm.json | 1 KB |
presets-gq.json | 1 KB |
In total, the split files come to 9.30 MB, versus 7.2 MB with everything in the base file. So, there is some repetition, but as stated earlier, a distribution of these files can be packed, so the real difference is smaller.
And most important of all, if you are in France for example, the presets to load are only 830 KB, as opposed to 7.2 MB. So, this is a huge difference. Even for the US and Germany for which the most presets exist, the difference is still huge.
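The per-country figure is the base file plus that country's file. A small worked check, using sizes copied from the table above (the helper name `load_kb` is just for illustration):

```python
# Sizes in KB, copied from the table above.
SIZES_KB = {"base": 570, "us": 996, "de": 591, "fr": 260}


def load_kb(country):
    """KB to load for one country: base presets plus that country's presets."""
    return SIZES_KB["base"] + SIZES_KB[country]


for country in ("fr", "de", "us"):
    print(country, load_kb(country), "KB")
```

This ignores state-level and custom-region files, which would add a little for users inside those areas.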
Edit: The script is here. I further tweaked it to throw away all presets the osmfeature library does not support anyway (those with locationSet = ...geojson etc): https://github.com/streetcomplete/StreetComplete/blob/split-brand-presets/buildSrc/src/main/java/UpdateNsiPresetsTask.kt
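For readers who do not want to read the Kotlin task, the splitting rule can be sketched in a few lines. This is not the linked script: the function name `split_presets`, the simplified preset layout, and the choice to keep worldwide (`"001"`) and non-string `include` entries in the base file are assumptions made for illustration.

```python
from collections import defaultdict


def split_presets(presets):
    """Split presets into a base dict and one dict per included location."""
    base = {}
    per_location = defaultdict(dict)
    for preset_id, preset in presets.items():
        location_set = preset.get("locationSet", {})
        include = location_set.get("include", [])
        exclude = location_set.get("exclude", [])
        # Move a preset out only if it has include rules and nothing else.
        # Assumption: worldwide ("001") and non-string entries stay in base.
        only_includes = (
            bool(include)
            and not exclude
            and all(isinstance(loc, str) for loc in include)
            and "001" not in include
        )
        if only_includes:
            # A preset valid in several locations is copied into each file,
            # which is where the repetition in the totals comes from.
            for loc in include:
                per_location[loc][preset_id] = preset
        else:
            base[preset_id] = preset
    return base, dict(per_location)


# Hypothetical sample presets.
presets = {
    "a": {"locationSet": {"include": ["001"]}},
    "b": {"locationSet": {"include": ["fr"]}},
    "c": {"locationSet": {"include": ["us", "ca"]}},
    "d": {"locationSet": {"include": ["de"], "exclude": ["de-by.geojson"]}},
}
base, per_location = split_presets(presets)
print(sorted(base), {loc: sorted(ids) for loc, ids in per_location.items()})
```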
I now implemented
Loading all the presets (normal presets + localization + international brand presets + presets of the country one is in) during startup now takes 0.6s compared to 2.7s before on my phone. So roughly 4 times faster now.
> Loading all the presets (normal presets + localization + international brand presets + presets of the country one is in) during startup now takes 0.6s compared to 2.7s before on my phone. So roughly 4 times faster now.
Is there anything for us to do on the NSI side?
I do think going forward we should trim down `dist/nsi.json` to only include the items that actually have wikidata tags. Currently it includes everything, which isn't really how I intended it to be.
From my end, no, because I already wrote the script to take apart the nsi.json myself. That script contains additional stuff - as outlined in my last message - that will probably not be done if the files were already separate this way in the `dist` folder.
Currently `nsi-id-presets.min.json` is about 8 MB in size, which is a lot. A possible way to solve this problem is to split it per country. There will be duplicated information across files, but it will load much faster for all users. A side effect would be shorter loading times in the editors and faster validation of brands. Such a change will require changes in the build process and changes to how iD and RapiD handle brands (which is why I'm tagging @tyrasd and @bhousel). Are you in support of such a change, and would it be too hard to implement? Also, are there any other data consumers that will be affected? If there are, it would be nice to leave both "systems" to coexist and add a folder in `dist` where the new per country method is used.