openfoodfacts / openfoodfacts-server

Open Food Facts database, API server and web interface - 🐪🦋 Perl, CSS and JS coders welcome 😊 For helping in Python, see Robotoff or taxonomy-editor
http://openfoodfacts.github.io/openfoodfacts-server/
GNU Affero General Public License v3.0

Made Near Me creates a huge single HTML file #1668

hangy opened 5 years ago

hangy commented 5 years ago

As of right now, https://madenear.me/ is served as a single HTML file with embedded JSON. The HTML file is around 17 MiB, and even when gzip is used, the download is still 2.1 MiB.

Load Time

We should probably reduce the initial load time to improve responsiveness: the embedded JSON could be moved to a separate file, which can then be loaded asynchronously.
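A minimal sketch of that idea; the /products.json endpoint and the addProductsToMap() helper are placeholders, not existing code:

```js
// Serve the page without the embedded data, then fetch it asynchronously.
// '/products.json' and addProductsToMap() are hypothetical names.
fetch('/products.json')
  .then((response) => response.json())
  .then((products) => addProductsToMap(products)) // render markers once the data arrives
  .catch((err) => console.error('Failed to load product data', err));
```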

Scripting

Even on a relatively powerful desktop, Chrome reports around 5000 ms spent in scripting. After reducing the initial load time, we should check whether this can be sped up somehow. For example, chunkedLoading or an alternative clustering plugin could be used.
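For reference, chunkedLoading is an option of the Leaflet.markercluster plugin that spreads the addLayers() work over small batches so the page stays responsive. Something like this (assuming the plugin is loaded; productMarkers is a placeholder):

```js
// Sketch: enable incremental marker insertion in Leaflet.markercluster.
const markers = L.markerClusterGroup({
  chunkedLoading: true, // add markers in small batches instead of all at once
  chunkInterval: 200,   // ms of work per batch
  chunkDelay: 50        // ms pause between batches, leaving time for the UI
});
markers.addLayers(productMarkers); // productMarkers: an array of L.marker objects
map.addLayer(markers);
```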

rajo commented 5 years ago

As we are talking about location data here: what about using GeoJSON and storing the source data in a spatially enabled database on the server (SpatiaLite temporarily, or PostGIS), so that spatial AJAX queries can be made according to the bounding box of the current zoom level?

teolemon commented 5 years ago

A GeoJSON export would be interesting, since we could potentially reuse it for a native mobile version. Basically, having a super easy way to map products on mobile that doesn't incur costs for Open Food Facts and works with or without Google Maps/Apple Maps (possibly using OSM where possible).
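For illustration, such an export could be a FeatureCollection of Point features, one per product. The barcode and coordinates below are made up; note that GeoJSON uses [longitude, latitude] order:

```json
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": { "type": "Point", "coordinates": [2.3522, 48.8566] },
      "properties": { "code": "1234567890123", "product_name": "Example product" }
    }
  ]
}
```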

rajo commented 5 years ago

Some things I've noticed / that came to mind while playing around with the JSON data of madenear.me:

  • I wonder where the geo coordinates come from, as I couldn't find them in the data produced by any of the available APIs. So this is some post-processing?

As I've learned, MongoDB can store GeoJSON objects and supports geospatial indexes. This might lead to the following solution:

  • Use the getBounds method from Leaflet to get the current bounds of the visible map area, and query the database using geoWithin to retrieve all points that would be visible at the current zoom level. This query should return geo and the unique id/barcode of each item.

  • Feed these two values into Leaflet again to draw the markers. Register an onclick event for each marker.

  • When a user clicks a marker, run a query against the database using the id/EAN to retrieve product_name, image, etc., and update the marker's content.

At least in theory, this should speed things up. As I don't yet know the architecture of the system and don't have a dev setup, I cannot try to verify my theory.
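A sketch of what the MongoDB side could look like in the mongo shell, assuming each product document carries its location as a GeoJSON Point in a geo field (collection and field names are assumptions):

```js
// Index the GeoJSON points once so spatial queries can use it.
db.products.createIndex({ geo: "2dsphere" });

// Find all products inside the current map extents, returning only the
// barcode and coordinates (west/south/east/north come from the client).
db.products.find(
  {
    geo: {
      $geoWithin: {
        $geometry: {
          type: "Polygon", // the visible bounding box as a closed ring
          coordinates: [[
            [west, south], [east, south], [east, north],
            [west, north], [west, south]
          ]]
        }
      }
    }
  },
  { _id: 0, code: 1, geo: 1 }
);
```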

VaiTon commented 5 years ago
  • Use the getBounds method from Leaflet to get the current bounds of the visible map area, and query the database using geoWithin to retrieve all points that would be visible at the current zoom level. This query should return geo and the unique id/barcode of each item.

  • Feed these two values into Leaflet again to draw the markers. Register an onclick event for each marker.

  • When a user clicks a marker, run a query against the database using the id/EAN to retrieve product_name, image, etc., and update the marker's content.

The only thing I disagree with is querying the DB on every click / on every region load. We could create the GeoJSON file from the DB and update it whenever we update the products via update_all_products.pl.

rajo commented 5 years ago

The only thing I disagree with is querying the DB on every click / on every region load. We could create the GeoJSON file from the DB and update it whenever we update the products via update_all_products.pl.

I don't understand. The main limitation of the current implementation is loading a ~20 MB JSON structure into the DOM, of which, at any given time, 99% is not used at all. As a user, I'm only interested in the data currently within the focus of my map area, and whenever I zoom or pan, the set of relevant data changes.

Hence the idea is to execute an asynchronous spatial query with the current map extents and to return only the IDs and coordinates of the products relevant to this region. Only these values are needed at that moment to paint the markers at the specific coordinates. And only if a user really starts interacting with a marker is additional information needed, in terms of product description, image and URI. This can again be retrieved asynchronously from the database using the ID. I assume that both ID and geocoordinates are properly indexed in the database, so that the cost of querying them is negligible.

I cannot see how creating static files server-side, which would again have to contain the entire dataset, would speed things up or reduce bandwidth consumption.

I could agree that if the visible area is below a certain threshold (e.g. not the whole of France, but an area of 20 km² or whatever), one could start returning not only ID and geocoordinates but the entire data set per product: ID, product name, image, URI, geo, ... assuming that the variety of products in such an area is manageable. But even then I would not try to precompute this data, as a proper spatial index should be able to answer such queries on the fly.

If you should still want to opt for precomputing the data, I would suggest splitting it into geographically ordered tiles, like the image tiles of OpenStreetMap (essentially image pyramids), as sketched below. Whenever a user pans or zooms, you could then transparently add or remove data to/from the DOM to reduce memory consumption and the amount of data transferred. This would require a proper addressing scheme for the individual files, though.
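For reference, the OSM "slippy map" scheme addresses tiles as z/x/y, and data tiles could reuse the same addressing. A sketch of the standard formula:

```js
// Map a coordinate to its OSM-style tile address at a given zoom level.
function latLonToTile(lat, lon, zoom) {
  const n = 2 ** zoom; // number of tiles per axis at this zoom
  const x = Math.floor(((lon + 180) / 360) * n);
  const latRad = (lat * Math.PI) / 180;
  const y = Math.floor(
    ((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n
  );
  return { x, y };
}

// Example: Paris at zoom 12 falls into tile 12/2074/1409, so a precomputed
// data tile could live at e.g. /tiles/12/2074/1409.json (path is hypothetical).
```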

But again, I'd opt for the live queries.
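A client-side sketch of those live queries, assuming a hypothetical /api/map endpoint that implements the spatial query above; the existing /api/v0/product/{code}.json API is used for the on-demand details:

```js
// Refetch visible products on every pan/zoom; load details only on click.
async function refreshMarkers(map, layer) {
  const b = map.getBounds(); // Leaflet's current visible area
  const url = `/api/map?west=${b.getWest()}&south=${b.getSouth()}` +
              `&east=${b.getEast()}&north=${b.getNorth()}`;
  const points = await (await fetch(url)).json(); // e.g. [{ code, lat, lng }, ...]
  layer.clearLayers();
  for (const p of points) {
    L.marker([p.lat, p.lng])
      .on('click', async (e) => {
        // Fetch the name etc. on demand through the existing product API.
        const res = await fetch(`/api/v0/product/${p.code}.json`);
        const { product } = await res.json();
        e.target.bindPopup(product.product_name).openPopup();
      })
      .addTo(layer);
  }
}

const markerLayer = L.layerGroup().addTo(map);
map.on('moveend', () => refreshMarkers(map, markerLayer));
```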

hangy commented 5 years ago
  • I wonder where the geo coordinates come from, as I couldn't find them in the data produced by any of the available APIs. So this is some post-processing?

Yes, it's loaded from another data source. https://github.com/openfoodfacts/openfoodfacts-server/blob/8cb6fd279409ebd09e72de5e2504b3e14e0b93dc/lib/ProductOpener/Display.pm#L3180 https://github.com/openfoodfacts/openfoodfacts-server/blob/8cb6fd279409ebd09e72de5e2504b3e14e0b93dc/lib/ProductOpener/Display.pm#L5212

I actually added a script to load the coordinates into MongoDB for https://github.com/openfoodfacts/openfoodfacts-server/issues/731#issuecomment-279239239, because of the issues mentioned in https://github.com/openfoodfacts/openfoodfacts-server/issues/712#issuecomment-279136276.


hangy commented 5 years ago

Hence the idea is to execute an asynchronous spatial query with the current map extents and to return only the IDs and coordinates of the products relevant to this region. Only these values are needed at that moment to paint the markers at the specific coordinates. And only if a user really starts interacting with a marker is additional information needed, in terms of product description, image and URI. This can again be retrieved asynchronously from the database using the ID. I assume that both ID and geocoordinates are properly indexed in the database, so that the cost of querying them is negligible.

I agree that creating the HTML to present the content dynamically would be the better option (IMHO). If an end user only clicks on 50 products (which would already be quite a lot), then the URL, product name, description, and image srcset for the other 99% were transferred, and probably rendered into the DOM, for no gain at all. The REST API could easily be used to retrieve that kind of information.

Geohash sounds pretty interesting: https://tech.willhaben.at/geo-clustering-3-000-000-points-on-the-fly-a-brief-how-to-9f04d8d5b3a7
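The core of the geohash trick from that article is that truncating a hash to n characters yields a grid cell, so points can be clustered server-side by grouping on a shared prefix. A minimal encoder for illustration (standard algorithm, not code from this repo):

```js
const BASE32 = '0123456789bcdefghjkmnpqrstuvwxyz'; // geohash alphabet

// Encode a coordinate by repeatedly halving the lon/lat intervals and
// recording which half contains the point, 5 bits per output character.
function geohashEncode(lat, lon, precision = 6) {
  let latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
  let hash = '', bits = 0, ch = 0, evenBit = true;
  while (hash.length < precision) {
    if (evenBit) { // even bits encode longitude
      const mid = (lonMin + lonMax) / 2;
      if (lon >= mid) { ch = ch * 2 + 1; lonMin = mid; } else { ch = ch * 2; lonMax = mid; }
    } else {       // odd bits encode latitude
      const mid = (latMin + latMax) / 2;
      if (lat >= mid) { ch = ch * 2 + 1; latMin = mid; } else { ch = ch * 2; latMax = mid; }
    }
    evenBit = !evenBit;
    if (++bits === 5) { hash += BASE32[ch]; bits = 0; ch = 0; }
  }
  return hash;
}

// Products sharing a prefix sit in the same cell, e.g. grouping by
// geohashEncode(lat, lon, 4) gives ready-made cluster buckets.
```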