shanecoughlan / data-twist

Experimental script to twist Open Data into new shapes

Global Map view fails when there is a very large dataset #11

Open shanecoughlan opened 11 years ago

shanecoughlan commented 11 years ago

When you have a few hundred posts with geo-data and a global map, you can go to Settings > Geo Mashup > Overall and, under "Include Taxonomies", select "Categories".

Your global map will display fine, as per this site: http://www.opendawn.com/matsue/

However, if you do the same with a site that contains a few thousand entries, the global map will fail to load. Our test site for this was here: http://www.opendawn.com/geo1/overview-page/

It contained just over 9,000 shop locations in Tokyo.

shanecoughlan commented 11 years ago

This was originally entitled "Global Map view fails when "Include Taxonomies" is used in Geo Mashup and there is a large dataset" and that was accurate up to around 10,000 geo-locations.

However, after importing more geo data and pushing up the entries to around 14,000 in a London sample, the global map fails to display regardless of whether "Include Taxonomies" is used in Geo Mashup. This is probably an issue of scale and database queries/timeouts. We might take it up with Dylan of Geo Mashup and see if there is a way to work around this, perhaps by having "not quite so global" maps used to display certain data.

cyberhobo commented 11 years ago

http://www.opendawn.com/geo1/overview-page/ is loading okay for me. I wonder if the failure is related to browser memory?

On the server, including taxonomies will run a query for every object. I'm open to ideas for remedying that.
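As one possible direction for the per-object query cost, here is a minimal sketch of replacing one lookup per object with one lookup per batch of ids. It is not Geo Mashup code: it uses Python with an in-memory SQLite database, and the `object_terms` table, the function names, and the `chunk_size` value are all hypothetical stand-ins for illustration.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory database (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE object_terms (object_id INTEGER, term TEXT)")
    conn.executemany(
        "INSERT INTO object_terms VALUES (?, ?)",
        [(1, "shops"), (1, "cafes"), (2, "shops")],
    )
    conn.commit()
    return conn


def terms_per_object(conn, object_ids):
    """One query per object: N objects cost N round trips to the database."""
    result = {}
    for oid in object_ids:
        rows = conn.execute(
            "SELECT term FROM object_terms WHERE object_id = ? ORDER BY term",
            (oid,),
        )
        result[oid] = [row[0] for row in rows]
    return result


def terms_batched(conn, object_ids, chunk_size=500):
    """One query per chunk of ids; rows are grouped by object in memory."""
    result = {oid: [] for oid in object_ids}
    for start in range(0, len(object_ids), chunk_size):
        chunk = object_ids[start:start + chunk_size]
        # Build "?,?,?" so every id is passed as a bound parameter.
        placeholders = ",".join("?" * len(chunk))
        rows = conn.execute(
            "SELECT object_id, term FROM object_terms "
            f"WHERE object_id IN ({placeholders}) ORDER BY object_id, term",
            chunk,
        )
        for oid, term in rows:
            result[oid].append(term)
    return result
```

Both functions return the same mapping from object id to term list. With 9,000 objects and a chunk size of 500, the batched version would issue 18 queries instead of 9,000, at the price of holding each chunk's rows in memory while grouping.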

Clustering also needs some love. Client-side would be great to do at the Mapstraction level, but practically I think I'll need an existing library, which likely means provider-specific clustering. Server-side clustering would be great: https://code.google.com/p/wordpress-geo-mashup/issues/detail?id=386
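To make the server-side clustering idea concrete, here is a minimal sketch of grid-based clustering in Python. It is an assumption about one simple approach, not the design in the linked issue: points are bucketed into square cells measured in degrees, and each cell is reduced to a single marker carrying a count. The function name and the `cell_deg` parameter are hypothetical.

```python
import math
from collections import defaultdict


def cluster_points(points, cell_deg):
    """Group (lat, lng) pairs into grid cells that are cell_deg degrees wide.

    Returns one dict per non-empty cell with the mean position of its
    members and the number of points it represents.
    """
    if cell_deg <= 0:
        raise ValueError("cell_deg must be positive")

    cells = defaultdict(list)
    for lat, lng in points:
        # floor() keeps negative coordinates in the correct cell.
        key = (math.floor(lat / cell_deg), math.floor(lng / cell_deg))
        cells[key].append((lat, lng))

    clusters = []
    for key in sorted(cells):
        members = cells[key]
        n = len(members)
        clusters.append({
            "lat": sum(p[0] for p in members) / n,
            "lng": sum(p[1] for p in members) / n,
            "count": n,
        })
    return clusters
```

A larger `cell_deg` at low zoom levels would send far fewer markers to the browser than one marker per location. The sketch ignores the shrinking width of a degree of longitude away from the equator and the wrap-around at the antimeridian, both of which a real implementation would need to handle.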

shanecoughlan commented 11 years ago

My apologies. I reduced the entries on Geo 1 to 9,073 to play with the "Include Taxonomies" problem. When you include categories, the Overview page fails to load. When you turn them off, the overview page loads.

There is another test site now active called (with astonishing originality) Geo 2. This has the 14,686 London locations. While the geo-directory appears OK, the overview page fails to load regardless of what we do with the taxonomies. Image attached.

http://www.opendawn.com/geo1/ http://www.opendawn.com/geo2/

I made you an admin in both installs in case you want to root around the WordPress back end. I sent your username and password by mail.

(Attached image: brokenoverview)

I'm going to have a look at your clustering link.

cyberhobo commented 11 years ago

The server was running out of memory as a result of caching map data in a transient. I commented out the line of code that does that, and it loads. The cost is that the query runs twice - avoiding that will probably require major internal changes unfortunately.
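The trade-off described here can be sketched as a cache that is skipped for large result sets. This is only an illustration in Python, not the change made in Geo Mashup: a plain dict stands in for the transient store, and `get_map_data`, `run_query`, and the `max_cached_rows` threshold are hypothetical names and values.

```python
def get_map_data(cache, key, run_query, max_cached_rows=5000):
    """Return map rows, caching them only when the result set is small.

    cache      -- dict standing in for a transient store (assumption)
    run_query  -- zero-argument callable that returns a list of rows
    """
    if key in cache:
        return cache[key]

    rows = run_query()
    # Large result sets are never stored, so the memory cost of caching
    # is avoided and the query simply runs again on the next request.
    if len(rows) <= max_cached_rows:
        cache[key] = rows
    return rows
```

Small sites keep the benefit of a single query, while large data sets pay for a repeated query instead of running out of memory. The threshold of 5,000 rows is arbitrary and would need tuning against the memory actually available on the server.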

shanecoughlan commented 11 years ago

Hi Dylan

That's an awesome result!

The cost of two queries is something that most people will probably want to avoid, but people building big geo-sets may be happy to accept it.

Do you think it might be feasible to add a checkbox option somewhere in the advanced settings for this? The ballpark idea would be that until we have enough development traction to help re-code with you around the issue, it would be really useful for big data users to be able to get overviews on pretty much all types of server, including limited VPS like the opendawn.com one.

We are going to go "live" with Matsue city in Japan starting on the 15th of this month, and I hope to have more programmers around in the months after that, but it will take a while to grow the project.

Shane

shanecoughlan commented 11 years ago

Hi Dylan, could you let me know which line of code needs to be edited so that I can also test it on other data-sets?

Shane

(I put 62,000 locations from China on www.opendawn.com/geo4)

cyberhobo commented 11 years ago

I went ahead and did that in the data-twist branch:

https://github.com/cyberhobo/wp-geo-mashup/commit/b940ca8d28dfdd706e00647746948e3c6be6d557

shanecoughlan commented 11 years ago

Very cool. I compiled it and uploaded it to a test site.

I just ran it on China (62,081 pins) and the overview page failed. It looks like once we scale up this far overviews might be out of the question.

http://www.opendawn.com/geo4/ was the testing location.

Shane

cyberhobo commented 11 years ago

I can't find any error logs for the geo* installs. It would be interesting to know where it runs out of memory.

shanecoughlan commented 11 years ago

Thanks Dylan. Good to know where the error does not lie. I'll try to hunt down the location. At a guess, if we have two queries per post, the 100k+ calls to the database in this case (62,000 posts) may have hit a restriction.

Kana and I will be formally launching Data Twist in Japan on the 15th, so we hope to get a few more hands on board to assist with these things after that.

Shane

cyberhobo commented 11 years ago

I don't think you get two queries per post. I think you get the same big query for all posts, run twice. It shouldn't be too hard to verify with the Debug Bar if you want.

Hope the launch goes well!
