vanthieughita / gmapcatcher

Automatically exported from code.google.com/p/gmapcatcher

Mass download from OpenCycleMap is a breach of terms of use #145

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Mass download from open cycle map should be disabled as such use is against
the website terms of use.

Please also note that the User Agent used does not describe the application
(being a cut'n'paste from example Python code) -- this is also against the
terms.

Original issue reported on code.google.com by dstu...@gmail.com on 29 Mar 2010 at 4:23

GoogleCodeExporter commented 8 years ago
Perhaps a form of attribution, such as presenting a link in the status bar,
would serve the purpose.  Does /open/cyclemap really want to lock down its data
or just make sure it is correctly attributed?

(I'm affiliated with neither opencyclemap nor gmapcatcher, just hoping that
this is resolved more amicably)

Original comment by mikeudal...@gmail.com on 29 Mar 2010 at 6:32

GoogleCodeExporter commented 8 years ago
Thanks Mike! 

...and shame on openCycleMap; something like that does not make it look very "Open"

Original comment by heldersepu on 29 Mar 2010 at 6:58

GoogleCodeExporter commented 8 years ago
This isn't about locking down data or anything like that.

Servers like this are barely standing up as it is without people mass
downloading tiles. The intention of the server is to show tiles to the website
-- and it's freely usable for that, and used on many other sites to boot, but
unfortunately mass downloaders pretty much destroy the render-on-demand
capability and ordinary users' experience.

Also the user agent is completely random which is just bad form.

So please, less of the "shame" and a little less of the abuse of a free service
too :-)

BTW while I'm here you should probably read the OSM tile usage policy as well:
http://wiki.openstreetmap.org/wiki/Tile_usage_policy

Thanks.

Original comment by dstu...@gmail.com on 29 Mar 2010 at 10:02

GoogleCodeExporter commented 8 years ago
Just to be clear, it's only the downloader that needs to be disabled -- the
tile viewer part is fine.

Original comment by dstu...@gmail.com on 29 Mar 2010 at 10:10

GoogleCodeExporter commented 8 years ago
Sounds reasonable.  With that option it could still cache tiles viewed by the
user but not grab huge sections and burden the server.  I misunderstood your
reason for objecting and for that I apologise.

Original comment by mikeudal...@gmail.com on 29 Mar 2010 at 10:20

GoogleCodeExporter commented 8 years ago
If Mike agrees, I have to admit this is no longer a fair fight (2 vs 1), but:
- If the issue is the servers, they (open cycle map) should prevent mass
  downloading at the server side, just like Google does; see Issue 16
- If the issue is the servers, get more servers or host it somewhere free
- This is an Open Source Project; the fastest way to get something done is to
  submit a patch

I'm a nice guy! see Issue 144

Original comment by heldersepu on 30 Mar 2010 at 12:21

GoogleCodeExporter commented 8 years ago
In my opinion, it is the 'user' who breaks the website terms, not the
'software'. GMapCatcher should inform the user about the terms, and suggest
that users not download a huge section of tiles from specific servers.
However, GMapCatcher itself is innocent, and I believe that we needn't cut its
functionality.

This is only my personal view. I'm not a master of law, but I still know the
lawsuit between Sony and Bleem!. Can I quote the victories of Bleem! to support
GMapCatcher?

Original comment by pi3or...@gmail.com on 30 Mar 2010 at 8:46
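
A rough sketch of what such a hint could look like, for illustration only. It
is not the change that went into GMapCatcher: the threshold, the server list,
and the function names are all hypothetical. The tile index formula is the
standard slippy-map one used for OSM-style tile servers.

```python
import math

# Hypothetical values: neither the threshold nor the server list comes
# from GMapCatcher or from any tile provider's published terms.
WARN_THRESHOLD = 1000
RESTRICTED_SERVERS = {"OpenCycleMap"}


def tile_xy(lat, lon, zoom):
    """Return standard slippy-map tile indices for a WGS84 coordinate."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so coordinates on the map edge still map to a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def count_tiles(north, west, south, east, zooms):
    """Count the tiles covering a bounding box over the given zoom levels."""
    total = 0
    for zoom in zooms:
        x0, y0 = tile_xy(north, west, zoom)
        x1, y1 = tile_xy(south, east, zoom)
        total += (x1 - x0 + 1) * (y1 - y0 + 1)
    return total


def bulk_download_warning(server, north, west, south, east, zooms):
    """Return a warning string for large requests to restricted servers, else None."""
    tiles = count_tiles(north, west, south, east, zooms)
    if server in RESTRICTED_SERVERS and tiles > WARN_THRESHOLD:
        return ("You are about to download %d tiles from %s. Bulk downloading "
                "may be against this server's terms of use." % (tiles, server))
    return None


# A city-sized box over zoom levels 10 to 16 is already thousands of tiles.
print(bulk_download_warning("OpenCycleMap", 51.6, -0.3, 51.4, 0.1, range(10, 17)))
```

The point of counting first is that the user sees the size of the request
before any tile is fetched, which keeps the decision with the user.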

GoogleCodeExporter commented 8 years ago
I've added a warning in r804. Please review.

Original comment by pi3or...@gmail.com on 30 Mar 2010 at 9:55

GoogleCodeExporter commented 8 years ago
...and now it's a fair fight, 2 vs 2!  Bring it on!!!
Ufff, I guess I've been watching too much UFC lately

Way to go pi3orama! Spoken like a true professional. I fully support your
changes in r804; it is the best solution to this issue.

...and that should do it. If someone still does not like it, just let us know
when & where and we will settle it in a bare-knuckle fight!

:)

Original comment by heldersepu on 30 Mar 2010 at 12:11

GoogleCodeExporter commented 8 years ago
Appears to be a fair compromise.  It leaves the choice of how to use the
software down to the user and, at the least, should prompt people to consider
the impact on the server's resources.

Is there a need to change the user-agent string as the OP mentioned?

Original comment by mikeudal...@gmail.com on 30 Mar 2010 at 1:02

GoogleCodeExporter commented 8 years ago
heldersepu, it's not actually a fight :-)

BTW to give you some sense of scale, the cycle map server pumped out 5TB of
tiles last month. That's somewhere in the region of 10 million tiles per day.
Somehow making your server work out who's abusing your server, and who's just
enthusiastically surfing their cycle route, and who's stuck behind a corporate
proxy along with 20 other people is actually pretty hard, and not a fun thing
to be spending your spare time on. Being Google gives you a lot of advantages
in terms of bandwidth, servers, and paid programmers spending hours profiling
patterns.

What of course doesn't help is when half the kids on the interwebs are trying
to get round whatever measures you put in place. Arms races are not a fun place
to be.

Original comment by dstu...@gmail.com on 30 Mar 2010 at 2:58
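
As a quick sanity check on those two figures, here is a small sketch of the
average tile size they imply. It assumes a 30-day month and binary units
(1 TB = 1024^3 KB); neither assumption is stated in the comment, and the
function name is made up for the example.

```python
KB_PER_TB = 1024 ** 3  # binary units (an assumption)


def average_tile_kb(tb_per_month, tiles_per_day, days_per_month=30):
    """Average size in KB of one tile, given monthly volume and daily tile count."""
    kb_per_day = tb_per_month * KB_PER_TB / days_per_month
    return kb_per_day / tiles_per_day


size = average_tile_kb(5, 10_000_000)
print(f"about {size:.0f} KB per tile")
```

Under those assumptions the result is roughly 18 KB per tile, so the two
numbers are consistent with each other.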

GoogleCodeExporter commented 8 years ago
Stop crying and suck it up! 
If you have such a "huge" demand, that's got to be good for business!

5 terabytes/month 
= 5242880.00 megabytes/month
=  174762.67 megabytes/day  -> (30 day month)
=    7281.78 megabytes/hour
=     121.36 megabytes/minute
=       2.02 megabytes/second

If your servers cannot deal with an average of 2.02 MB/sec then you are living
in 1999 and MS Win 98 is da' Bomb!

Arms races? Maybe I should send you that dirty virus that stiffens the keyboard
keys and locks the mouse wheel :)

Original comment by heldersepu on 30 Mar 2010 at 3:30

GoogleCodeExporter commented 8 years ago
Please note that my calculations could be incorrect!

According to Google it is even less:
5 (TB / month) = 1.99368468 MB / sec

http://www.google.com/search?&q=5+TB%2Fmonth+to+MB%2Fsec

Original comment by heldersepu on 30 Mar 2010 at 3:37
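
For anyone checking the arithmetic in the two comments above, here is a minimal
sketch. It assumes binary units (1 TB = 1024 * 1024 MB), which is what the
30-day figures use. That Google's figure is based on an average month of
365.2425 / 12 days is an assumption, not something stated in the thread.

```python
MB_PER_TB = 1024 * 1024  # binary units, matching the 5242880.00 MB figure


def average_mb_per_second(tb_per_month, days_per_month=30.0):
    """Average throughput in MB/s for a given monthly transfer volume."""
    seconds = days_per_month * 24 * 60 * 60
    return tb_per_month * MB_PER_TB / seconds


thirty_day = average_mb_per_second(5)
# Assumed: an "average" month of 365.2425 / 12 days.
average_month = average_mb_per_second(5, days_per_month=365.2425 / 12)

print(f"30-day month:  {thirty_day:.2f} MB/s")
print(f"average month: {average_month:.2f} MB/s")
```

Both figures describe an average over the whole month; they say nothing about
how the load is distributed over the day.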

GoogleCodeExporter commented 8 years ago
Yeah, because loads are nice and even.

It peaks at about 60Mbit/s.

You also need to account for the fact that we don't have server resources to
prerender 60TB of tiles, which mass downloading areas pretty much requires.
This is why most of your mass downloaded areas are going to end up 404.

Basically, the server is going to sit there and churn out data to its heart's
content. Ultimately though the experience of everybody gets harmed by the
selfishness of a few.

BTW pi3orama is entirely correct in saying that it's the user breaking the
terms, not you guys for making some software. However, I do feel you have a
moral obligation here (as you provide open cycle map as a built-in option) to
not screw it over.

pi3orama's patch goes a long way towards that.

Original comment by dstu...@gmail.com on 30 Mar 2010 at 4:26
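
To put the peak figure next to the monthly average quoted earlier, here is a
small sketch. It assumes "Mbit" means decimal megabits (10^6 bits) and "MB"
means binary megabytes (1024^2 bytes), as in the earlier per-second figures;
those unit conventions are assumptions.

```python
BYTES_PER_MB = 1024 * 1024  # binary megabytes, as in the earlier figures


def mbit_per_s_to_mb_per_s(mbit_per_s):
    """Convert decimal megabits per second to binary megabytes per second."""
    return mbit_per_s * 1_000_000 / 8 / BYTES_PER_MB


peak = mbit_per_s_to_mb_per_s(60)
# Monthly average for 5 TB over a 30-day month, in MB/s.
average = 5 * 1024 * 1024 / (30 * 24 * 60 * 60)

print(f"peak:    {peak:.2f} MB/s")
print(f"average: {average:.2f} MB/s")
print(f"peak is about {peak / average:.1f}x the average")
```

Under these assumptions the peak is roughly three and a half times the
average, which is why the average alone understates the load.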

GoogleCodeExporter commented 8 years ago
Let's get serious about this...

I honestly do not believe that GMapCatcher is the culprit of your server
problems with "mass downloading", and I'm almost certain that less than 1% of
the people who use GMapCatcher use it with openCycleMap.

The patch from pi3orama is just a warning, and end users could still "abuse"
the servers.

If you want, we could come up with a way of counting the tiles downloaded by
users with GMapCatcher. I could change the URL from:
http://a.andy.sandbox.cloudmade.com/tiles/cycle/1/0/0.png
to something like:
http://a.andy.sandbox.cloudmade.com/tiles/cycle/1/0/0.png?GMapCatcher

Original comment by heldersepu on 30 Mar 2010 at 4:44
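
A minimal sketch of the URL tagging described above, using Python 3's standard
library. The helper name `tag_tile_url` is hypothetical and this is not code
from GMapCatcher; it only shows how a marker could be appended without
clobbering an existing query string.

```python
from urllib.parse import urlsplit, urlunsplit


def tag_tile_url(url, marker="GMapCatcher"):
    """Append a bare marker to the query string of a tile URL."""
    parts = urlsplit(url)
    # Keep any existing query parameters and add the marker after them.
    query = f"{parts.query}&{marker}" if parts.query else marker
    return urlunsplit(parts._replace(query=query))


print(tag_tile_url("http://a.andy.sandbox.cloudmade.com/tiles/cycle/1/0/0.png"))
```

A server operator could then count matching requests in the access logs, though
this only works if the server ignores unknown query parameters.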

GoogleCodeExporter commented 8 years ago
Of course it isn't, I never said it was. You can tell by the way it hasn't been
blocked.

There's no URL mangling required -- just fix the http user agent you're using.

People can always abuse the servers; I can write a python script to scrape
tiles in about 2 minutes. There are about 5 or 6 apps out there that I know of
that do this. This ticket is just about there not being 7 -- especially as the
catcher part is actually very useful and no problem.

Original comment by dstu...@gmail.com on 31 Mar 2010 at 9:28

GoogleCodeExporter commented 8 years ago
Good, I'm glad not to be the troublemaker. 
Now, what do you mean by:
"fix the http user agent"

Original comment by heldersepu on 31 Mar 2010 at 11:22

GoogleCodeExporter commented 8 years ago
When you make an http request the library adds a "User-Agent" field to the
header. For example firefox on my machine uses: "Mozilla/5.0 (X11; U; Linux
x86_64; en-GB; rv:1.9.1.8) Gecko/20100214 Ubuntu/9.10 (karmic) Firefox/3.5.8"

Your app is reporting itself as
"OpenAnything/1.6 +http://diveintopython.org/http_web_services/"

The constructor here:
http://code.google.com/p/gmapcatcher/source/browse/trunk/src/openanything.py#53
suggests you can change it by passing the "agent" parameter.

User agents should be of the form "App/version (extra params)" so a good value
would be "GMapCatcher/0.64". As you can see from the firefox one you can add
multiple agents separated by a space.

Original comment by dstu...@gmail.com on 31 Mar 2010 at 11:45
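
For illustration, a minimal sketch of setting a descriptive User-Agent with
Python 3's standard `urllib.request`. This is not the project's
openanything.py; the helper name `build_tile_request` is hypothetical, and the
agent value is the one suggested above. No request is actually sent.

```python
import urllib.request

USER_AGENT = "GMapCatcher/0.64"  # value suggested in the comment above


def build_tile_request(url, user_agent=USER_AGENT):
    """Build a request object that identifies the application to the server."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


request = build_tile_request(
    "http://a.andy.sandbox.cloudmade.com/tiles/cycle/1/0/0.png")
# Request stores header names capitalized, hence "User-agent" here.
print(request.get_header("User-agent"))
```

Passing the object to `urllib.request.urlopen` would send the header with the
request, letting the server operator see which application is fetching tiles.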

GoogleCodeExporter commented 8 years ago
Done deal! I changed the user agent to:
USER_AGENT = '%s/%s +%s' % (NAME, VERSION, WEB_ADDRESS)
see r808

Original comment by heldersepu on 31 Mar 2010 at 12:29
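
To show what that format string expands to, here is a short sketch. The three
constant values are placeholders assumed for the example (the name and version
follow the suggestion earlier in the thread, the address is the Google Code
project page); the actual values in r808 are not shown in this thread.

```python
# Assumed placeholder values; the real constants live in the project source.
NAME = "GMapCatcher"
VERSION = "0.64"
WEB_ADDRESS = "http://code.google.com/p/gmapcatcher/"

USER_AGENT = '%s/%s +%s' % (NAME, VERSION, WEB_ADDRESS)
print(USER_AGENT)
```

With these placeholders the string starts with an "App/version" token, followed
by a space and a URL where the operator can find out more about the client.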

GoogleCodeExporter commented 8 years ago

Original comment by heldersepu on 17 Apr 2010 at 2:07