Closed shaansubbaiah closed 4 years ago
Not able to reproduce with Ubuntu 20.04 sucrose version 0.117.
HTTP Error 403 is a response from the web server. So something about the request was unacceptable. My guess is that there is no referrer header. A web site can check for referrer to make sure the request comes from a browser loading a specific page.
It is better for an image to be copied out of what was already requested and downloaded. Making a subsequent request is bad because (a) it signals to anyone listening that it is happening, (b) the image may have changed, (c) extra time and network resources are used.
@Saumya-Mishra9129 thanks for testing.
@quozl I did not get what you meant. I agree that copying a downloaded image is better but errors should not be thrown while trying to copy it directly.
It appears that we don't haven't implemented a method to handle HTTPErrors yet?
@quozl I did not get what you meant. I agree that copying a downloaded image is better but errors should not be thrown while trying to copy it directly.
Error is thrown by the web server, and we don't control the web server. I'm speculating that the web server is coded to do this when a referrer header is missing, or if a referrer header is re-used. Policy of Google Image Search. Can you confirm?
All you have to do is replicate the download using a tool such as wget, curl, or python requests. Google applies different anti-abuse and anti-theft strategies in different countries, according to local laws, regulations and politics.
If we can prove the HTTP Error 403 is expected, the next thing is to either
@quozl Initially I tried various links with curl (with and without the referer
and user-agent
headers) and was able to download the image properly. I guess this was because they were static links from google/the linked site.
Finally, after trying with a google query (which I should have tried earlier) I was able to reproduce the issue on my system. The request fails with status 403
when I'm missing either the user-agent
or the referer
headers.
Search term was 'gradient'
// successful query (with both 'referer' and 'user-agent')
curl "https://www.google.com/search?q=gradient&espv=2&biw=1366&bih=667&site=webhp&source=lnms&tbm=isch&sa=X&ei=XosDVaCXD8TasATItgE&ved=0CAcQ_AUoAg" --user-agent "Mozilla/5.0 (X11; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0" --referer "https://www.google.com/"
Thanks. Good to know. That means the 403 is expected when Referer is absent. I refer you now to my earlier reply. :grin:
@Saumya-Mishra9129 I can't reproduce this issue with all images, one image with which I can reproduce this consistently is:
search term: candyfloss right click - copy image
@quozl turns out it is not due to the referer
header but due to the user-agent
. Setting the referer
header did not prevent the error.
#/usr/src/sugar-activities/Browse.activity/palettes.py __copy_image_activate_cb(self, menu_item)
data = urllib.request.urlopen(self.\_image_url).read()
to
req = urllib.request.Request(self.\_image_url)
# req.add_header('Referer', 'http://www.google.com')
req.add_header('User-Agent', 'Mozilla/5.0 (X11; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0')
data = urllib.request.urlopen(req).read()
The default user-agent
header used by urllib
is urllib/VVV
, where VVV is its version number
Perhaps a check was implemented by the website to combat bots?
Thanks. That's now, but the future could be different. Google will keep changing their code. Our code should not use urllib
as a user-agent, but should use the user-agent of the WebKit2 browser. Perhaps it should use WebKit2 to fetch the image rather than Python urllib.
After testing today, I cannot reproduce this error exactly on the same image on Sugar Live Build v0.117. Instead of the traceback reported initially in the issue, I now get Terminated by signal 11, pid 1784 activity_id 3e4a49390d0299ef2f366d5e57c79bb29a133a48
.
Traceback is the same as reported earlier for Debian Bullseye.
Not sure what changed, tested Browse from master branch on both distributions.
Signal 11 is SIGSEGV, segmentation fault, a read or write to memory out of bounds or protected. Usual next step is to use gdb to obtain a backtrace at C language level. Things that might have changed are journal entries, and the activity instance directory. You should also control for these possibilities.
@quozl how do I use gdb
with sugar-activity3
?
I tried: gdb sugar-activity3
(similar to the example usage of pdb
in the wiki ) in the Activity directory and it resulted in errors.
"/usr/bin/sugar-activity3": not in executable format: file format not recognized
Also, the activity is written in Python, and you have mentioned to obtain a backtrace at C language level. Is it maybe to check the backtrace of C libraries that were abstracted to Python?
how do I use gdb with sugar-activity3?
@shaansubbaiah You can try these commands to run an activity with gdb. Thanks. I use them on Ubuntu. I am not sure about debian.
gdb python3
run //usr/bin/sugar-activity3
@Saumya-Mishra9129 is correct, although the //
is more usually/
, but any number of /
can be used. Then when the segmentation fault occurs, the (gdb)
prompt will appear, and can be used to examine CPU state.
Also, the activity is written in Python, and you have mentioned to obtain a backtrace at C language level. Is it maybe to check the backtrace of C libraries that were abstracted to Python?
Not abstracted, but linked. Python itself is written in C, is linked dynamically to many libraries written in C and sometimes other languages. Because the activity imports WebKit via PyGObject introspection, even more libraries are opened and mapped into virtual memory. Each of these open and map still more.
You can get an idea of the complexity by using Linux kernel APIs.
ps ax
, and assign it to a shell variable PID,/proc/$PID/smaps
using less
or grep
,/proc/$PID/map_files/
.On Ubuntu 20.04 with Browse, there are about 157 libraries opened.
# readlink /proc/2503/map_files/* | sort | uniq | grep '\.so' | wc --lines
157
Looking just at libraries resolved by PyGObject introspection;
# readlink /proc/2503/map_files/*|grep typelib|cut -f6 -d'/'
SugarGestures-1.0.typelib
SugarExt-1.0.typelib
Rsvg-2.0.typelib
TelepathyGLib-0.12.typelib
Soup-2.4.typelib
WebKit2-4.0.typelib
SoupGNOME-2.4.typelib
JavaScriptCore-4.0.typelib
Atk-1.0.typelib
Gio-2.0.typelib
Pango-1.0.typelib
Gdk-3.0.typelib
Gtk-3.0.typelib
GLib-2.0.typelib
GdkX11-3.0.typelib
GObject-2.0.typelib
GdkPixbuf-2.0.typelib
cairo-1.0.typelib
GModule-2.0.typelib
xlib-2.0.typelib
When you get a SIGSEGV, you have to find out where it happened in that huge collection of executable code.
Then you have to read the source code of where it happened, examine the backtrace to find out the circumstances, and then determine the best place for a fix. Remember, professional programmers will take up a new language in a couple of days, not through any organised learning or plan to self improvement, but just because some code has failed and they have to fix it.
@shaansubbaiah You can try these commands to run an activity with gdb. Thanks. I use them on Ubuntu. I am not sure about debian.
gdb python3 run //usr/bin/sugar-activity3
Thank you, that worked. Used run sugar-activity3
after the first line.
Traceback generated:
[Thread 0x7fff96ffd700 (LWP 871) exited]
failed to create drawable
Traceback (most recent call last):
File "/usr/share/sugar/activities/Browse.activity/palettes.py", line 248, in __copy_image_activate_cb
data = urllib.request.urlopen(self._image_url).read()
File "/usr/lib/python3.7/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.7/urllib/request.py", line 531, in open
response = meth(req, response)
File "/usr/lib/python3.7/urllib/request.py", line 641, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python3.7/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
result = func(*args)
File "/usr/lib/python3.7/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
[Thread 0x7fff967fc700 (LWP 872) exited]
Thread 1 "B" received signal SIGSEGV, Segmentation fault.
0x00007fffe1ad259d in ?? () from /lib/x86_64-linux-gnu/libwebkit2gtk-4.0.so.37
The active thread is "B" and it failed within WebKit. Three counterfactuals;
In common, get detailed traceback. You'll need to install debug symbols for WebKit and any other libraries involved. If you've not done that before, Google Search it.
After testing today, I cannot reproduce this error exactly on the same image on Sugar Live Build v0.117. Instead of the traceback reported initially in the issue, I now get
Terminated by signal 11, pid 1784 activity_id 3e4a49390d0299ef2f366d5e57c79bb29a133a48
. Not sure what changed, tested Browse from master branch on both distributions.
Any of those 157 libraries. :grinning: Sometimes I use /var/log/dpkg.log
to find out what might have changed during automatic security updates. If no updates were applied, then perhaps the fault is time dependent.
@quozl just saw your comments, I'll go through them and report back.
Also, I have found that downloads in Browse are handled using downloadmanager.py
which uses the WebKit Download class
, whereas copying images which are handled using palettes.py
uses urllib
. Is there any reason for this?
gdb appears extremely useful, I'll have to take a deeper look. I haven't explored debugging tools fully and use some form of print
most of the time.
@Saumya-Mishra9129 is correct, although the // is more usually/, but any number of / can be used. Then when the segmentation fault occurs, the (gdb) prompt will appear, and can be used to examine CPU state.
@quozl Can you tell how to install gdb in Debian 10 buster?. I couldnt find out a way.
apt install gdb
should work, think that's how I had installed it on Sugar Live Build.
@Saumya-Mishra9129 I can't reproduce this issue with all images, one image with which I can reproduce this consistently is:
search term: candyfloss right click - copy image
I tested with Debian 10 Buster, I am able to reproduce the error.
apt install gdb should work, think that's how I had installed it on Sugar Live Build.
I don't know but It doesn't work for me. It shows-
Package gdb is not available, but is referred to by another package. This may mean that the package is missing, has been obsoleted, or is only available from another source.
E: Package 'gdb' has no installation candidate.
Did you run apt update
before that? Also make sure that /etc/apt/sources.list
is not empty/ has issues for some reason.
My sources.list on the Sugar Live Build (Buster),
#/etc/apt/sources.list
deb http://deb.debian.org/debian/ buster main
deb-src http://deb.debian.org/debian/ buster main
deb http://security.debian.org/debian-security buster/updates main
deb-src http://security.debian.org/debian-security buster/updates main
# buster-updates, previously known as 'volatile'
deb http://deb.debian.org/debian/ buster-updates main
deb-src http://deb.debian.org/debian/ buster-updates main
Also make sure that /etc/apt/sources.list is not empty/ has issues for some reason.
My sources.list on the Sugar Live Build (Buster),
Thanks It helped. I just found out that my sources.list had some issues. It is solved.apt install gdb
works now.
Just noticed that the User-Agent String is Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0 Safari/605.1.15 SugarLabs/0.116
_get_sugar_version()
in browser.py
which is used to generate it returns only even-numbered versions, and all the releases listed on the wiki are even-numbered. A comment in the function also indicates that they are the latest stable versions.
Is v0.117 not considered stable? I haven't found any info on the wiki marking even-numbered versions as stable.
Is v0.117 not considered stable? I haven't found any info on the wiki marking even-numbered versions as stable.
stable and unstable are software manufacturing and release engineering terms. stable means no intent to change; everything important has been fixed. unstable means intent to change; there are important things that are yet to be fixed.
Once upon a time, even numbered releases were stable. I stopped doing that. I also stopped updating the Wiki because nobody was updating it along with me.
Whether 0.118 is stable or unstable depends on whether the release is made with or without everything important being fixed. Sometimes people push me to make a release, citing downstream schedules and their desire not to manage patches. Where possible, I accommodate them, with the result you see above; unstable. I don't respond to demands to work harder; my preference is to work better. The Port to Python 3 is not yet finished for Sugar and activities; that's important to fix.
Also, I have found that downloads in Browse are handled using
downloadmanager.py
which uses theWebKit Download class
, whereas copying images which are handled usingpalettes.py
usesurllib
. Is there any reason for this?
I've no idea. Check the commit history with git blame? Most likely the reason is everyone being slack.
_get_sugar_version()
inbrowser.py
which is used to generate it returns only even-numbered versions, and all the releases listed on the wiki are even-numbered. A comment in the function also indicates that they are the latest stable versions.
Given that we no longer use even-numbering to indicate stable, this code could change.
Sometimes an error is thrown while trying to copy an image.
Reproduce: right click -> copy image
Tested on: Sugar 0.117-3, Debian Bullseye Sugar 0.117, Sugar Live Build
Errors appear while trying on Google Images, but worked without issues on the same image later. Not sure how to reproduce this accurately.