scrapinghub / splash

Lightweight, scriptable browser as a service with an HTTP API
BSD 3-Clause "New" or "Revised" License

Same page: render.html is OK, but execute returns a wrongly encoded result #423

Closed: yabaoya closed this issue 8 years ago

yabaoya commented 8 years ago

Test one (execute):

import requests

script = '''
function main(splash)
    splash:on_request(function(request)
        request:set_proxy{host="111.2.131.233", port=80}
        --request:set_header("Accept-Language", "zh-CN,zh;q=0.8")
        request:set_header("Accept-Encoding", "gzip,deflate")
        --request:set_header("Accept-Encoding", "*")
        --request:set_header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
        --request:set_header("User-Agent", "Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11")
        --request:set_header("Connection", "keep-alive")
        --request:set_header("Cache-Control", "max-age=0")
    end)
    --splash:go("http://10.0.8.213:8000/sample/splash/")
    splash:go("http://www.qichacha.com/firm_JS_15d5e3cbf3f484215f9293a9b3489905.shtml")
    splash:wait(3)
    --splash:runjs("document.getElementById('iframe1').innerHTML= frames[0].document.body.innerHTML")
    --splash:runjs("var iis = document.getElementsByTagName('iframe');for(var i = 0;i < iis.length; i++){iis[i].innerHTML=frames[i].document.body.innerHTML}")
    return splash:html()
end
'''
res = requests.get('http://127.0.0.1:8050/execute', params={
    'lua_source': script}, timeout=10)
t = res.text
print(t)

Result One: unreadable, binary-looking output. A short excerpt (the rest of the page is the same kind of garbage): �s��k��="" �����|am��f�gu~���v�rn�v7�j����m...
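The garbled output above looks like it could be compressed data that was printed as text, although nothing in this thread confirms that. A minimal sketch for checking that guess on raw response bytes follows; `try_decompress` is a hypothetical helper name, not part of Splash or requests.

```python
import gzip
import zlib


def try_decompress(body: bytes) -> bytes:
    """Return decompressed bytes if body looks gzip/deflate-compressed, else body unchanged.

    Hypothetical diagnostic helper: it only tests the guess that the
    garbled output is still-compressed data.
    """
    # gzip streams start with the magic bytes 0x1f 0x8b.
    if body[:2] == b"\x1f\x8b":
        try:
            return gzip.decompress(body)
        except (OSError, EOFError, zlib.error):
            # Looked like gzip but is truncated or corrupt: return it untouched.
            return body
    try:
        # zlib-wrapped deflate, as commonly sent for "Content-Encoding: deflate".
        return zlib.decompress(body)
    except zlib.error:
        # Not compressed (or an unknown encoding): hand the bytes back as they are.
        return body
```

With the test above, this would be called on `res.content` (raw bytes) rather than `res.text`. If the bytes were already decoded as text before Splash returned them, the check will simply return the input unchanged and the guess stays unverified.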

Test two (render.html):

res = requests.get('http://127.0.0.1:8050/render.html', params={
    'url': 'http://www.qichacha.com/firm_JS_15d5e3cbf3f484215f9293a9b3489905.shtml',
    'proxy': 'http://111.2.131.233:80'},
    headers={'Accept-Encoding': 'gzip,deflate'}, timeout=10)
t = res.text
print(t)

Result Two: test2

In test one, when I set request:set_header("Accept-Encoding", "*") instead, the result is correct.
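For comparing the two header values side by side, the script from test one can be generated with the header as a parameter. This is only a sketch: `build_script` and `build_execute_url` are hypothetical helper names, the proxy setting from the original script is left out for brevity, and the Splash address is the one used in the tests above.

```python
from urllib.parse import urlencode

# Trimmed version of the Lua script from test one, with two placeholders.
LUA_TEMPLATE = """
function main(splash)
    splash:on_request(function(request)
        request:set_header("Accept-Encoding", "%(accept_encoding)s")
    end)
    splash:go("%(url)s")
    splash:wait(3)
    return splash:html()
end
"""


def build_script(page_url, accept_encoding="*"):
    """Fill the Lua template with the page URL and Accept-Encoding value."""
    return LUA_TEMPLATE % {"accept_encoding": accept_encoding, "url": page_url}


def build_execute_url(page_url, accept_encoding="*",
                      splash="http://127.0.0.1:8050"):
    """Build the full /execute URL carrying the script as lua_source."""
    query = urlencode({"lua_source": build_script(page_url, accept_encoding)})
    return splash + "/execute?" + query
```

The resulting URL can be fetched with `requests.get(url, timeout=10)`, once with `accept_encoding="*"` and once with `"gzip,deflate"`, to reproduce the difference described here.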

What is the reason?

mantelllo commented 7 years ago

Did you find a workaround for this issue with Splash?