medialab / minet

A webmining CLI tool & library for python.
GNU General Public License v3.0

When I try to extract comments from an Instagram post I get a RuntimeError #989

Closed: wacns closed this issue 1 month ago

wacns commented 1 month ago

HTML for the request to https://www.instagram.com/p//?__a=1&__d=dis

Yomguithereal commented 1 month ago

Hello @wacns. I cannot really help you with this information alone. The url you show is not even a valid instagram url. To be able to help you I need to be able to reproduce your issue. So you need to give me at least a command that will demonstrate the problem as well as describing the error that occurred.

wacns commented 1 month ago


Scraping post comments ━━━━━━━━━━━ 0/1 posts - in 18.69s (?/s)
⠸ https://www.instagram.com/p/C_-um6ONESb/ 0/? comments in 18.69s (?/s) total: 0 comments
minet process was stopped because an error occurred!
Traceback (most recent call last):
  File "C:\Python312\Lib\site-packages\minet\instagram\api_scraper.py", line 240, in request_json
    data = json.loads(text)
           ^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Python312\Scripts\minet.exe\__main__.py", line 7, in <module>
  File "C:\Python312\Lib\site-packages\minet\cli\__main__.py", line 14, in main
    run("minet", __identifier__, MINET_COMMANDS)
  File "C:\Python312\Lib\site-packages\minet\cli\utils.py", line 49, in wrapper
    fn(*args, **kwargs)
  File "C:\Python312\Lib\site-packages\minet\cli\run.py", line 139, in run
    fn(cli_args)
  File "C:\Python312\Lib\site-packages\minet\cli\utils.py", line 332, in wrapper
    raise e
  File "C:\Python312\Lib\site-packages\minet\cli\utils.py", line 320, in wrapper
    return action(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\minet\cli\utils.py", line 468, in wrapper
    return action(cli_args, *args, **additional_kwargs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\minet\cli\instagram\comments.py", line 37, in action
    for comment in generator:
  File "C:\Python312\Lib\site-packages\minet\instagram\api_scraper.py", line 310, in comments
    data_post = self.request_json(url, magic_token=True)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\minet\web.py", line 1340, in decorated
    return retryer(fn, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\tenacity\__init__.py", line 475, in __call__
    do = self.iter(retry_state=retry_state)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\tenacity\__init__.py", line 376, in iter
    result = action(retry_state)
             ^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\tenacity\__init__.py", line 398, in <lambda>
    self._add_action_func(lambda rs: rs.outcome.result())
                                     ^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\concurrent\futures\_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\concurrent\futures\_base.py", line 401, in __get_result
    raise self._exception
  File "C:\Python312\Lib\site-packages\tenacity\__init__.py", line 478, in __call__
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\site-packages\minet\instagram\api_scraper.py", line 242, in request_json
    raise RuntimeError("HTML for the request to " + url + " : " + text)
RuntimeError: HTML for the request to https://www.instagram.com/p/C_-um6ONESb/?__a=1&__d=dis :

Yomguithereal commented 1 month ago

Your command works on my end. Can you (1) give me your minet version with minet --version, and (2) try wrapping your URLs in quotes? The dash - in the URL might break the CLI argument parsing.

wacns commented 1 month ago

3.1.0 (2024-10-2). I even tried another URL and still get the same issue, even though I wrapped it in double quotes: https://www.instagram.com/reel/DA-qL5os9TW

Yomguithereal commented 1 month ago

It works on my end, but I suspect the issue is that the tool fails to obtain a valid Instagram cookie. You should change your -c into --rcfile.
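For readers wondering why the traceback above ends with an empty message after the URL, here is a minimal sketch of the pattern visible in it: the response body is parsed as JSON, and on failure a RuntimeError is raised that embeds the body. The helper name parse_json_or_raise and the example URL are hypothetical and only illustrate the behaviour; this is not minet's actual code. It assumes the server returned an empty (non-JSON) body, which is one plausible outcome when no valid cookie is sent.

```python
import json


def parse_json_or_raise(url: str, text: str):
    """Hypothetical helper mimicking the pattern seen in the traceback."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Raising inside the except block is what produces the
        # "During handling of the above exception..." chained traceback.
        raise RuntimeError("HTML for the request to " + url + " : " + text)


if __name__ == "__main__":
    # An empty body is not valid JSON, so parsing fails at char 0.
    try:
        json.loads("")
    except json.JSONDecodeError as error:
        print(error)

    # The resulting RuntimeError message ends right after " : " because
    # the embedded body is empty, as in the report above.
    try:
        parse_json_or_raise("https://example.com/p/SHORTCODE/", "")
    except RuntimeError as error:
        print(repr(str(error)))
```

If the body had contained an HTML login page instead, the same message would carry that HTML after the colon, which can help tell an authentication problem from an empty response.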

wacns commented 1 month ago

I think it worked after changing -c to --rcfile and putting the URL between double quotes.

Yomguithereal commented 1 month ago

I'll close this issue then :)