tristanlatr / MassWappalyzer

Run Wappalyzer asynchronously on a list of URLs and generate an Excel file containing all results.
GNU General Public License v2.0

raw_results referenced before assignment #1

Closed hadifarnoud closed 3 years ago

hadifarnoud commented 3 years ago

After days of running masswappalyzer, it threw this error:

Loading...:  98%|█████████████████████████████████████████████████████████████████████████████████  | 24748/25337 [30:11:01<43:06,  4.39s/it]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/masswappalyzer.py", line 313, in run
    progress=True)
  File "/usr/local/lib/python3.6/dist-packages/masswappalyzer.py", line 137, in perform
    func, elements), **tqdm_args))
  File "/usr/local/lib/python3.6/dist-packages/tqdm/std.py", line 1166, in __iter__
    for obj in iterable:
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 586, in result_iterator
    yield fs.pop().result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.6/dist-packages/masswappalyzer.py", line 256, in analyze
    p = subprocess.run(args=cmd, timeout=self.TIMEOUT, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  File "/usr/lib/python3.6/subprocess.py", line 423, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1295, in _execute_child
    restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/dist-packages/masswappalyzer.py", line 438, in <module>
    main()
  File "/usr/local/lib/python3.6/dist-packages/masswappalyzer.py", line 435, in main
    mass_w.run()
  File "/usr/local/lib/python3.6/dist-packages/masswappalyzer.py", line 324, in run
    for item in raw_results:
UnboundLocalError: local variable 'raw_results' referenced before assignment
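
The second traceback hides the real failure: `raw_results` was never assigned because the call that should produce it raised `OSError` first. The thread does not show the body of `run()`, so the following is only a minimal sketch of the general pattern; `perform`, `run_fragile`, and `run_safe` are hypothetical stand-ins, not MassWappalyzer's code.

```python
def perform(urls):
    """Hypothetical stand-in for the worker pool; fails like the report above."""
    raise OSError(12, "Cannot allocate memory")


def run_fragile(urls):
    try:
        raw_results = perform(urls)  # never bound if perform() raises
    finally:
        # Runs even after a failure, while raw_results is still unbound,
        # so an UnboundLocalError is raised on top of the OSError.
        processed = list(raw_results)
    return processed


def run_safe(urls):
    raw_results = []  # bound before the try, so it can always be iterated
    try:
        raw_results = perform(urls)
    except OSError as exc:
        print(f"analysis interrupted: {exc}")
    return list(raw_results)
```

With the variable bound up front, the original `OSError` is the only error reported, and whatever was collected can still be processed.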

hadifarnoud commented 3 years ago

Turns out that was a memory issue. After fixing that, I now get this error:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/dist-packages/masswappalyzer.py", line 438, in <module>
    main()
  File "/usr/local/lib/python3.6/dist-packages/masswappalyzer.py", line 435, in main
    mass_w.run()
  File "/usr/local/lib/python3.6/dist-packages/masswappalyzer.py", line 326, in run
    for app in item['applications']:
KeyError: 'applications'

tristanlatr commented 3 years ago

Hello @hadifarnoud,

So this is the fix for your issue, if I understand correctly? https://github.com/Kamva/MassWappalyzer/commit/6d60d38d6109ee124dc7b9ea1e36f8827b54c2f4

This issue is caused by Wappalyzer renaming the fields in its output. Can we find a way for this to work with the older version of Wappalyzer too?
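
One way to support both output formats is to look the list up under either key. This is only a sketch: the old key `applications` comes from the traceback above, while the new key name `technologies` is an assumption that should be checked against the Wappalyzer version in use, and `extract_technologies` is a hypothetical helper name.

```python
def extract_technologies(item):
    """Return the detected-technology list from one Wappalyzer result dict.

    Assumption: older output stores the list under "applications" and
    newer output under "technologies". Neither key present yields [].
    """
    for key in ("technologies", "applications"):
        if key in item:
            return item[key]
    return []
```

A loop such as `for app in extract_technologies(item):` would then no longer raise `KeyError` when one of the two keys is missing.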

hadifarnoud commented 3 years ago

No, it hasn't fixed it, @tristanlatr. As you said, their data format changed and I was not able to fix it. I don't know enough Python to fix it.

hadifarnoud commented 3 years ago

@tristanlatr, any chance you'll get to fix this issue?

tristanlatr commented 3 years ago

Hi @hadifarnoud,

Thanks again for the report, it should be fixed now!

hadifarnoud commented 3 years ago

I still think it's not working. The exported CSV file looks like this, which is obviously wrong:

Urls,Last_Url
"http://""aparat.com""/ (0)","http://""aparat.com""/"
"http://""shaparak.ir""/ (0)","http://""shaparak.ir""/"
"http://""varzesh3.com""/ (0)","http://""varzesh3.com""/"
"http://""emofid.com""/ (0)","http://""emofid.com""/"
"http://""agah.com""/ (0)","http://""agah.com""/"
"http://""telewebion.com""/ (0)","http://""telewebion.com""/"
"http://""tsetmc.com""/ (0)","http://""tsetmc.com""/"
"http://""samsung.com""/ (0)","http://""samsung.com""/"
"http://""chess.com""/ (0)","http://""chess.com""/"
"http://""divar.ir""/ (0)","http://""divar.ir""/"
"http://""beytoote.com""/ (0)","http://""beytoote.com""/"
"http://""namasha.com""/ (0)","http://""namasha.com""/"

Input CSV file:

root@vless:~# cat domains-test.csv
"aparat.com"
"shaparak.ir"
"varzesh3.com"
"emofid.com"
"agah.com"
"telewebion.com"
"tsetmc.com"
"samsung.com"
"chess.com"
"divar.ir"
"beytoote.com"
"namasha.com"

Command I used: python3 -m masswappalyzer -w wappalyzer -a 8 -i domains-test.csv -f csv -o iran-top-sites-tech.csv

tristanlatr commented 3 years ago

Sorry, it must be the CSV output that is broken. Could you try JSON or Excel?

I'll look into the CSV output issue...

hadifarnoud commented 3 years ago

I think the issue is not fixed yet. This is the JSON export:

    {
        "Last_Url": "http://\"telewebion.com\"/",
        "Urls": "http://\"telewebion.com\"/ (0)"
    },
    {
        "Last_Url": "http://\"tsetmc.com\"/",
        "Urls": "http://\"tsetmc.com\"/ (0)"
    },
    {
        "Last_Url": "http://\"samsung.com\"/",
        "Urls": "http://\"samsung.com\"/ (0)"
    },
    {
        "Last_Url": "http://\"chess.com\"/",
        "Urls": "http://\"chess.com\"/ (0)"
    },
    {
        "Last_Url": "http://\"divar.ir\"/",
        "Urls": "http://\"divar.ir\"/ (0)"
    },
    {
        "Last_Url": "http://\"beytoote.com\"/",
        "Urls": "http://\"beytoote.com\"/ (0)"
    },
    {
        "Last_Url": "http://\"namasha.com\"/",

tristanlatr commented 3 years ago

Hi, there is a misunderstanding about the format of the input file.

The quotes should not be there.

Your input file should look like:

aparat.com
shaparak.ir
varzesh3.com
emofid.com
agah.com
telewebion.com
tsetmc.com
samsung.com
chess.com
divar.ir
beytoote.com
namasha.com

It's not CSV, just a list of URLs: one URL per line.
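
For a quoted list like the one above, a small pre-processing step can produce the expected format. This is a minimal sketch that assumes the only problems are surrounding quotes, stray whitespace, and blank lines; `load_urls` is a hypothetical helper, not part of MassWappalyzer.

```python
def load_urls(lines):
    """Return one clean URL per non-empty line, without surrounding quotes."""
    urls = []
    for line in lines:
        # Drop the newline and whitespace first, then any wrapping quotes.
        cleaned = line.strip().strip("\"'")
        if cleaned:
            urls.append(cleaned)
    return urls
```

Reading the file with `with open("domains-test.csv") as f: urls = load_urls(f)` and writing `"\n".join(urls)` to a new file gives an input with one bare URL per line.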

hadifarnoud commented 3 years ago

Oh, that fixed it. Thanks!