philarkwright / DGA-Detection

DGA Domain Detection using Bigram Frequency Analysis

something with alexa.. #3

Open ibarkay opened 5 years ago

ibarkay commented 5 years ago

Traceback (most recent call last):
  File "dga_detection.py", line 314, in <module>
    load_data()
  File "dga_detection.py", line 93, in load_data
    training_data = alexa.top_list(1000000)
  File "/Users/____/Documents/PycharmProjects/__/venv/src/alexa-top-sites/alexa/__init__.py", line 32, in top_list
    return [a.next() for x in xrange(num)]
StopIteration

Thanks :)

helb commented 4 years ago

This happens because the code tries to fetch a million domains from Alexa:

dga_detection.py:93:

training_data = alexa.top_list(1000000)

alexa/__init__.py:32:

return [a.next() for x in xrange(num)]

…but the top-1m.csv.zip downloaded by alexa/__init__.py now contains only ~576k domains for some reason, so the iterator runs out before a million next() calls complete and raises StopIteration:

$ wc -l top-1m.csv
576602 top-1m.csv
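
The failure can be reproduced in isolation. The sketch below is written for Python 3 (the library code quoted above is Python 2, where the equivalent calls are a.next() and xrange); take_exactly is a hypothetical stand-in for the library's list comprehension, not its actual code, and the domains are placeholders.

```python
def take_exactly(rows, num):
    """Mimic the failing pattern: call next() exactly `num` times."""
    it = iter(rows)
    # If `rows` has fewer than `num` items, next() raises StopIteration,
    # and the exception propagates out of the list comprehension.
    return [next(it) for _ in range(num)]


domains = ["example.com", "example.org", "example.net"]

print(take_exactly(domains, 2))  # works: the source has enough entries

try:
    take_exactly(domains, 5)     # asks for more entries than exist
except StopIteration:
    print("StopIteration: ran out of domains")
```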

A proper fix would be to change alexa/__init__.py to use actual line count from the file, but if you just want a quick one, change the number in dga_detection.py:93:

training_data = alexa.top_list(576602)

(Count the lines yourself; the number probably changes often.)
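
One way to approach the more robust fix is to cap the request at whatever the file actually contains. This is a minimal Python 3 sketch, assuming the CSV holds rank,domain rows; top_list_safe is a hypothetical helper, not part of alexa-top-sites, and the sample data is made up.

```python
import csv
import io
from itertools import islice


def top_list_safe(csv_file, num):
    """Return up to `num` (rank, domain) tuples from an open CSV file.

    islice stops quietly when the file runs out, so asking for more
    rows than exist returns a shorter list instead of raising.
    """
    reader = csv.reader(csv_file)
    return [(int(rank), domain) for rank, domain in islice(reader, num)]


# Usage with an in-memory stand-in for top-1m.csv
sample = io.StringIO("1,example.com\n2,example.org\n3,example.net\n")
print(top_list_safe(sample, 1000000))
```

With something along these lines, the hard-coded count in dga_detection.py would not need updating every time the list shrinks or grows.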