powerfulTrouser opened 6 years ago
In the end, I used Python to parse dat_0 into many separate JSON files:
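```python
# coding: utf-8  (Python 3)
import json
import os

# Every record in dat_0 begins with this prefix, so it is used to split the file.
KNIFE = '{"Page":{"Status":'
OUT_DIR = '/Home/parse dat-1/'


def repair_json(candidate):
    """Return a version of candidate that parses as JSON, or None."""
    try:
        json.loads(candidate)
        return candidate
    except ValueError:
        pass
    # Fallback: cut after the second-to-last '}' and close the object again.
    trimmed = candidate.rsplit('}', 2)[0] + '}'
    try:
        json.loads(trimmed)
        return trimmed
    except ValueError as e:
        print(e)
        print(candidate)
        return None


i = 0  # counter for records whose URL-based file name could not be created
# errors='replace' keeps undecodable padding bytes from aborting the read.
with open('/Home/dat_0.json', encoding='utf-8', errors='replace') as infile:
    for line in infile:
        for frag in line.split(KNIFE):
            if not frag:
                continue  # e.g. the empty piece before the first marker
            # Drop anything after the last '}' and put the marker back in front.
            frag = KNIFE + frag.rsplit('}', 1)[0] + '}'
            fixed = repair_json(frag)
            if fixed is None:
                continue
            record = json.loads(fixed)
            # Skip pages that returned 403 or 404.
            if record['Page']['Status'] in (403, 404):
                continue
            print("next")
            # File name from the URL: strip the first 7 characters ("http://")
            # and the last one, and replace '/' so the name is a valid file name.
            name = record['URL'][7:-1].replace('/', '_slash_')
            path = os.path.join(OUT_DIR, name + '.json')
            try:
                out = open(path, 'w', encoding='utf-8')
            except OSError as e:
                print(e)
                path = os.path.join(OUT_DIR, 'problem_' + str(i) + '.json')
                i += 1
                out = open(path, 'w', encoding='utf-8')
            with out:
                out.write(fixed)  # write the repaired JSON, not the raw fragment
```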
It does not generate JSON files for pages whose status is 403 or 404.
I split the file on `{"Page":{"Status":`, and I'm wondering whether there is a better delimiter.
It is not an elegant solution, but it works.
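On the question of a better cut string: one option that needs no delimiter at all is the standard library's `json.JSONDecoder.raw_decode`, which parses a single JSON value and reports the index where it ended. The sketch below assumes, as the splitting approach does, that dat_0 holds JSON objects separated by non-JSON padding; `iter_records` is a hypothetical helper, and a record is recognized only by having a top-level `"Page"` key. When decoding fails at a `{` (padding or a truncated record), it simply moves on to the next `{`.

```python
import json

_DECODER = json.JSONDecoder()


def iter_records(text):
    """Yield every JSON object with a top-level "Page" key found in text.

    Hypothetical helper: assumes records are JSON objects separated by
    arbitrary non-JSON padding, so no split marker is needed.
    """
    idx = text.find('{')
    while idx != -1:
        try:
            # The idx argument exists in CPython's implementation of raw_decode,
            # although the documentation only lists the string argument.
            obj, end = _DECODER.raw_decode(text, idx)
        except ValueError:
            # Padding or a truncated record starts here: try the next '{'.
            idx = text.find('{', idx + 1)
            continue
        if isinstance(obj, dict) and 'Page' in obj:
            yield obj
        # Resume scanning right after the object that was just decoded.
        idx = text.find('{', end)


sample = ('\x00\x00{"Page":{"Status":200},"URL":"http://a.onion/"}  pad{'
          '{"Page":{"Status":404},"URL":"http://b.onion/"}')
kept = [r['URL'] for r in iter_records(sample)
        if r['Page']['Status'] not in (403, 404)]
print(kept)  # ['http://a.onion/']
```

Because the decoder itself validates every candidate, the `rsplit('}', ...)` repair heuristics are no longer needed; each line of dat_0 could be passed in as `text`.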
I'm a student, and I'm trying to follow this tutorial:
http://www.automatingosint.com/blog/2016/09/dark-web-osint-part-four-using-scikit-learn-to-find-hidden-service-clones/
to use machine learning to analyze the dark web. However, I found that the 'snapshot' feature is no longer available, and then I found an issue saying this functionality had been moved to dat_0. My dat_0 file is about 10 GB. I tried to parse it with Python and with Kaitai Struct, but failed (attachments: onions.py.txt, parsedat.py.txt). Is there any way to at least reproduce the analysis from that post, for example by using an older version of OnionScan, or by following a tutorial on how to achieve the same goal with the new OnionScan?
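For a 10 GB file, one way to keep memory bounded is to read fixed-size chunks and carry over only the unfinished record. The sketch below is a hypothetical helper (`iter_records_streaming` and its `chunk_size` default are illustrative choices), assuming, like the script above, that every record begins with `{"Page":{"Status":`. Since double quotes inside JSON string values are always escaped, that marker cannot appear inside a string value such as page text, so the span from one marker to the next is a complete candidate record; `json.JSONDecoder.raw_decode` then parses the object at its start and ignores any trailing padding.

```python
import io
import json

MARKER = '{"Page":{"Status":'  # assumed start of every record, as in the script above
_DECODER = json.JSONDecoder()


def iter_records_streaming(f, chunk_size=1 << 20):
    """Yield decoded records from a text file object, chunk_size characters at a time."""
    buf = ''
    while True:
        chunk = f.read(chunk_size)
        buf += chunk
        start = buf.find(MARKER)
        while start != -1:
            nxt = buf.find(MARKER, start + len(MARKER))
            if nxt == -1 and chunk:
                break  # the record may continue in the next chunk
            candidate = buf[start:nxt] if nxt != -1 else buf[start:]
            try:
                obj, _ = _DECODER.raw_decode(candidate)
            except ValueError:
                pass  # truncated or corrupt record: skip it
            else:
                yield obj
            start = nxt
        if not chunk:
            return  # end of file, everything has been processed
        # Keep the unfinished record, or a short tail that may hold part of a marker.
        buf = buf[start:] if start != -1 else buf[-(len(MARKER) - 1):]


# Demo on an in-memory file; for the real data, pass a file opened with
# open('/Home/dat_0.json', encoding='utf-8', errors='replace').
data = io.StringIO('junk{"Page":{"Status":200},"URL":"http://a.onion/"}\x00\x00'
                   '{"Page":{"Status":403},"URL":"http://b.onion/"}')
records = list(iter_records_streaming(data, chunk_size=8))
print([r['URL'] for r in records if r['Page']['Status'] not in (403, 404)])
```

Memory use is roughly `chunk_size` plus the size of the largest single record, regardless of how big dat_0 is.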
Thanks!