soxoj / maigret

🕵️‍♂️ Collect a dossier on a person by username from thousands of sites
https://t.me/osint_maigret_bot
MIT License

Maigret does not save PDF reports #454

Open Alexell opened 2 years ago

Alexell commented 2 years ago

Checklist

Description

Info about Maigret version you are running and environment (--version, operating system, ISP provider):

maigret 0.4.3
Socid-extractor: 0.0.23
Aiohttp: 3.8.1
Requests: 2.27.1
Python: 3.8.10

How to reproduce this bug (commandline options / conditions):

Errors at the end of execution (I don't know if they are the cause of the problem):

/usr/local/lib/python3.8/dist-packages/dateutil/parser/_parser.py:1207: UnknownTimezoneWarning: tzname CDT identified but not understood. Pass `tzinfos` argument in order to correctly return a timezone-aware datetime. In a future version, this will raise an exception.
  warnings.warn("tzname {tzname} identified but not understood. "
Traceback (most recent call last):
  File "/usr/local/bin/maigret", line 8, in <module>
    sys.exit(run())
  File "/usr/local/lib/python3.8/dist-packages/maigret/maigret.py", line 723, in run
    loop.run_until_complete(main())
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.8/dist-packages/maigret/maigret.py", line 701, in main
    save_pdf_report(filename, report_context)
  File "/usr/local/lib/python3.8/dist-packages/maigret/report.py", line 82, in save_pdf_report
    pisa.pisaDocument(io.StringIO(filled_template), dest=f, default_css=css)
  File "/usr/local/lib/python3.8/dist-packages/xhtml2pdf/document.py", line 104, in pisaDocument
    context = pisaStory(src, path, link_callback, debug, default_css, xhtml,
  File "/usr/local/lib/python3.8/dist-packages/xhtml2pdf/document.py", line 67, in pisaStory
    pisaParser(src, context, default_css, xhtml, encoding, xml_output)
  File "/usr/local/lib/python3.8/dist-packages/xhtml2pdf/parser.py", line 761, in pisaParser
    pisaLoop(document, context)
  File "/usr/local/lib/python3.8/dist-packages/xhtml2pdf/parser.py", line 699, in pisaLoop
    pisaLoop(node, context, path, kw)
  File "/usr/local/lib/python3.8/dist-packages/xhtml2pdf/parser.py", line 643, in pisaLoop
    pisaLoop(nnode, context, path, kw)
  File "/usr/local/lib/python3.8/dist-packages/xhtml2pdf/parser.py", line 643, in pisaLoop
    pisaLoop(nnode, context, path, kw)
  File "/usr/local/lib/python3.8/dist-packages/xhtml2pdf/parser.py", line 643, in pisaLoop
    pisaLoop(nnode, context, path, kw)
  [Previous line repeated 7 more times]
  File "/usr/local/lib/python3.8/dist-packages/xhtml2pdf/parser.py", line 513, in pisaLoop
    attr = pisaGetAttributes(context, node.tagName, node.attributes)
  File "/usr/local/lib/python3.8/dist-packages/xhtml2pdf/parser.py", line 125, in pisaGetAttributes
    nv = c.getFile(nv)
  File "/usr/local/lib/python3.8/dist-packages/xhtml2pdf/context.py", line 795, in getFile
    return getFile(name, relative or self.pathDirectory)
  File "/usr/local/lib/python3.8/dist-packages/xhtml2pdf/util.py", line 762, in getFile
    file = pisaFileObject(*a, **kw)
  File "/usr/local/lib/python3.8/dist-packages/xhtml2pdf/util.py", line 665, in __init__
    conn.request("GET", path)
  File "/usr/lib/python3.8/http/client.py", line 1256, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1302, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1251, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1011, in _send_output
    self.send(msg)
  File "/usr/lib/python3.8/http/client.py", line 951, in send
    self.connect()
  File "/usr/lib/python3.8/http/client.py", line 1425, in connect
    self.sock = self._context.wrap_socket(self.sock,
  File "/usr/lib/python3.8/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/lib/python3.8/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/usr/lib/python3.8/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: WRONG_SIGNATURE_TYPE] wrong signature type (_ssl.c:1131)

soxoj commented 2 years ago

Hey, please specify the username you searched for.

Alexell commented 2 years ago

@soxoj My own username, as you can see it here.

soxoj commented 2 years ago

I am unable to reproduce the report-creation crash for now; I only get the unknown timezone warning:

[-] Generating report info...
/usr/local/lib/python3.9/site-packages/dateutil/parser/_parser.py:1207: UnknownTimezoneWarning: tzname CDT identified but not understood.  Pass `tzinfos` argument in order to correctly return a timezone-aware datetime.  In a future version, this will raise an exception.
  warnings.warn("tzname {tzname} identified but not understood.  "
[-] HTML report on all usernames saved in /tmp/report_Alexell_plain.html

Could you attach the list of your installed packages with versions, produced with pip3 freeze > pkgs.txt? I'll try to reproduce your full environment.

Alexell commented 2 years ago

In your screenshot, I see a message about the HTML report. My HTML report is saved normally for this username. It is the PDF report that is not saved for the same username (maigret alexell --pdf command). I am attaching what you asked for: pkgs.txt

soxoj commented 2 years ago

I guess the problem is caused by an HTTP error while xhtml2pdf is trying to download and render some profile image by URL. But I still couldn't reproduce it :( Let's try to narrow down the site. Does the following command crash?

maigret alexell --pdf --retries 0 --top-sites 100 --no-recursion

If yes, please send the console output.
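
One way to check this hypothesis without rerunning the whole search is to pull the image URLs out of the HTML report (which does get saved) and try a TLS handshake with each image host. The sketch below is only an illustration and not part of Maigret: `extract_image_urls`, `probe_tls` and `failing_hosts` are hypothetical helper names, and it assumes that the profile images appear as ordinary `<img src="https://...">` tags in the report.

```python
import socket
import ssl
from html.parser import HTMLParser
from urllib.parse import urlsplit


class _ImgCollector(HTMLParser):
    """Collects the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.urls.append(src)


def extract_image_urls(html_text):
    """Return the https:// image URLs found in an HTML document, in order."""
    collector = _ImgCollector()
    collector.feed(html_text)
    return [
        url for url in collector.urls
        if urlsplit(url).scheme == "https" and urlsplit(url).hostname
    ]


def probe_tls(host, port=443, timeout=10.0):
    """Try a TLS handshake with host; return None on success, else the error text."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host):
                return None
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        return str(exc)


def failing_hosts(html_text):
    """Map each image host that fails the handshake to its error message."""
    hosts = sorted({urlsplit(url).hostname for url in extract_image_urls(html_text)})
    results = {}
    for host in hosts:
        error = probe_tls(host)
        if error is not None:
            results[host] = error
    return results
```

Calling `failing_hosts()` on the text of a saved HTML report would list the hosts whose handshake fails on the affected machine. A clean result would not rule the hypothesis out, since xhtml2pdf opens its own connection through http.client (as the traceback shows) and may negotiate differently.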

Alexell commented 2 years ago

The PDF report for this command is generated normally. But the run was short, and the report came out much shorter than the HTML report produced by a run without additional arguments. Apparently there is still some problematic site, but the program does not reach it with these options.

soxoj commented 2 years ago

Well, so let's try different modes :)

  1. maigret alexell --pdf --retries 0 -a --no-recursion
  2. maigret alexellpro --pdf --retries 0 -a --no-recursion

Alexell commented 2 years ago

  1. After execution, we have the same errors that were at the very beginning and a report with a size of 0 bytes.
  2. The report was saved normally.

soxoj commented 2 years ago

Okay, let's increase the number of sites step by step, e.g.:

maigret alexell --pdf --retries 0 -a --no-recursion --top-sites 200

Please attach a text file with the full console output after reproducing the error.
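
The step-by-step runs can be prepared in advance with a small script. This is a minimal sketch, assuming the flags from the command above stay the same and only the --top-sites value changes. The helper name `stepwise_commands` and the limits 400 and 800 are made up for illustration; only 200 comes from this thread.

```python
import shlex


def stepwise_commands(username, limits):
    """Build one maigret command line per --top-sites limit, smallest first."""
    commands = []
    for limit in sorted(limits):
        args = [
            "maigret", username,
            "--pdf", "--retries", "0", "-a", "--no-recursion",
            "--top-sites", str(limit),
        ]
        # shlex.join quotes arguments when needed, so the line is shell-safe.
        commands.append(shlex.join(args))
    return commands


# Limits beyond 200 are assumed values, chosen only to show the progression.
for command in stepwise_commands("alexell", [200, 400, 800]):
    print(command)
```

Running the commands in order and stopping at the first one that produces a 0-byte PDF narrows the problem site down to the range between the last good limit and the first failing one.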

Alexell commented 2 years ago

console_log.txt

soxoj commented 2 years ago

Thanks, got it. Let's check the following sites:

maigret alexell --site Flickr --site Pastebin --site BuzzFeed --site Tinder --site MixCloud --site BitBucket --site last.fm --site Gravatar --site uID.me --site Paypal --site Kik --pdf

Alexell commented 2 years ago

With these sites, the report was saved normally.