Open jggouvea opened 4 years ago
The crash bug should be fixed by https://github.com/t3nsor/quora-backup/commit/1032cbef56c601ad391ed3005bd7d36f00f3ec5d If you want me to look into why it failed to locate the answer, then you have to send me the HTML file
It seems it didn't fix the problem. Actually, the HTML files generated by the crawler are able to fetch only the first few lines of the articles. I have attached the HTML file below. (GitHub doesn't seem to support .html format, so I have attached a .docx file with the HTML code) html.docx
It looks like Quora has changed their page format, so now the answer content is initially loaded in a structured format but JavaScript is required to actually render it as HTML. So the converter in its current form will not work. I will think about how to address this. I am going to get a copy of my answer archive using the GDPR tool and then see whether there is still a need for the converter.
$ ../software-git/quora-backup/converter.py answers-en answers-en-ready Found 2503 answers Filename: 2015-01-18 What-are-some-of-the-worst-baby-names.html Traceback (most recent call last): File "../software-git/quora-backup/converter.py", line 216, in
print('[WARNING] Failed to locate answer on page (Source URL was %s)' % url, file=sys.stderr)
NameError: name 'url' is not defined