t3nsor / quora-backup

Python scripts to download Quora answers and convert them into a more portable form
GNU General Public License v2.0
125 stars 72 forks

Converter Fails to Fetch Answer #21

Open jggouvea opened 4 years ago

jggouvea commented 4 years ago

$ ../software-git/quora-backup/converter.py answers-en answers-en-ready
Found 2503 answers
Filename: 2015-01-18 What-are-some-of-the-worst-baby-names.html
Traceback (most recent call last):
  File "../software-git/quora-backup/converter.py", line 216, in
    print('[WARNING] Failed to locate answer on page (Source URL was %s)' % url, file=sys.stderr)
NameError: name 'url' is not defined

t3nsor commented 4 years ago

The crash bug should be fixed by https://github.com/t3nsor/quora-backup/commit/1032cbef56c601ad391ed3005bd7d36f00f3ec5d. If you want me to look into why it failed to locate the answer, you will need to send me the HTML file.
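For readers hitting the same traceback: the NameError means the warning message referenced `url` before any value had been assigned to it. The following is a minimal sketch of one way to guard against that; it is not the content of the linked commit. The function names `describe_failure` and `warn_answer_not_found` are hypothetical and do not come from converter.py.

```python
import sys


def describe_failure(url=None):
    """Build the warning text without assuming a source URL was ever found.

    Hypothetical helper: `url` defaults to None so the message can still
    be formatted when the page yielded no source URL.
    """
    source = url if url is not None else "unknown"
    return "[WARNING] Failed to locate answer on page (Source URL was %s)" % source


def warn_answer_not_found(url=None):
    """Print the warning to stderr instead of crashing with a NameError."""
    print(describe_failure(url), file=sys.stderr)


if __name__ == "__main__":
    # No URL was extracted from the page: the warning is still printed.
    warn_answer_not_found()
    # A URL was extracted (placeholder value for illustration only).
    warn_answer_not_found("https://example.com/some-answer")
```

Giving the variable a default before the lookup means a failed lookup results in a warning rather than a crash, so the converter could move on to the next file.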

InvincibleJuggernaut commented 3 years ago

That doesn't seem to have fixed the problem. The HTML files generated by the crawler contain only the first few lines of each article. I have attached one such HTML file below. (GitHub doesn't seem to accept .html attachments, so I have attached a .docx file containing the HTML code.) html.docx

t3nsor commented 3 years ago

It looks like Quora has changed their page format, so now the answer content is initially loaded in a structured format but JavaScript is required to actually render it as HTML. So the converter in its current form will not work. I will think about how to address this. I am going to get a copy of my answer archive using the GDPR tool and then see whether there is still a need for the converter.
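To check whether a saved page carries its content as structured data rather than rendered HTML, a rough sketch like the one below may help. It relies on assumptions not confirmed anywhere in this thread: that the structured data sits inside `<script>` elements and that the element body is plain JSON. Quora's actual page layout is not described here, and the names `ScriptCollector` and `find_json_payloads` are made up for illustration.

```python
import json
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element in a page."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so accumulate them.
        if self._in_script:
            self.scripts[-1] += data


def find_json_payloads(html_text):
    """Return every script body that parses as JSON (assumed layout)."""
    collector = ScriptCollector()
    collector.feed(html_text)
    collector.close()
    payloads = []
    for body in collector.scripts:
        try:
            payloads.append(json.loads(body))
        except ValueError:
            # Ordinary JavaScript (or an empty body) is not JSON; skip it.
            continue
    return payloads


if __name__ == "__main__":
    # Made-up sample page, not real Quora markup.
    sample = (
        "<html><head><script>var x = 1;</script>"
        '<script type="application/json">{"answer": "hello"}</script>'
        "</head><body><p>Preview</p></body></html>"
    )
    print(find_json_payloads(sample))
```

If a saved page yields JSON payloads but little visible body text, that would be consistent with the content needing JavaScript to render. The sketch only detects such payloads and does not attempt to convert them to HTML.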