This PR updates how Stack Exchange preprocessing can be done.
It adds two new flags. The first, --sort, controls the order in which answers are added to the document: --sort=time orders the answers by the date they were first posted, and --sort=votes orders them by score, with the accepted answer placed first in the list.
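As a rough illustration of the two orderings, here is a minimal sketch. The `Answer` class, its field names, and the `sort_answers` helper are hypothetical and are not taken from the PR's code; how ties are broken is also an assumption (this sketch keeps the input order).

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Answer:
    # Hypothetical record; field names are illustrative only.
    answer_id: int
    created: datetime
    score: int
    accepted: bool = False


def sort_answers(answers, mode):
    """Return a new list of answers ordered according to `mode`."""
    if mode == "time":
        # Oldest answer first.
        return sorted(answers, key=lambda a: a.created)
    if mode == "votes":
        # Accepted answer first (False sorts before True), then highest score.
        # sorted() is stable, so equal-score answers keep their input order.
        return sorted(answers, key=lambda a: (not a.accepted, -a.score))
    raise ValueError(f"unknown sort mode: {mode!r}")


answers = [
    Answer(1, datetime(2020, 1, 1), score=5),
    Answer(2, datetime(2019, 1, 1), score=10),
    Answer(3, datetime(2021, 1, 1), score=2, accepted=True),
]
by_time = [a.answer_id for a in sort_answers(answers, "time")]
by_votes = [a.answer_id for a in sort_answers(answers, "votes")]
```

Note that with the votes ordering the accepted answer leads even when another answer has a higher score, as with answer 3 here.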
This PR also adds a --skip_comments flag, which leaves comments out of the generated document, since Stack Exchange seems to consider comments ephemeral.
It also adds a new metadata field containing the set of all licenses that appear on the question, answers, and comments that go into a single document. When comments or answers are posted well after the original question, the version of the CC license can change.
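One way to picture the license field is the sketch below. It assumes each post is a dict with a "license" key; that key, the `collect_licenses` helper, and the choice to return a sorted list rather than a raw set are all hypothetical and not taken from the PR.

```python
def collect_licenses(question, answers, comments=()):
    """Gather the distinct licenses of every post that goes into one document.

    Assumes each post is a dict with a "license" key (hypothetical layout).
    Pass no comments when they are skipped, so their licenses are not counted.
    """
    licenses = {question["license"]}
    licenses.update(a["license"] for a in answers)
    licenses.update(c["license"] for c in comments)
    # Sorted list so the metadata is deterministic and easy to serialize.
    return sorted(licenses)


question = {"license": "CC BY-SA 3.0"}
answers = [{"license": "CC BY-SA 3.0"}, {"license": "CC BY-SA 4.0"}]
comments = [{"license": "CC BY-SA 4.0"}]

doc_licenses = collect_licenses(question, answers, comments)
question_only = collect_licenses(question, [])
```

Here the question predates the later answer and comment, so the document carries two license versions.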
These changes are based on discussions from the last meeting.