redhat-et / foundation-models-for-documentation

Improve ROSA customer experience (and customer retention) by leveraging foundation models to do “gpt-chat” style search of Red Hat customer documentation assets.

Added nb to explore ways to convert pdf to QA dataset #5

Closed: suppathak closed this 1 year ago

suppathak commented 1 year ago

Related #4

suppathak commented 1 year ago

I do not have access to assign or add reviewers.

cc'ing here: @codificat, @Shreyanand. Thanks!

codificat commented 1 year ago

One question: is this specific to the FAQ page only?

I am asking because I believe this page might be one of a kind, and if that's the case it might be better to just copy and paste it and apply some formatting.

It seems to me that a few things in that page (multiline answers, links within the text) could over-complicate this automated task and might make it not worth the effort.

What do you think?

Don't get me wrong: it looks like a great start! (thanks!) I'm just wondering if it is worth pursuing, as I believe almost all the data we will work with will not be in Q/A (FAQ) format.

suppathak commented 1 year ago

I agree with your point. Since we are working with all kinds of docs, not only Q&A docs, we need to come up with a more general way to solve this Q&A problem. Thanks!

codificat commented 1 year ago

By the way, I found a very similar FAQ at https://www.rosaworkshop.io/rosa/14-faq/ and for that one there is a source file on GitHub.
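
To illustrate why a Markdown source could be easier to work with than the PDF, here is a minimal sketch of turning FAQ-style Markdown into Q/A pairs. It assumes, without having checked the actual source file, that each question is a level-2 (or deeper) heading ending in a question mark and that the answer is everything up to the next such heading. The sample text, `faq_markdown_to_qa`, and `_make_pair` are hypothetical names made up for this example. It covers the two complications mentioned earlier in this thread: multiline answers are joined into one string, and inline links are reduced to their link text.

```python
import re

# Hypothetical sample; the real FAQ source may be laid out differently.
SAMPLE_FAQ = """\
# FAQ

## What is ROSA?
ROSA is a managed OpenShift service.
See the [docs](https://example.com/docs) for details.

## Where can I get help?
Open a support case.
"""

# Matches inline Markdown links of the form [text](url).
LINK_RE = re.compile(r"\[([^\]]+)\]\([^)]+\)")
# Assumption: a question is a heading of level 2+ that ends with "?".
HEADING_RE = re.compile(r"^#{2,}\s+(.*\?)\s*$")


def _make_pair(question, answer_lines):
    # Join multiline answers into one string and replace [text](url) with text.
    answer = " ".join(line.strip() for line in answer_lines if line.strip())
    return {"question": question, "answer": LINK_RE.sub(r"\1", answer)}


def faq_markdown_to_qa(text):
    """Return a list of {"question", "answer"} dicts from FAQ-style Markdown."""
    pairs = []
    question = None
    answer_lines = []
    for line in text.splitlines():
        match = HEADING_RE.match(line)
        if match:
            # A new question starts: flush the previous one, if any.
            if question is not None:
                pairs.append(_make_pair(question, answer_lines))
            question = match.group(1).strip()
            answer_lines = []
        elif question is not None:
            answer_lines.append(line)
    if question is not None:
        pairs.append(_make_pair(question, answer_lines))
    return pairs


pairs = faq_markdown_to_qa(SAMPLE_FAQ)
```

Headings that do not end in a question mark are treated as part of the preceding answer, so a real file with section headings between questions would need extra handling.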

suppathak commented 1 year ago

Closing this since we are no longer dealing with extractive QA. We already have the markdown files and validation dataset in the data folder.