This PR reorganizes the README to be more user-focused. It addresses a few issues I've seen customers run into:
- Redundancy: A handful of concepts were explained multiple times, which increased the overall reading burden for the user.
- Relevance: A few concepts and sections weren't relevant to the end user, or weren't a top priority. These sections were moved so they're still accessible, but users only have to read them if they need them.
  - E.g., "Tokenize a single file" is a section that just adds noise, since it recommends that users not use it unless they have custom splits. As such, it has been moved to the advanced usage section.
- Order: I adjusted the ordering of concepts so that it follows the mental model of a typical customer:
  - Required input format? -> What is produced as output? -> Tokenizer? -> `max_seq_len` -> Quickstart -> Customize from there
I believe this structure will be a lot more straightforward for customers to digest and get started with. It contains virtually all of the same information, just reordered and restructured so it flows better. Some customers have also offered to read it and provide feedback. Additional tweaks will probably be needed, but overall this should make things clearer for new users.