baberabb closed this 4 months ago
Thanks for the review @blester125! I think I have addressed all your points, but let me know if I missed anything! Added the README as well. The only thing left is the tables. They are mostly fine, but the whitespace formatting is quite off. Will bring it up in the meeting!
If you rebase on main and push again, the lint error should go away and we can get this merged!
Thanks! Feel free to merge when ready!
Whenever you're ready!
This PR adds the code to process the USPTO dataset, which was extracted from the Google Patents Public Dataset and uploaded to HF. The dataset covers all US patent applications up to Oct 27, 2023, including historical ones.
closes #9
Edit: switched over to pandoc, as it handles the LaTeX formatting pretty well and also works well with the overall HTML.
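For anyone following along, here is a rough sketch of what calling pandoc from Python could look like. This is not the code in the PR: the function names `build_pandoc_command` and `convert_with_pandoc` are hypothetical, and the choice of `markdown` as the output format and of `--wrap=none` are assumptions made for illustration only.

```python
import shutil
import subprocess


def build_pandoc_command(from_format="html", to_format="markdown"):
    """Build the pandoc argument list (hypothetical helper, not from the PR).

    The target format and --wrap=none are assumptions; --wrap=none keeps
    pandoc from hard-wrapping long lines in the output.
    """
    return ["pandoc", "--from", from_format, "--to", to_format, "--wrap=none"]


def convert_with_pandoc(text, from_format="html", to_format="markdown"):
    """Pipe `text` through pandoc on stdin and return the converted output."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    result = subprocess.run(
        build_pandoc_command(from_format, to_format),
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pandoc exits non-zero
    )
    return result.stdout
```

With pandoc installed, `convert_with_pandoc("<p>Some <b>claim</b> text</p>")` would return the converted text. Shelling out once per document keeps the dependency to the pandoc binary alone, though it may be slow across a very large number of patents.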