gordonwatts / snowmass-chat

Experiments exploring the US Snowmass Process documents using LLM
Apache License 2.0

Add keyword extraction functionality #24

Closed: gordonwatts closed this 10 months ago

gordonwatts commented 10 months ago

This pull request adds keyword extraction from PDF files. The new lc_keywords.py script uses the rake-nltk library to extract keywords from a given PDF file, then displays them in a ranked table using the rich library. This can be useful for analyzing and categorizing PDF documents based on their content.
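For readers unfamiliar with RAKE, here is a minimal sketch of the ranking idea that rake-nltk is built on. It is not the code in lc_keywords.py: it uses only the standard library, skips PDF reading and the rich table, and the names `STOPWORDS`, `candidate_phrases`, and `rank_phrases` are made up for illustration. The stopword list is a tiny assumed sample, where rake-nltk relies on NLTK's stopword list.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption, not NLTK's real list).
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "was", "with",
}


def candidate_phrases(text):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    # Punctuation always ends a phrase, so split on it first.
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                # A stopword closes the phrase collected so far.
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rank_phrases(text):
    """Return (phrase, score) pairs, highest score first."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)    # how often each word occurs
    degree = defaultdict(int)  # summed length of the phrases containing it
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    # A phrase scores the sum of degree/frequency over its words.
    scores = {
        " ".join(phrase): sum(degree[w] / freq[w] for w in phrase)
        for phrase in set(phrases)
    }
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    sample = "Keyword extraction from PDF files is useful. The keyword table is ranked."
    for phrase, score in rank_phrases(sample):
        print(f"{score:5.1f}  {phrase}")
```

Because candidates are simply the runs of words left between stopwords and punctuation, the ranking favors longer multi-word phrases and takes no account of capitalization or part of speech.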

Fixes #22

gordonwatts commented 10 months ago

In the end, this was finding key "phrases", which are closer to verbs. It was not grabbing proper nouns, which are what we really need here, so I am not going to add this to the repo.