vectara / vectara-ingest

An open source framework to crawl data sources and ingest into Vectara
https://vectara.com
Apache License 2.0

improvements to github crawler #37

Closed: ofermend closed this 1 year ago

ofermend commented 1 year ago

Added PRs to the GitHub crawler (per issue #15) and refactored the crawler to simplify it.
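Indexing PRs means turning each pull-request object returned by the GitHub REST API (`GET /repos/{owner}/{repo}/pulls`, which returns only open PRs unless you pass `state=all`) into a document with metadata. The sketch below is a hypothetical illustration of that mapping, not the code in this PR. The `doc_id`/`title`/`text`/`metadata` output shape and the `pr_to_document` name are assumptions. The input keys (`number`, `title`, `body`, `html_url`, `state`, `user.login`, `created_at`, `updated_at`) are standard fields of the GitHub pull-request object.

```python
from typing import Any, Dict


def pr_to_document(pr: Dict[str, Any]) -> Dict[str, Any]:
    """Map one GitHub pull-request JSON object to a document dict.

    Hypothetical output shape for illustration only; it is not
    vectara-ingest's actual indexing format.
    """
    return {
        "doc_id": f"pr-{pr['number']}",  # hypothetical ID scheme
        "title": pr["title"],
        # The API returns null for an empty PR description, so fall back to "".
        "text": pr.get("body") or "",
        "metadata": {
            "url": pr["html_url"],
            "state": pr["state"],           # "open" or "closed"
            "author": pr["user"]["login"],
            "created_at": pr["created_at"],  # ISO 8601 string, UTC
            "updated_at": pr["updated_at"],
        },
    }
```

Keeping the mapping a pure function of the API payload makes it easy to test without network access. The caller handles pagination and authentication separately.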

justinhayes commented 1 year ago

Approved. I'm not familiar with the GitHub API or object model, but your usage of it, as well as of the Vectara API, looks good to me. One thing to consider (larger in scope than this specific PR) is adding a numeric timestamp field (e.g. number of epoch seconds) wherever you have created_at or updated_at metadata fields. That way users can do date-range filtering by defining those fields as filter attributes. But that's a nice-to-have, not a blocker for this PR.
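The suggested numeric timestamps could be derived directly from the ISO 8601 strings GitHub returns (e.g. `"2023-01-01T00:00:00Z"`). The sketch below adds an epoch-seconds field next to each `created_at`/`updated_at` value. The `_ts` suffix and the `add_epoch_fields` helper are hypothetical names chosen for illustration, not part of vectara-ingest.

```python
from datetime import datetime
from typing import Dict


def add_epoch_fields(metadata: Dict[str, object]) -> Dict[str, object]:
    """Return a copy of `metadata` with an integer epoch-seconds field
    (hypothetical `<key>_ts` name) added for each ISO 8601 timestamp string."""
    out = dict(metadata)  # leave the caller's dict untouched
    for key in ("created_at", "updated_at"):
        value = metadata.get(key)
        if isinstance(value, str):
            # datetime.fromisoformat() only accepts a trailing "Z" on
            # Python 3.11+, so rewrite it as an explicit UTC offset first.
            dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
            out[f"{key}_ts"] = int(dt.timestamp())
    return out
```

For example, `add_epoch_fields({"created_at": "2023-01-01T00:00:00Z"})` adds `"created_at_ts": 1672531200`. Once such a field is declared as a numeric filter attribute on the corpus, a range query could use an expression along the lines of `doc.created_at_ts >= 1672531200`, assuming Vectara's `doc.`-prefixed metadata filter syntax.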