mnm-matin / ai_alignment_graph

https://mnm-matin.github.io/ai_alignment_graph/
MIT License

Dataset collection improvements #16

Open tobiolabode opened 2 weeks ago

tobiolabode commented 2 weeks ago

Based on what we discussed on the Discord call, I will be looking more into dataset improvements. The scraper already works well; as mentioned in the TODOs, it just needs to be expanded. Some of this also relates to other tasks, such as choosing the non-LangChain library we want to use for this repo, which will help with the filtering tasks below.

One open question: is it better to focus first on enriching our arXiv dataset (filtering, search, etc.) or on expanding the dataset as much as possible (e.g., with non-arXiv papers)?
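As a rough illustration of the filtering side, here is a minimal Python sketch. It assumes the scraped arXiv data sits in a CSV with `title` and `abstract` columns (hypothetical column names; the scraper's actual output may differ) and that relevance is decided by a simple case-insensitive keyword match. `ALIGNMENT_KEYWORDS`, `is_relevant`, and `filter_csv` are placeholders for illustration, not an agreed filter or existing code in the repo.

```python
import csv
import io
from typing import Iterable, TextIO

# Hypothetical keyword list; the real filtering criteria have not been decided.
ALIGNMENT_KEYWORDS = ("alignment", "interpretability", "reward hacking", "rlhf")


def is_relevant(record: dict, keywords: Iterable[str] = ALIGNMENT_KEYWORDS) -> bool:
    """Return True if any keyword appears in the title or abstract (case-insensitive)."""
    text = f"{record.get('title', '')} {record.get('abstract', '')}".lower()
    return any(kw in text for kw in keywords)


def filter_csv(src: TextIO, dst: TextIO) -> int:
    """Copy only relevant rows from src to dst, keeping the header. Returns rows kept."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    kept = 0
    for row in reader:
        if is_relevant(row):
            writer.writerow(row)
            kept += 1
    return kept


if __name__ == "__main__":
    # Tiny in-memory example instead of a real scraped file.
    sample = io.StringIO(
        "id,title,abstract\n"
        "1,Scalable oversight,Alignment of LLMs\n"
        "2,Protein folding,Structure prediction\n"
    )
    out = io.StringIO()
    print(filter_csv(sample, out), "row(s) kept")
    print(out.getvalue())
```

Keyword matching is crude (it will miss papers that never use the exact terms), so search or embedding-based filtering might be worth comparing once the library choice is settled.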

I'm open to any ideas regarding the direction to take with the dataset section.

These are the tasks I will start looking into, taken from the main page:

Best, Tobi

mnm-matin commented 2 weeks ago

Thanks for this Tobi, the prototype script has been pushed to the repo: https://github.com/mnm-matin/ai_alignment_graph/blob/v4/generate_md/get_arxiv_to_csv.py