taesiri / ArXivQA

WIP - Automated Question Answering for ArXiv Papers with Large Language Models (https://arxiv.taesiri.xyz/)

New papers are replacing old papers, the dataset is being deleted? #4

Closed VatsaDev closed 10 months ago

VatsaDev commented 10 months ago

Hi, in the papers directory, it's mentioned that the listing is truncated to the last 1000 files, with 5700 omitted. The Hugging Face directory simply times out.

Is the data even accessible anymore?

taesiri commented 10 months ago

Hello @VatsaDev

Currently, we have more than 6000 papers in the repository. However, due to GitHub/Hugging Face limits, browsing everything is not straightforward. I am planning to develop some tools for this purpose, including a set of hierarchical readme(s) sorted by date/topic, a Chrome extension to display QA directly on the arXiv page, and a front-end for browsing (or even adding new questions).

The most effective way to access all the data at the moment is by cloning the repository.
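As a rough illustration of the cloning route, here is a minimal Python sketch that clones the repository and counts the files under `papers`. It assumes `git` is installed and that the repository lives at `https://github.com/taesiri/ArXivQA` (inferred from the repository name, not stated in this thread). The helper names `clone_repo` and `count_files` are hypothetical, and the shallow clone is just one way to keep the download small.

```python
import subprocess
from pathlib import Path


def clone_repo(url: str, dest: Path) -> Path:
    """Shallow-clone `url` into `dest`; skip the clone if `dest` already exists."""
    if not dest.exists():
        # --depth 1 fetches only the latest commit, which is enough for browsing files.
        subprocess.run(["git", "clone", "--depth", "1", url, str(dest)], check=True)
    return dest


def count_files(directory: Path) -> int:
    """Count regular files anywhere under `directory`."""
    return sum(1 for p in directory.rglob("*") if p.is_file())


# Example usage (URL and directory name are assumptions, see above):
# repo = clone_repo("https://github.com/taesiri/ArXivQA", Path("ArXivQA"))
# print(count_files(repo / "papers"))
```

A local clone is not subject to the web listing's 1000-file truncation, so the count should reflect every file actually in the directory.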

VatsaDev commented 10 months ago

So all the data is there, but it only shows up if you clone the repository?

taesiri commented 10 months ago

@VatsaDev Yes, that is true.

VatsaDev commented 10 months ago

k thanks