mori-c opened this issue 5 years ago
Oh my gosh, I didn't think anyone actually used this! The paper link doesn't show up because I haven't run the scraper in (looks at last commit... three years!). If you're interested, I could probably start this back up again. The arXiv people mentioned they would do something like this (and I thought arXiv sanity would help), but it still seems to be a problem.
Thoughts? Want to help?
Hah, arXiv2git isn't a bad idea. I've never built a Chrome extension before, so if you're still open to it (knowing that I'll need some guidance), I'd be happy to help. I'll look at your code shortly; what steps would it take to get this running, aside from scraping?
The Chrome extension pulls the data from GitHub (I think; I'll have to check when I get home). Basically, though, the data is served from the static file you see in the repo.
(Reply sent by email to mori-c's comment of Tue, Apr 16, 2019, 10:40 PM.)
Yup, it pulls the data directly from GitHub. See
The basic idea is to query GitHub for repos that mention arXiv in their description or README:
q = ' '.join([
"arxiv",
"in:description,readme",
"created:{date}".format(date=date),
"fork:false",
])
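To run that query one month at a time, the `created:` qualifier can be given as an inclusive date range. The sketch below is an illustration, not the project's scraper: it assumes `date` is filled in as a `YYYY-MM-DD..YYYY-MM-DD` range (a form GitHub search accepts), and the helpers `month_ranges`, `build_query`, and `search_url` are hypothetical names. It only builds URLs for GitHub's `GET /search/repositories` endpoint and makes no network calls. Monthly chunks are also useful because GitHub's search API returns at most 1,000 results per query.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def month_ranges(start_year, start_month, end_year, end_month):
    """Yield (first_day, last_day) pairs, one per calendar month, inclusive."""
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        first = date(year, month, 1)
        # First day of the following month; December rolls over to January.
        next_first = date(year + month // 12, month % 12 + 1, 1)
        yield first, next_first - timedelta(days=1)
        year, month = next_first.year, next_first.month


def build_query(first, last):
    # Same qualifiers as the snippet above, with created: given as a range.
    return " ".join([
        "arxiv",
        "in:description,readme",
        "created:{}..{}".format(first.isoformat(), last.isoformat()),
        "fork:false",
    ])


def search_url(query, page=1):
    # GitHub REST repository search; per_page is capped at 100 by the API.
    params = urlencode({"q": query, "per_page": 100, "page": page})
    return "https://api.github.com/search/repositories?" + params
```

For example, `build_query(*next(month_ranges(2019, 2, 2019, 2)))` yields `arxiv in:description,readme created:2019-02-01..2019-02-28 fork:false`; each month's URL would then be fetched page by page.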
I chunked the search into one query per month so that each query returned a manageable number of results. After that, each README was downloaded and parsed for arXiv links. In theory, we could rerun the pipeline (and maybe clean it up?) to get updated links. To do it properly, the service would need to run once a month.
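The README-parsing step could look roughly like the sketch below, which also inverts the results into an arXiv-ID-to-repos lookup of the kind a static data file could hold. This is an assumption-laden illustration rather than the project's code: it assumes READMEs are already downloaded as text keyed by repo name, handles only new-style arXiv IDs (`YYMM.NNNN` or `YYMM.NNNNN`, version suffix dropped), and the names `ARXIV_LINK`, `arxiv_ids`, and `build_index` are hypothetical.

```python
import re
from collections import defaultdict

# Matches arxiv.org/abs/<id> and arxiv.org/pdf/<id>[.pdf] links with new-style
# identifiers; an optional version suffix such as "v5" is matched but not kept.
# Old-style IDs like hep-th/9901001 are not handled in this sketch.
ARXIV_LINK = re.compile(
    r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(?:v\d+)?",
    re.IGNORECASE,
)


def arxiv_ids(readme_text):
    """Return the unique arXiv IDs linked from a README, in first-seen order."""
    return list(dict.fromkeys(m.group(1) for m in ARXIV_LINK.finditer(readme_text)))


def build_index(readmes):
    """Invert {repo_name: readme_text} into {arxiv_id: [repo_name, ...]}."""
    index = defaultdict(list)
    for repo, text in readmes.items():
        for arxiv_id in arxiv_ids(text):
            index[arxiv_id].append(repo)
    return dict(index)
```

Keying the index by arXiv ID means that a page for one paper needs a single dictionary lookup to find every repo that links to it.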
Problem
Situation
Here's an example of the extension returning the following statement:
Reference
I was hoping the extension would return something similar to this source:
Proof
While the paper referenced four GitHub links