karpathy / arxiv-sanity-lite

arxiv-sanity lite: tag arxiv papers of interest get recommendations of similar papers in a nice UI using SVMs over tfidf feature vectors based on paper abstracts.
https://arxiv-sanity-lite.com
MIT License
1.16k stars, 130 forks

papers.labml.ai #2

Open hnipun opened 2 years ago

hnipun commented 2 years ago

Hi @karpathy,

We built papers.labml.ai in May (introductory tweet) to discover research papers based on popularity on Twitter. We were using arxiv-sanity to discover papers and I started this as a side project inspired by it (partly because it was down from time to time).

We worked on it on and off since May and have added a bunch of features, such as:

And we are working on something very similar to tags on sanity-lite (which we call lists).

We would love to hear your feedback and suggestions. Thanks for releasing your work.

[Three screenshots attached, dated 2021-11-14]

karpathy commented 2 years ago

Hi Nipun, it's fun to hear from you! I actually have https://papers.labml.ai/papers/weekly pinned in my toolbar and visit it regularly, great work on the site and I look forward to seeing where you take it!

hnipun commented 2 years ago

@karpathy, Happy to hear you are finding it useful. Mostly, we make improvements based on our personal needs. Let us know if you have any suggestions for improvements. Thanks.

GeorvityLabs commented 2 years ago

@karpathy @hnipun could you add a feature that filters arXiv papers so that only those with GitHub repo links are shown?

For example: if I search for CLIP, it shows only the papers that have a GitHub repo link (in the comments, the abstract, or under Code & Data).

That way only the papers with GitHub code get displayed on the screen (for folks who are looking for papers with an implementation ready).

karpathy commented 2 years ago

@GeorvityLabs This would potentially require downloading the full text of the paper, dramatically increasing the complexity. Currently we can afford to only scrape the abstracts and this is very helpful. So I don't believe this is easy sadly.

GeorvityLabs commented 2 years ago

@karpathy that makes sense.

For some papers, the authors include GitHub repo links in the abstract, so scraping just the abstract would work in those cases.

But for most other papers, the GitHub repo links are usually listed in one of three places on arxiv.org: Code & Data, Comments, or the Abstract. So, if we manage to scrape these three sections separately, it would be possible to implement the feature.
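To make the idea concrete, here is a minimal sketch of pulling repo links out of whatever text fields have already been scraped. The function name `find_github_links` and the regex are my own assumptions, not anything from arxiv-sanity-lite, and the pattern is only an approximation of valid GitHub owner/repo names.

```python
import re

# Approximate pattern for github.com/<owner>/<repo>, with or without a scheme.
# The allowed character sets are an assumption, not GitHub's exact rules.
GITHUB_RE = re.compile(
    r"(?:https?://)?(?:www\.)?github\.com/([A-Za-z0-9_.-]+)/([A-Za-z0-9_.-]+)",
    re.IGNORECASE,
)


def find_github_links(*fields):
    """Return unique, normalized repo URLs found in the given text fields."""
    found = []
    for text in fields:
        # Missing fields (None) are treated as empty text.
        for owner, repo in GITHUB_RE.findall(text or ""):
            repo = repo.rstrip(".")  # drop a trailing sentence period
            if repo.endswith(".git"):
                repo = repo[: -len(".git")]
            if not repo:
                continue
            url = f"https://github.com/{owner}/{repo}"
            if url not in found:
                found.append(url)
    return found


abstract = "Code is available at https://github.com/openai/CLIP."
comments = "12 pages, code: github.com/openai/CLIP"
print(find_github_links(abstract, comments))
```

A paper would then pass the filter whenever the returned list is non-empty. This only covers links that appear as text in the abstract or comments; anything shown only under Code & Data would need a separate source.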

GeorvityLabs commented 2 years ago

@karpathy

I tried using the davinci-codex engine to generate a Python script, just for fun, to see what Codex can do:


import webbrowser
from urllib.parse import quote_plus

# get the search query from the user
input_string = input("Enter a string: ")

# build an arxiv.org search URL; the query is URL-encoded so spaces and symbols survive
search_string = "https://arxiv.org/search/?query=" + quote_plus(input_string) + "&searchtype=all&source=header"

# try to show only papers with code (this parameter was generated by Codex
# and may not be a real arxiv.org search option)
search_string = search_string + "&filter=has-official-code:y"

# print the search URL
print(search_string)

# open the search URL in a new browser tab
webbrowser.open_new_tab(search_string)

I was wondering if there are any &filter options that would let us check Code & Data, Abstract, and Comments separately.

In the above code, I don't think &filter=has-official-code:y is doing much of anything. But it would be awesome if we could have such filter options.
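Failing a server-side filter, one option might be to filter on the client. The sketch below assumes the Atom feed returned by the public arXiv API (for example `http://export.arxiv.org/api/query?search_query=all:CLIP`), where each entry carries a `<summary>` (the abstract) and, when the authors supplied one, an `<arxiv:comment>`. The function name `entries_with_github` and the sample feed are made up for illustration, and as far as I can tell the Code & Data tab is not part of this feed.

```python
import xml.etree.ElementTree as ET

# XML namespaces used by the arXiv API's Atom responses.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}


def entries_with_github(atom_xml):
    """Return titles of entries whose abstract or comment mentions github.com."""
    root = ET.fromstring(atom_xml)
    titles = []
    for entry in root.findall("atom:entry", NS):
        title = entry.findtext("atom:title", default="", namespaces=NS)
        summary = entry.findtext("atom:summary", default="", namespaces=NS)
        comment = entry.findtext("arxiv:comment", default="", namespaces=NS)
        if "github.com" in f"{summary} {comment}".lower():
            # Collapse the line breaks arXiv puts inside long titles.
            titles.append(" ".join(title.split()))
    return titles


# Hypothetical, hand-written feed standing in for a real API response.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:arxiv="http://arxiv.org/schemas/atom">
  <entry>
    <title>Paper A</title>
    <summary>We release code at https://github.com/example/a.</summary>
  </entry>
  <entry>
    <title>Paper B</title>
    <summary>No link in the abstract.</summary>
    <arxiv:comment>Code: github.com/example/b</arxiv:comment>
  </entry>
  <entry>
    <title>Paper C</title>
    <summary>Nothing here.</summary>
  </entry>
</feed>"""

print(entries_with_github(SAMPLE_FEED))
```

In real use the feed text would come from an HTTP request to the API instead of `SAMPLE_FEED`; the parsing step stays the same.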

subramanya1997 commented 2 years ago

I think the GitHub repo can be obtained from paperswithcode.com. I just added a github_links field to the papers table.

GeorvityLabs commented 2 years ago

@subramanya1997 not all papers have code up on paperswithcode.com; people usually link their GitHub repos directly more often than they list them on paperswithcode.com.

subramanya1997 commented 2 years ago

@GeorvityLabs True. Some of them won't be mentioned in the paper but would be released later. I thought it is better to have some than to have none.

GeorvityLabs commented 2 years ago

@subramanya1997 true. But after reading both paperswithcode and arXiv for years, one thing I noticed is that most GitHub links don't make it to paperswithcode. But like you said, something is better than nothing.