martenlienen / icml-nips-iclr-dataset

Papers, authors and author affiliations from ICML, NeurIPS and ICLR 2006-2023

Scraping Other Journals #5

Closed: tayyabkhalil-313 closed this issue 2 months ago

tayyabkhalil-313 commented 2 months ago

I was just working on a similar project and came across this awesome repo. Thanks for publishing it! I have a few questions:

  1. From the main conference site, I understand how the ids for each paper are found, but how did you know that links such as https://neurips.cc/Conferences/2022/Schedule?showEvent={id} exist for each paper?
  2. I have a long list of journals I want to scrape. Is there a generic approach, or does it have to be journal-specific? Some of the journals do not even provide the authors' details that I am interested in.

martenlienen commented 2 months ago

The key to this script is that all of these conferences use the same software for their websites, so they all have similar URLs. This probably won't hold for other journals. There you would have to figure out the URLs and how to parse authors, abstract, etc. out of each page's content for each journal individually (unless their websites also share a common structure).
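To illustrate what "journal-specific" parsing could look like, here is a minimal sketch that keeps one parser function per site in a small registry. It is not taken from this repo. The registry key `example-journal` and the sample HTML are made up, and the sketch assumes the journal's pages expose `citation_title` and `citation_author` meta tags, which some publishers provide but which has to be checked for each journal.

```python
from html.parser import HTMLParser


class CitationMetaParser(HTMLParser):
    """Collect title and authors from <meta name="citation_*"> tags."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.authors = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name, content = attr.get("name"), attr.get("content")
        if content is None:
            return
        if name == "citation_title":
            self.title = content
        elif name == "citation_author":
            self.authors.append(content)


def parse_citation_meta(html: str) -> dict:
    """Parse one page; assumes the page carries citation_* meta tags."""
    parser = CitationMetaParser()
    parser.feed(html)
    return {"title": parser.title, "authors": parser.authors}


# Hypothetical registry: one parser per journal, added as each site is studied.
PARSERS = {"example-journal": parse_citation_meta}

# Made-up page content, used only to exercise the parser offline.
SAMPLE = """<html><head>
<meta name="citation_title" content="An Example Paper">
<meta name="citation_author" content="Ada Lovelace">
<meta name="citation_author" content="Alan Turing">
</head><body></body></html>"""

if __name__ == "__main__":
    print(PARSERS["example-journal"](SAMPLE))
```

Journals that share a site structure can point at the same parser function, while the rest each get their own entry.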

I knew about this URL because there is a link to it when you click through the conference website enough :)
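The shared URL pattern discussed above can be sketched as a small helper. This is only an illustration, not the repo's actual code. The `neurips.cc` pattern is the one quoted in the question, while the `icml.cc` and `iclr.cc` hosts and the event id `12345` are assumptions for the example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Only the neurips.cc pattern appears in the thread; the other hosts are
# assumed to follow the same scheme because the sites share their software.
CONFERENCE_HOSTS = {
    "neurips": "neurips.cc",
    "icml": "icml.cc",  # assumption
    "iclr": "iclr.cc",  # assumption
}


def event_url(conference: str, year: int, event_id: int) -> str:
    """Build a schedule URL following the pattern quoted in the question."""
    host = CONFERENCE_HOSTS[conference.lower()]
    query = urlencode({"showEvent": event_id})
    return f"https://{host}/Conferences/{year}/Schedule?{query}"


def event_id_from_url(url: str) -> int:
    """Recover the numeric event id from a schedule URL."""
    params = parse_qs(urlsplit(url).query)
    return int(params["showEvent"][0])


if __name__ == "__main__":
    # 12345 is a placeholder id, not a real paper.
    for conf in CONFERENCE_HOSTS:
        print(event_url(conf, 2022, 12345))
```

Because only the host and year change, one loop over conferences, years and ids covers every site that shares the pattern.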