sebastianruder / NLP-progress

Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.
https://nlpprogress.com/
MIT License
22.73k stars 3.62k forks

A Knowledge Graph resource of NLP-progress #617

Open jd-coderepos opened 2 years ago

jd-coderepos commented 2 years ago

Dear authors, this repository is such a great resource! Many thanks for creating it. I would like to suggest that the Open Research Knowledge Graph (https://orkg.org/) could be used to list such resources for persistence, knowledge sharing, and querying. Below are some resources I created from the information in this repository.

Named Entity Recognition Tasks in the MUC series

https://orkg.org/comparison/R162797/

NER in the Automatic Content Extraction (ACE) Series

https://orkg.org/comparison/R162851/

Named Entity Recognition in the CoNLL Series and the OntoNotes corpus as a related resource

https://orkg.org/comparison/R166315/

Named Entity Recognition Based on Wikipedia

https://orkg.org/comparison/R166240/

A comparison of the annotated resources of software mentions in scholarly articles

https://orkg.org/comparison/R166560/

NLP Datasets for Named Entity Recognition and Relation Extraction from Biomedicine Scholarly Articles

https://orkg.org/comparison/R163265/

Comparisons and Visualizations of the CrossNER Benchmark Corpus for its Source and Target Domains

https://orkg.org/comparison/R163843/

Surveying BioNLP Shared Tasks Corpora for Named Entity Recognition

https://orkg.org/comparison/R165702/

Surveying BioCreAtIvE Shared Tasks Corpora for Named Entity Recognition

https://orkg.org/comparison/R172155/


A benefit of such machine-encoded data is that reviews can be created from it automatically.

Surveying the BioCreAtIvE Shared Task Series

https://orkg.org/review/R172166

Surveying the BioNLP Shared Task Series

https://orkg.org/review/R165924

I would be very happy to offer support in this direction. :)

RicardoUsbeck commented 2 years ago

I am sure my team at NFDI4DS (https://www.nfdi4datascience.de/) would also be happy to help convert the data.

sebastianruder commented 2 years ago

This is a great set of resources! What do you think would be the best way to integrate them?

jd-coderepos commented 2 years ago

@sebastianruder could we perhaps schedule a call as a starting point, where I could present the ORKG and its features to you? Perhaps then we could elicit a set of requirements to integrate the data. My contact information is here https://sites.google.com/view/jen-web/contact?authuser=0

@RicardoUsbeck happy to hear your thoughts on how best we could go about it, and to have the information relayed to your team at NFDI4DS. :)

RicardoUsbeck commented 2 years ago

Sure, will do. Actual work will start in Oct.

We wanted to crawl this website to feed the (O)RKG. On the other hand, it would be nice to have a manual (?) export from ORKG (which will undoubtedly grow faster) here... but there are also downsides to this approach. Happy to help discuss ideas.

jd-coderepos commented 2 years ago

Indeed, I fully agree with having crawling scripts for the website to structure the data, perhaps with additional curation support on top to ensure data quality...
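As a rough illustration of the structuring step, here is a minimal Python sketch. It assumes the results are available as pipe-delimited Markdown tables; the function name `parse_markdown_table`, the sample table, and its placeholder rows are hypothetical and are not real results from the site.

```python
def parse_markdown_table(text):
    """Parse the first pipe-delimited Markdown table in `text` into a list of dicts."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            if header is not None:
                break  # the table has ended
            continue  # still looking for the table
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first table row holds the column names
        elif all(cell and set(cell) <= set("-: ") for cell in cells):
            continue  # skip the separator row (e.g. | --- | :---: |)
        else:
            rows.append(dict(zip(header, cells)))
    return rows


# Hypothetical sample input; the rows are placeholders, not real results.
SAMPLE = """
| Model | F1 | Paper / Source |
| ------------- | :-----:| --- |
| Placeholder model A | 90.0 | Placeholder paper A |
| Placeholder model B | 85.5 | Placeholder paper B |
"""

records = parse_markdown_table(SAMPLE)
for record in records:
    print(record)
```

Cells that themselves contain a `|` character (for example inside link text) are not handled by this sketch, which is one place where manual curation would still be needed.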

If the resulting dataset can be structured in an Excel sheet and is only a one-level graph, the CSV import feature (https://orkg.org/csv-import) can be used to bulk-import the papers themselves. The individual comparison views can then be created manually...
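To make "one-level graph" concrete, here is a small Python sketch that flattens records into a CSV with one row per paper and one column per property. The column names (`task`, `dataset`, `model`, `metric`, `score`, `source`), the function name `to_flat_csv`, and the placeholder values are all assumptions for illustration; the header names the import tool actually expects would need to be taken from its documentation.

```python
import csv
import io

# Hypothetical column names, chosen for illustration only.
FIELDNAMES = ["task", "dataset", "model", "metric", "score", "source"]


def to_flat_csv(records, task, dataset):
    """Serialize records as flat CSV text: one row per entry, no nesting."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Every property is a plain string value on the row.
        writer.writerow({"task": task, "dataset": dataset, **record})
    return buffer.getvalue()


# Placeholder record, not a real result.
example_records = [
    {
        "model": "Placeholder model A",
        "metric": "F1",
        "score": "90.0",
        "source": "Placeholder paper A",
    }
]

csv_text = to_flat_csv(
    example_records,
    task="Named entity recognition",
    dataset="Placeholder dataset",
)
print(csv_text)
```

Anything that needs a nested structure (for instance, several metrics attached to one evaluation) would not fit this flat shape and would have to be added manually afterwards.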

Furthermore, I suggest defining a template (https://orkg.org/templates) for such data, so that new users can reuse it when adding new data.

Happy to continue the discussion thread. Please let me know.

sebastianruder commented 2 years ago

Hi both, are you ok meeting without me? I think you are both more up-to-date on this type of data. I'm happy to go with whatever you decide, as long as it can be reasonably integrated into the website.

jd-coderepos commented 2 years ago

@sebastianruder Will be happy to share updates here, in due course.