nerel-ds / NEREL-BIO

NEREL-BIO: A Dataset of Biomedical Abstracts Annotated with Nested Named Entities

Mention Handy Dataset Reading Option in README #3

Closed nicolay-r closed 11 months ago

nicolay-r commented 1 year ago

Dear resource maintainers,

Thank you for sharing such a large and richly annotated collection! Since the original collection is distributed as BRAT-formatted documents, getting started quickly with the relations may require writing an additional service that parses the annotations and extracts the text parts in which the relations are mentioned. To address this limitation, I would like to contribute a handy solution that extracts most relations between the mentioned objects with a single command, using the following open-source framework:

python3 -m arekit_ss.sample --writer jsonl --source nerel-bio --sampler bert --text_parser lm --output_dir "NEREL-samples"

Basically, it converts the BRAT-based representation of the NEREL collection into JSONL. Other output formats, such as CSV or SQLite3, as well as entity masking, are supported; the complete list of formats can be found here
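For context, here is a minimal sketch of the kind of parsing that would otherwise have to be written by hand for BRAT standoff annotations. It is not how arekit-ss works internally; the function names, the sample annotation, and the labels `TYPE_A`, `TYPE_B`, and `REL_X` are all hypothetical and only illustrate the general `T` (entity) and `R` (relation) line layout of a `.ann` file.

```python
from typing import Dict, List, Tuple


def parse_brat_ann(ann_text: str) -> Tuple[Dict[str, dict], List[dict]]:
    """Parse entity (T) and relation (R) lines of a BRAT standoff .ann file."""
    entities: Dict[str, dict] = {}
    relations: List[dict] = []
    for line in ann_text.splitlines():
        if not line.strip():
            continue
        if line.startswith("T"):
            # Layout: T1<TAB>TYPE start end[;start end...]<TAB>surface text
            ent_id, meta, surface = line.split("\t", 2)
            ent_type, offsets = meta.split(" ", 1)
            # Discontinuous entities list several "start end" pairs joined by ';'
            spans = [tuple(int(x) for x in part.split())
                     for part in offsets.split(";")]
            entities[ent_id] = {"type": ent_type, "spans": spans, "text": surface}
        elif line.startswith("R"):
            # Layout: R1<TAB>TYPE Arg1:T1 Arg2:T2
            rel_id, meta = line.split("\t")[:2]
            rel_type, arg1, arg2 = meta.split()
            relations.append({
                "id": rel_id,
                "type": rel_type,
                "source": arg1.split(":", 1)[1],
                "target": arg2.split(":", 1)[1],
            })
    return entities, relations


def relation_triples(entities: Dict[str, dict],
                     relations: List[dict]) -> List[Tuple[str, str, str]]:
    """Return (source text, relation type, target text) for each relation."""
    return [
        (entities[r["source"]]["text"], r["type"], entities[r["target"]]["text"])
        for r in relations
        if r["source"] in entities and r["target"] in entities
    ]


# Hypothetical annotation for the text "asthma treated with aspirin";
# the labels are placeholders, not actual NEREL-BIO types.
sample_ann = (
    "T1\tTYPE_A 0 6\tasthma\n"
    "T2\tTYPE_B 20 27\taspirin\n"
    "R1\tREL_X Arg1:T1 Arg2:T2\n"
)

entities, relations = parse_brat_ann(sample_ann)
triples = relation_triples(entities, relations)
print(triples)
```

The sketch ignores other BRAT line types (events, attributes, notes) and does not cut out the surrounding text context, which is the part that the command above takes care of.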

Proposal for a quick README modification

I hope this is beneficial both for a quick adoption of your collection by others and for my personal interest in maintaining open-source solutions that contribute to studies based on semantic relations in texts.

Here is an example of how to add the reading info to the README:

[![](https://img.shields.io/badge/AREkit--ss_Compatible-0.23.1-purple.svg)](https://github.com/nicolay-r/arekit-ss#usage)

> 📓 **Update 25 October 2023**: this collection **is now available in [arekit-ss](https://github.com/nicolay-r/arekit-ss)**
> for a [quick sampling](https://github.com/nicolay-r/arekit-ss#usage) of contexts with most subject-object relation mentions with just a **single script into
> `JSONL/CSV/SQLite`**, including (optional) language transferring 🔥 [[Learn more ...]](https://github.com/nicolay-r/arekit-ss#usage)

This will render as follows:

📓 Update 25 October 2023: this collection is now available in arekit-ss for a quick sampling of contexts with most subject-object relation mentions with just a single script into JSONL/CSV/SQLite, including (optional) language transferring 🔥 [Learn more ...]

tvbat commented 11 months ago

Added. Thanks!