nexus-stc / stc

Distributed free search engine and AI tools that grant access to knowledge
http://standard-template-construct.org

Implement classification for papers #12

Open the-superpirate opened 11 months ago

the-superpirate commented 11 months ago

Motivation

Classifying papers is an essential task that serves two purposes: it enables a navigational menu in the bot and on the web, and it allows cherry-picking papers on a specific topic for mass processing.

The task is to create a classifier that takes publication metadata and derives a list of highly likely classes for the record.

Classification approach

https://www.frontiersin.org/articles/10.3389/frma.2023.1149834/full

This approach is described in the paper, but no source code is available. One option is to contact the authors and request the sources to kick-start the implementation.

Technical description

What is needed: a library that accepts a paper description as a dict of the following format

authors: List[{first: str, given: str, name: str}]
abstract?: str
content?: str
id: {dois: List[str]}
issued_at?: int
languages: List[str]
metadata?: {container_title?: str, publisher?: str}
tags?: List[str]
title: str

and returns the SciNoBo class for the paper. The fields are described more precisely in the schema. Assume that all fields except title and abstract are absent most of the time.
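To make the input format concrete, here is a minimal Python sketch of the dict as typed structures, plus a helper that builds the text a classifier would most likely work from. The names Paper, Author, PaperId, PaperMetadata and classifier_text are hypothetical and not part of stc-geck or SciNoBo. The choice to feed title, abstract and container_title is an assumption based on the note that other fields are usually absent.

```python
from typing import List, TypedDict


class Author(TypedDict, total=False):
    first: str
    given: str
    name: str


class PaperId(TypedDict, total=False):
    dois: List[str]


class PaperMetadata(TypedDict, total=False):
    container_title: str
    publisher: str


class Paper(TypedDict, total=False):
    # All keys are declared optional here (total=False) for simplicity,
    # even though the format above only marks some of them with "?".
    authors: List[Author]
    abstract: str
    content: str
    id: PaperId
    issued_at: int
    languages: List[str]
    metadata: PaperMetadata
    tags: List[str]
    title: str


def classifier_text(paper: Paper) -> str:
    """Join the fields that are usually present into one input string.

    Hypothetical helper: which fields to use is an assumption.
    """
    parts = [paper.get("title", ""), paper.get("abstract", "")]
    # Add the venue name when it happens to be available.
    parts.append(paper.get("metadata", {}).get("container_title", ""))
    # Drop empty or whitespace-only parts and trim the rest.
    return "\n".join(part.strip() for part in parts if part and part.strip())
```

A record with only a title still yields usable input, so the classifier does not need special handling for missing abstracts or metadata.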

How to Start

pip install stc-geck
geck - documents

You will receive a stream of documents that are the subject of the task.