wergstatt / pprkrkn


Pipelines for crawled paper information #12

Closed: wergstatt closed this issue 4 years ago

wergstatt commented 4 years ago

In continuation of #4, pipelines for the crawled content of each paper page are necessary. Thus, build the following pipelines:

The last three items may be 1:m mappings, so they are kept separate from the 1:1 core information. For these, information can be "updated" in the sense that a JEL code may be added or removed. Such changes require special treatment in the pipeline.
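One way to handle such 1:m updates is to compare the codes already stored for a paper with the codes found on the freshly crawled page, and emit explicit "add" and "remove" actions. The sketch below assumes the codes are plain strings; the function name `jel_changes` and the tuple layout are hypothetical, not part of the project.

```python
from __future__ import annotations

from typing import Iterable


def jel_changes(paper_id: str, stored: Iterable[str], crawled: Iterable[str]) -> list[tuple[str, str, str]]:
    """Compare the stored and newly crawled JEL codes of one paper (hypothetical helper).

    Returns (paper_id, action, code) tuples: "add" for codes that newly appear
    on the paper page, "remove" for codes that are no longer listed there.
    """
    stored_set, crawled_set = set(stored), set(crawled)
    # Codes on the page but not in storage were added since the last crawl.
    changes = [(paper_id, "add", code) for code in sorted(crawled_set - stored_set)]
    # Codes in storage but missing from the page were taken away.
    changes += [(paper_id, "remove", code) for code in sorted(stored_set - crawled_set)]
    return changes
```

For example, `jel_changes("p1", ["C21", "J24"], ["C21", "J31"])` yields one "add" for `J31` and one "remove" for `J24`; unchanged codes produce no action.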

The core information is historized in spell format, i.e. each version of a paper's core record is stored with the period during which it was valid.
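A minimal sketch of spell-format historization, assuming each spell holds the core fields plus a `valid_from`/`valid_to` date pair and that an open (current) spell has `valid_to = None`. The `Spell` class and `historize` function are hypothetical names for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass
class Spell:
    paper_id: str
    core: dict                     # the 1:1 core fields of the paper
    valid_from: date
    valid_to: date | None = None   # None marks the currently open spell


def historize(spells: list[Spell], paper_id: str, core: dict, crawl_date: date) -> None:
    """Update the spell list in place with freshly crawled core information.

    Spells are treated as half-open intervals [valid_from, valid_to).
    """
    current = next(
        (s for s in spells if s.paper_id == paper_id and s.valid_to is None), None
    )
    if current is not None and current.core == core:
        return  # nothing changed: keep the open spell as it is
    if current is not None:
        current.valid_to = crawl_date  # close the outdated spell
    spells.append(Spell(paper_id, dict(core), crawl_date))  # open a new spell
```

Re-crawling an unchanged paper leaves the history untouched, so only actual changes create new rows.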

Currently, only a small random sample of the known papers is re-crawled in each run, to keep the download volume low and spread out over time.
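The sampling step could look roughly like the sketch below. The function name `pick_papers_to_update` and the default fraction of 5% are assumptions for illustration; the issue does not state how large the sample is.

```python
from __future__ import annotations

import random


def pick_papers_to_update(known_ids: list[str], fraction: float = 0.05, seed: int | None = None) -> list[str]:
    """Draw a small random sample of known paper IDs to re-crawl (hypothetical helper).

    The 5% default is an assumed value, not taken from the project.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not known_ids:
        return []
    rng = random.Random(seed)  # a fixed seed makes a run reproducible
    k = max(1, round(len(known_ids) * fraction))  # always refresh at least one paper
    return rng.sample(known_ids, k)  # sampling without replacement
```

Because every run draws a fresh sample, the re-crawls are spread over many runs instead of hitting all paper pages at once.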