As a continuation of #4, pipelines for the crawled content of each paper page are necessary. Thus, build the following pipelines:
[x] Core Information related to the Paper.
[x] Authors
[x] Author Short IDs
[x] JEL Codes
The latter three items may each be a 1:m mapping, so they are kept separate from the 1:1 core information. In these cases, information may be "updated" in the sense that, for example, a JEL code is added or removed. These actions require special treatment in the pipeline.
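One way to handle such 1:m updates is to compare the values already stored for a paper with the freshly crawled ones and derive what to insert and what to delete. The following is only a minimal sketch of that idea; the function name `diff_one_to_many` and the example JEL codes are hypothetical and not taken from the actual pipeline.

```python
def diff_one_to_many(stored, crawled):
    """Return (to_add, to_remove) for one paper's 1:m values.

    Hypothetical helper: `stored` holds the values currently in the
    database, `crawled` the values found on the freshly crawled page.
    """
    stored_set, crawled_set = set(stored), set(crawled)
    to_add = sorted(crawled_set - stored_set)      # new on the page
    to_remove = sorted(stored_set - crawled_set)   # no longer on the page
    return to_add, to_remove


# Example: one JEL code was dropped and another one was added.
to_add, to_remove = diff_one_to_many(["C21", "J31"], ["C21", "O15"])
print(to_add, to_remove)
```

The same comparison would apply to authors and author short IDs, since they share the 1:m shape.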
The core information is historized in spell format.
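Spell format usually means that every version of a record carries the period during which it was valid. As a rough illustration, assuming a list of dicts with the hypothetical keys `values`, `valid_from` and `valid_to` (the real schema may well differ), historization could look like this:

```python
from datetime import date


def historize(spells, new_values, today):
    """Close the open spell if the values changed, then open a new one.

    Assumed layout (hypothetical): each spell is a dict with the keys
    'values', 'valid_from' and 'valid_to'; the currently valid spell
    has valid_to set to None. The list is modified in place.
    """
    current = next((s for s in spells if s["valid_to"] is None), None)
    if current is not None:
        if current["values"] == new_values:
            return spells  # nothing changed, keep the open spell
        current["valid_to"] = today  # end the outdated spell
    spells.append({"values": new_values, "valid_from": today, "valid_to": None})
    return spells


history = []
historize(history, {"title": "Old title"}, date(2020, 1, 1))
historize(history, {"title": "New title"}, date(2020, 3, 1))
```

After the second call, the first spell is closed on 2020-03-01 and a second, open spell holds the new title.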
Currently, only a small random subset of the known papers is updated, which keeps the download volume low and spread out over time.
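Drawing such a random subset could be sketched as follows. The function name `pick_papers_to_update` and the default share of 5% are assumptions made purely for illustration; the share actually used is not stated here.

```python
import random


def pick_papers_to_update(known_ids, fraction=0.05, rng=None):
    """Pick a random subset of known paper IDs to re-crawl.

    `fraction` is an assumed share of papers, not a documented setting.
    """
    rng = rng or random.Random()
    ids = list(known_ids)
    if not ids:
        return []
    # Always update at least one paper, never more than are known.
    k = min(len(ids), max(1, round(len(ids) * fraction)))
    return rng.sample(ids, k)


batch = pick_papers_to_update(range(100), rng=random.Random(0))
```

Passing a seeded `random.Random` makes a run reproducible, which can help when debugging the pipeline.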