researchdata-se / docs.researchdata.se

API and metadata documentation for researchdata.se
https://docs.researchdata.se

Use Swepub Classify to find SSIF subjects when missing #18

Status: Open. borsna opened this issue 1 week ago.

borsna commented 1 week ago

Before running the title, abstract and keywords through the Swepub Classify API:

Harvested       Transformed     Validated       Deleted         Repository
0               477             0               0               bolin
3               3               0               0               bth-zenodo
26              26              25              0               chalmers-materialsmodeling-zenodo
32              32              32              0               chalmers-ojuo
5               5               1               0               chalmers-zenodo
167             167             167             0               gbif-sweden
3               3               3               0               gu-flow-lab-zenodo
1188            1188            1101            0               gu-spraakbanken
1               0               0               0               hhs-zenodo
5               5               5               0               kau-kaucs-zenodo
36              35              5               0               kth-zenodo
2               2               0               0               liu-zenodo
0               0               0               0               ltu-zenodo
28              28              25              0               lu-darklab-zenodo
21              21              10              0               lu-dataguru
5               5               0               0               lu-firesafetylund-zenodo
1               1               1               0               lu-swinno-zenodo
0               0               0               0               mau-zenodo
12              12              12              0               raa-heritage_laboratory-zenodo
3               3               1               0               rise-zenodo
230             230             227             0               scilifelab-figshare
0               938             938             0               sites
152             152             97              0               su-figshare
8               8               2               0               su-zenodo
44              44              44              0               umu-ddb
1972            3385            2696            0               Total

After running them through the Swepub Classify API:

Harvested       Transformed     Validated       Deleted         Repository
0               477             0               0               bolin
3               3               2               0               bth-zenodo
26              26              25              0               chalmers-materialsmodeling-zenodo
32              32              32              0               chalmers-ojuo
5               5               1               0               chalmers-zenodo
167             167             167             0               gbif-sweden
3               3               3               0               gu-flow-lab-zenodo
1188            1188            1101            0               gu-spraakbanken
1               0               0               0               hhs-zenodo
5               5               5               0               kau-kaucs-zenodo
36              35              20              0               kth-zenodo
2               2               1               0               liu-zenodo
0               0               0               0               ltu-zenodo
28              28              25              0               lu-darklab-zenodo
21              21              10              0               lu-dataguru
5               5               2               0               lu-firesafetylund-zenodo
1               1               1               0               lu-swinno-zenodo
0               0               0               0               mau-zenodo
12              12              12              0               raa-heritage_laboratory-zenodo
3               3               2               0               rise-zenodo
230             230             227             0               scilifelab-figshare
0               938             938             0               sites
152             152             120             0               su-figshare
8               8               2               0               su-zenodo
44              44              44              0               umu-ddb
1972            3385            2740            0               Total
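The effect of the classification step can be checked by comparing the Validated column of the two tables. The small script below copies only the repositories whose Validated count changed (all other rows are identical in both tables) and confirms that their gains add up to the change in the Total row. The dictionary layout and the `validated_gains` helper are illustrative, not part of the researchdata.se code.

```python
# Validated counts copied from the "before" and "after" tables above.
# Only repositories whose Validated count changed are listed; all others are identical.
before = {
    "bth-zenodo": 0,
    "kth-zenodo": 5,
    "liu-zenodo": 0,
    "lu-firesafetylund-zenodo": 0,
    "rise-zenodo": 1,
    "su-figshare": 97,
}
after = {
    "bth-zenodo": 2,
    "kth-zenodo": 20,
    "liu-zenodo": 1,
    "lu-firesafetylund-zenodo": 2,
    "rise-zenodo": 2,
    "su-figshare": 120,
}
TOTAL_BEFORE = 2696  # Validated total before classification
TOTAL_AFTER = 2740   # Validated total after classification


def validated_gains(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Return the per-repository increase in validated datasets (illustrative helper)."""
    return {repo: after[repo] - before[repo] for repo in before}


gains = validated_gains(before, after)

# Print the repositories ordered by how many datasets became valid.
for repo, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{repo:<26} +{gain}")

# The per-repository gains should account for the whole change in the Total row.
assert sum(gains.values()) == TOTAL_AFTER - TOTAL_BEFORE
```

Most of the 44 extra valid datasets come from su-figshare (+23) and kth-zenodo (+15).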

I added a criterion that the classifier's _score must be larger than 0.5.
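The threshold amounts to a simple filter on the classifier's suggestions. The issue does not show the Swepub Classify response format, so the sketch below assumes each suggestion is a dict with a numeric "_score" plus a subject code under a hypothetical "code" key; the example data is made up. "Larger than 0.5" is read as a strict comparison, so a score of exactly 0.5 is rejected.

```python
from typing import Any

# Threshold from the issue: only keep suggestions whose _score is larger than 0.5.
SCORE_THRESHOLD = 0.5


def confident_subjects(suggestions: list[dict[str, Any]],
                       threshold: float = SCORE_THRESHOLD) -> list[dict[str, Any]]:
    """Keep classifier suggestions whose "_score" is strictly above the threshold.

    ASSUMPTION: each suggestion is a dict carrying a numeric "_score"; other keys
    (such as the hypothetical "code") are passed through untouched. Suggestions
    without a "_score" are treated as not confident enough.
    """
    return [s for s in suggestions if float(s.get("_score", 0.0)) > threshold]


# Hypothetical classifier output for one dataset (not real Swepub Classify data).
example = [
    {"code": "10201", "_score": 0.82},
    {"code": "10202", "_score": 0.50},  # exactly 0.5 is not "larger than 0.5"
    {"code": "102", "_score": 0.31},
]
print(confident_subjects(example))  # [{'code': '10201', '_score': 0.82}]
```

Raising the threshold trades recall for precision: fewer datasets get a subject, but the ones that do are more likely to be classified correctly.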

jomtov commented 1 week ago

How could we help improve the performance of our repositories locally, if at all?

borsna commented 1 week ago

The best way to increase the number of valid datasets would be to include at least one high-level subject term from SSIF / FORD / EuroSciVoc so we can make better
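The flow described in the issue title, where classification is only used when a subject is missing, can be sketched as follows. Everything here is an assumption for illustration: the record layout (a "subjects" list of dicts with "scheme" and "code"), the scheme labels, and `classify`, a hypothetical stand-in for the Swepub Classify call that takes title, abstract and keywords. It shows why a locally supplied SSIF, FORD or EuroSciVoc term helps: such records never depend on the classifier clearing the 0.5 threshold.

```python
from typing import Any, Callable

# Vocabularies mentioned in the issue. The exact scheme labels used in harvested
# records are an ASSUMPTION; adjust them to match the real metadata.
ACCEPTED_SCHEMES = {"SSIF", "FORD", "EuroSciVoc"}
SCORE_THRESHOLD = 0.5

Record = dict[str, Any]
Classifier = Callable[[str, str, list[str]], list[dict[str, Any]]]


def has_accepted_subject(record: Record) -> bool:
    """True if the record already carries a subject from an accepted vocabulary."""
    return any(s.get("scheme") in ACCEPTED_SCHEMES for s in record.get("subjects", []))


def ensure_subject(record: Record, classify: Classifier) -> Record:
    """Add SSIF subjects from a classifier only when the record has none.

    `classify` is a hypothetical stand-in for Swepub Classify: it receives the
    title, abstract and keywords and returns suggestions carrying a "_score".
    """
    if has_accepted_subject(record):
        return record  # the repository already supplied a usable subject term
    suggestions = classify(
        record.get("title", ""),
        record.get("abstract", ""),
        record.get("keywords", []),
    )
    added = [
        {"scheme": "SSIF", "code": s["code"]}
        for s in suggestions
        if s.get("_score", 0.0) > SCORE_THRESHOLD
    ]
    return {**record, "subjects": record.get("subjects", []) + added}


def fake_classify(title: str, abstract: str, keywords: list[str]) -> list[dict[str, Any]]:
    # Hypothetical fixed response standing in for the real API.
    return [{"code": "10201", "_score": 0.9}, {"code": "10202", "_score": 0.4}]


tagged = {"title": "t", "subjects": [{"scheme": "EuroSciVoc", "code": "x"}]}
untagged = {"title": "t", "abstract": "a", "keywords": ["k"]}
print(ensure_subject(tagged, fake_classify)["subjects"])    # left unchanged
print(ensure_subject(untagged, fake_classify)["subjects"])  # gains one SSIF subject
```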