Bug report for the filtering of "big" Scopus corpus

This report describes the bugs encountered while processing a corpus of 10 CSV files downloaded from Scopus. The files can be found in this archive: https://www.dropbox.com/s/4d6xu33ubkcxp8e/ScopusTestCorpus.zip?dl=0

The files contain the following numbers of lines (not counting the header):

scopus (0).csv -> 380 lines
scopus (1).csv -> 682 lines
scopus (2).csv -> 2841 lines
scopus (3).csv -> 312 lines
scopus (4).csv -> 628 lines
scopus (5).csv -> 878 lines
scopus (6).csv -> 1134 lines
scopus (7).csv -> 1370 lines
scopus (8).csv -> 1646 lines
scopus (9).csv -> 1120 lines

In total: 10,991 lines.
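For anyone who wants to re-check these figures independently of the tool, here is a minimal Python sketch. It counts CSV records rather than raw text lines, on the assumption that some quoted fields may contain line breaks (in which case a plain line count would overestimate). The function name `count_records` is made up for illustration and is not part of bibliograph.

```python
import csv
import io


def count_records(csv_text: str) -> int:
    """Count data records in CSV text, excluding the header row.

    Uses the csv module so that a quoted field spanning several
    physical lines still counts as a single record.
    """
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    next(reader, None)  # skip the header row, if any
    # Blank lines are parsed as empty lists and are not counted.
    return sum(1 for row in reader if row)
```

To apply it to one of the files, read it first, for example with `open("scopus (0).csv", encoding="utf-8-sig", newline="").read()`. The encoding is an assumption (`utf-8-sig` simply tolerates a leading byte-order mark).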

The initial parsing (before the filters) is very quick (less than 3 seconds).

On the filters screen, the maximum number of nodes (i.e. the number corresponding to an occurrence threshold >= 1) seems unrealistic for several metadata fields (see also the screen capture). In particular, it is odd to have:

1 References occurring in at least 2 records
1 Affiliation countries occurring in at least 1 record
8 Index keywords occurring in at least 1 record
1 Affiliation institutions occurring in at least 1 record

The other variables (Author keywords; Authors; Sources; Funders) may be correct (but I have no way to check).
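One way to sanity-check such figures outside the tool is to count, for a given column, how many distinct values occur in at least N records. The sketch below rests on assumptions: it supposes that multi-valued Scopus fields are separated by semicolons and that the column is named something like `Index Keywords`. Both should be checked against the actual CSV header. The function names are hypothetical, and this is not how bibliograph itself computes its numbers.

```python
import csv
import io
from collections import Counter


def occurrence_counts(csv_text: str, column: str, separator: str = ";") -> Counter:
    """Map each distinct value of `column` to the number of records containing it.

    Assumption: multi-valued cells are delimited by `separator`.
    A value repeated within one record is counted once for that record.
    """
    counts = Counter()
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    for row in reader:
        cell = row.get(column) or ""
        # A set removes duplicates inside a single record.
        values = {v.strip() for v in cell.split(separator) if v.strip()}
        counts.update(values)
    return counts


def nodes_at_threshold(counts: Counter, min_records: int) -> int:
    """Number of distinct values occurring in at least `min_records` records."""
    return sum(1 for n in counts.values() if n >= min_records)
```

For a corpus of almost 11,000 records, `nodes_at_threshold(counts, 1)` would be expected to return far more than 8 for index keywords, which is why the values listed above look suspicious.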
