oacore / jcdl_2019

Code and data used in our JCDL 2019 publication

Hundreds of rows merged into one #2

Open nemobis opened 5 years ago

nemobis commented 5 years ago

The dataset has one row which begins with an array of about 600 entries such as "ua" and "kr"; the DOI provided for it is https://doi.org/10.5380/acd.v18i2.54000, so it may be that hundreds of items titled "sumario" have been merged into one.

Such merges probably have a negligible effect on the results (unless they account for a disproportionate share of the rows with very large negative lags), but IMHO an easy gain might be to avoid merging more than N rows together, say 10 or 50. (Cf. https://github.com/dissemin/dissemin/issues/512#issuecomment-441985143 .)
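
To illustrate the cap, here is a minimal sketch. It does not reflect how this repository actually merges rows, which is not shown in this thread. The function name `merge_with_cap`, the `doi` grouping key, the `sources` field and the default threshold are all hypothetical. The idea is only that groups larger than N are left unmerged and set aside for inspection.

```python
from collections import defaultdict

MAX_MERGE = 10  # hypothetical value for N; 10 or 50 are the values suggested above


def merge_with_cap(records, key="doi", max_merge=MAX_MERGE):
    """Group records by `key`; merge small groups, leave oversized ones unmerged.

    `records` is assumed to be an iterable of dicts that all contain `key`.
    Returns (merged, skipped): merged groups, and the rows that were not merged.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)

    merged, skipped = [], []
    for value, group in groups.items():
        if len(group) > max_merge:
            # Suspiciously large group (e.g. hundreds of "sumario" items):
            # keep the rows separate instead of collapsing them into one.
            skipped.extend(group)
        else:
            merged.append({key: value, "sources": group})
    return merged, skipped
```

Counting the rows in `skipped` would also give a quick estimate of how many rows such oversized merges affect.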