The dataset has one row that begins with an array of about 600 values like "ua" and "kr"; the DOI provided is https://doi.org/10.5380/acd.v18i2.54000, so it might be that hundreds of items titled "sumario" have been merged into it.
Such merges probably have a negligible effect on the results (unless they generate a disproportionate share of the rows with very large negative lags), but IMHO an easy gain might be to cap the number of rows merged together at some N, say 10 or 50. (Cf. https://github.com/dissemin/dissemin/issues/512#issuecomment-441985143.)
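A minimal sketch of what such a cap could look like, assuming the merge step groups rows by some key before collapsing each group into one record (the names `rows`, `group_key`, and `MAX_MERGE_SIZE` are hypothetical here, not the actual pipeline's API):

```python
from itertools import groupby

MAX_MERGE_SIZE = 50  # hypothetical cap; 10 or 50 as suggested above

def merge_capped(rows, group_key, max_size=MAX_MERGE_SIZE):
    """Group rows by key, but split any group larger than max_size
    into chunks so no merged record absorbs more than max_size rows."""
    chunks = []
    for _, group in groupby(sorted(rows, key=group_key), key=group_key):
        group = list(group)
        # slice oversized groups into chunks of at most max_size rows;
        # each chunk then becomes one merged record downstream
        for i in range(0, len(group), max_size):
            chunks.append(group[i:i + max_size])
    return chunks
```

With a cap like this, the ~600-item "sumario" row above would be split into a dozen or so smaller merged records instead of one pathological one.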