Closed hans-ekbrand closed 3 years ago
deduplicate_labels() is not intended to be applied to importer objects. I suggest you use the function only after loading the data using subset() or as.data.set(). Anyway, your website seems to be down at the moment, which prevents me from reproducing and debugging the problem. Could you make the data available to me?
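A minimal sketch of the suggested order of operations, for readers who hit the same problem. The helper name load_deduplicated and the choice of spss.system.file() are assumptions for illustration only; the thread does not say which file format was used, so swap in the importer that matches your data.

```r
# Hypothetical helper (not part of memisc): load first, deduplicate second.
# Assumes an SPSS system file; use the importer matching your own file type.
load_deduplicated <- function(path) {
  # An importer object only describes the file; no data is loaded yet.
  importer <- memisc::spss.system.file(path)

  # Load the data into memory. subset() on the importer could be used here
  # instead to load only some variables or cases.
  ds <- memisc::as.data.set(importer)

  # Only now deduplicate the value labels, on the in-memory data set.
  memisc::deduplicate_labels(ds)
}
```

Calling load_deduplicated("survey.sav") with a path to your own file (the file name here is made up) would return a data set with deduplicated labels, without ever passing the importer object itself to deduplicate_labels().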
Thanks for your rapid response! Your advice was spot on.
Thanks for closing the issue with duplicate labels. It works, but unfortunately it is very slow on large data sets. Importing data takes about 100 times longer with deduplicate_labels() than without. My guess is that the implementation could be improved.

Beware that the test file is big: 1.7 GB, and that it will take almost 4 hours to run deduplicate_labels() on it.

Importing a subset of the file without running deduplicate_labels() takes only a few minutes.

Is there a way to speed up deduplicate_labels()?
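For anyone who wants to reproduce the slowdown on their own data, the comparison can be timed roughly as follows. This is only a sketch: the helper name time_dedup is hypothetical, and an SPSS system file is assumed because the thread does not name the file format.

```r
# Hypothetical helper (not part of memisc): time the import and the
# deduplication step separately so the two can be compared.
time_dedup <- function(path) {
  # Time the plain import (assumes an SPSS system file).
  t_import <- system.time(
    ds <- memisc::as.data.set(memisc::spss.system.file(path))
  )

  # Time deduplicate_labels() on the already loaded data set.
  t_dedup <- system.time(
    memisc::deduplicate_labels(ds)
  )

  # Elapsed wall-clock seconds for each step.
  c(import = unname(t_import["elapsed"]),
    dedup  = unname(t_dedup["elapsed"]))
}
```

Running this on a small subset of the file first is advisable, given the multi-hour run time reported above for the full file.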