Open AADeLucia opened 4 years ago
Good question. Adding a vocabulary-builder step that doesn't write instance files might make pruning easier for very large data sets. Not allowing regexes is a big part of what made bulk-load fast, though that may have changed. For stopwords, you can always start with the default English list and add to it for bulk-load.
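To make the vocabulary-builder idea concrete, here is a rough sketch in Python of a counting pass that keeps only document frequencies in memory and never writes instance files. It is not code from the tool itself: the function name `build_vocabulary`, the parameters `min_doc_freq` and `max_doc_fraction`, and the simple letters-only tokenizer are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical tokenizer: runs of ASCII letters only (an assumption, not the tool's rule).
TOKEN = re.compile(r"[A-Za-z]+")

def build_vocabulary(docs, min_doc_freq=2, max_doc_fraction=0.5, stopwords=frozenset()):
    """Return the set of words that survive pruning.

    Only per-word document counts are held in memory; no instances are stored.
    """
    doc_freq = Counter()
    n_docs = 0
    for text in docs:
        n_docs += 1
        # Count each word at most once per document.
        tokens = {t.lower() for t in TOKEN.findall(text)}
        doc_freq.update(tokens - stopwords)
    max_df = max_doc_fraction * n_docs
    # Keep words that are neither too rare nor too common.
    return {w for w, df in doc_freq.items() if min_doc_freq <= df <= max_df}

docs = ["the cat sat", "the cat ran", "the dog ran", "a bird flew"]
vocab = build_vocabulary(docs, stopwords={"a"})
```

A second pass over the same documents could then keep only tokens found in `vocab`, so the pruning decision is made before any instance data is written.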
There are features that are available in bulk-load that are not in import-file and vice versa:
I find these features very handy. Are there any plans to combine some of the features?