markgw / pimlico

The Pimlico Processing Toolkit
http://pimlico.readthedocs.org/
GNU Lesser General Public License v3.0

Valid docs count should be updated by all modules, like length count #14

Open markgw opened 6 years ago

markgw commented 6 years ago

Commit 264c71cd886f59b96323db260079574420d2c9ef added a valid docs count to the metadata for some input readers.

This is useful for a lot of things, but the feature is incomplete. The count can only be relied upon when it comes straight from an input reader. Modules further down the pipeline may produce more invalid documents, but the count is not currently updated to reflect that. All internal modules should update (or at least remove) the count, just as they currently do with the length count.
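
A rough sketch of the behaviour I have in mind, for a module that maps documents one-to-one. This is not Pimlico's actual API: the function name `output_metadata` and the metadata keys `"length"` and `"valid_documents"` are made up for illustration. The idea is that the length is carried over unchanged, while the valid docs count is either reduced by the number of documents the module itself invalidated, or dropped if the module cannot say.

```python
def output_metadata(input_metadata, newly_invalid=None):
    """Build output metadata for a hypothetical one-to-one document map module.

    input_metadata: dict of corpus metadata from the module's input.
    newly_invalid:  number of previously valid documents that this module
                    turned into invalid ones, or None if it did not keep track.

    The keys "length" and "valid_documents" are assumptions for this sketch.
    """
    # Copy so the input corpus metadata is never modified in place
    metadata = dict(input_metadata)

    if "valid_documents" not in metadata:
        # The input had no valid docs count, so there is nothing to update
        return metadata

    if newly_invalid is None:
        # A stale count is worse than no count: remove it
        del metadata["valid_documents"]
    else:
        # Invalid documents stay in the corpus, so the length is unchanged,
        # but fewer of the documents are valid
        metadata["valid_documents"] -= newly_invalid

    return metadata


# Metadata as it might come straight from an input reader
reader_metadata = {"length": 100, "valid_documents": 95}

# A module that tracked its failures updates the count
tracked = output_metadata(reader_metadata, newly_invalid=3)

# A module that did not track them removes the count instead
untracked = output_metadata(reader_metadata)
```

Removing the count is the minimum any module should do, since downstream code can then tell that the number is unknown instead of trusting a value that is too high.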