Hi,
If some of the documents are empty (an empty line in the input file), the output in the corresponding pz_d file is all -nan. Of course, this is an edge case that can easily be dealt with by removing such "documents".
Empty documents can arise when a short document consists only of stopwords: after stopword removal, nothing is left.
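In case it helps anyone else, here is a minimal sketch of the workaround in Python. It assumes one document per line, as in the input file described above; the function name `drop_empty_docs` and the sample data are made up for illustration and are not part of the project's code. It also returns the original line indices, so the rows of the resulting pz_d output can be mapped back to the original documents.

```python
def drop_empty_docs(lines):
    """Skip empty or whitespace-only lines.

    Returns the kept lines (without trailing newlines) and their
    0-based positions in the original input.
    """
    kept_lines = []
    kept_indices = []
    for i, line in enumerate(lines):
        if line.strip():  # non-empty after removing whitespace
            kept_lines.append(line.rstrip("\n"))
            kept_indices.append(i)
    return kept_lines, kept_indices


# Hypothetical input: documents 1 and 3 became empty after stopword removal.
docs = ["topic model text\n", "\n", "short text clustering\n", "   \n"]
kept, idx = drop_empty_docs(docs)
```

The filtered lines can then be written to a new input file, and `idx` can be stored alongside it to recover which original document each output row belongs to.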
Thanks for writing the code and sharing it.