usnistgov / nestor-tmp2

Quantifying tacit knowledge for investigatory analysis

Distribution over MWO's plot not loading #56

Closed ghost closed 5 years ago

ghost commented 5 years ago

After upgrading to v0.3 as per #54, I can update tag extraction without the error I had before. However, the "Distribution over MWO's" plot does not show any data: I receive the following warnings, and the plot is blank, as shown in the image below.

To reproduce,

saved locally!
NA     1739
S        15
I        11
P         4
P I       2
S I       2
X         2
Name: NE, dtype: int64
I      5
S      3
P I    2
X      1
S I    1
P      1
NA     1
Name: NE, dtype: int64
SAVE IN PROCESS --> calculating the extracted tags and statistics...
ONE GRAMS...
TWO GRAMS...
Tag completeness: 1.00 +/- 0.00
Complete Docs: 64, or 1.17%
Empty Docs: 2108, or 38.43%
Docs have at most 6 tokens (90th percentile)
c:\anaconda3\lib\site-packages\scipy\stats\stats.py:1713: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use arr[tuple(seq)] instead of arr[seq]. In the future this will be interpreted as an array index, arr[np.array(seq)], which will result either in an error or a different result.
  return np.add.reduce(sorted[indexer] * weights, axis=axis) / sumval
c:\anaconda3\lib\site-packages\statsmodels\nonparametric\kde.py:488: RuntimeWarning: invalid value encountered in true_divide
  binned = fast_linbin(X, a, b, gridsize) / (delta * nobs)
c:\anaconda3\lib\site-packages\statsmodels\nonparametric\kdetools.py:34: RuntimeWarning: invalid value encountered in double_scalars
  FAC1 = 2*(np.pi*bw/RANGE)**2
c:\anaconda3\lib\site-packages\numpy\core\fromnumeric.py:83: RuntimeWarning: invalid value encountered in reduce
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
SAVE --> your information has been saved, you can now extract your result in CSV or HDF5

image

ghost commented 5 years ago

I continued tagging more items and the plot now updates for me.

image

rtbs-dev commented 5 years ago

Fairly certain this is an issue with seaborn (which is the underlying plotting styler for that chart). It doesn't like plotting KDEs with very few data points.

We will probably add a notice to the documentation going forward that the progress distribution requires sufficient tags to work (i.e., keep tagging).
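
For illustration only, here is a minimal sketch of the kind of guard that avoids a blank KDE plot. It assumes the failure comes from a degenerate bandwidth: the log above reports "Tag completeness: 1.00 +/- 0.00", i.e. zero spread, and a rule-of-thumb bandwidth proportional to the standard deviation would then be zero. That cause is an assumption, not something confirmed in this thread. The function names are hypothetical and are not part of nestor, seaborn, or statsmodels; the bandwidth is the simple Scott-style rule (sample standard deviation times n^(-1/5)), which may differ from what the plotting stack actually uses.

```python
import math
import statistics


def rule_of_thumb_bandwidth(values):
    """Scott-style bandwidth: sample std * n**(-1/5).

    Returns None when the bandwidth is undefined or zero, which is the
    situation where a KDE cannot be drawn. (Hypothetical helper.)
    """
    # Drop NaN/inf entries, which would poison the estimate.
    finite = [v for v in values if math.isfinite(v)]
    if len(finite) < 2:
        return None  # stdev needs at least two points
    spread = statistics.stdev(finite)
    if spread == 0:
        return None  # all values identical: zero-width kernel
    return spread * len(finite) ** (-1 / 5)


def choose_plot_kind(values):
    """Fall back to a histogram when a KDE is not well defined."""
    if rule_of_thumb_bandwidth(values) is None:
        return "histogram"
    return "kde"


if __name__ == "__main__":
    # Every document fully tagged: zero spread, so no KDE.
    print(choose_plot_kind([1.0, 1.0, 1.0]))
    # Some variation in completeness: a KDE is possible.
    print(choose_plot_kind([0.2, 0.5, 1.0]))
```

A check like this runs before plotting, so the chart can show a histogram (or a "keep tagging" notice) instead of silently rendering nothing.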

rtbs-dev commented 5 years ago

Should be fixed with 35b11c9 on master. Can you update and report back?

msngit commented 5 years ago

Checked with a dataset with only 3 tags and 12 words tagged, out of 1700, and the plot is displayed. Will wait for @etbelski to confirm before closing the issue.

ghost commented 5 years ago

I'll have to try with the latest version; however, I reran with v0.3 and am no longer running into the issue, and that is with fewer tags (5 words tagged) than when I first ran into it (146 words tagged).

image

I'd call this closed for now, and I'll reopen if I run into this issue in the newest version.

rtbs-dev commented 5 years ago

Ok, since the latest version has no KDE, and the old behavior was only caused by low tagging, I'm going to go ahead and close this. If it happens again, I'll be happy to reopen.