kmpoon / hlta

Provides functions for hierarchical latent tree analysis on text data for hierarchical topic detection
GNU General Public License v3.0

Issue running PEM #1

Closed jcapde closed 7 years ago

jcapde commented 7 years ago

Hi,

Thanks for sharing the code for HLTA.

I'm trying to run PEM on the NIPS dataset, but I keep getting this error during clique tree propagation. The _cells attribute of the Function returned by functions.get(var).project(var, value) seems to be empty in some runs.

    at java.lang.System.arraycopy(Native Method)
    at org.latlab.util.Function2D.project(Function2D.java:236)
    at org.latlab.reasoner.CliqueTreePropagation.absorbEvidence(CliqueTreePropagation.java:168)
    at org.latlab.reasoner.CliqueTreePropagation.propagate(CliqueTreePropagation.java:653)
    at org.latlab.learner.ParallelEmLearner$ForkComputation.computeDirectly(ParallelEmLearner.java:312)

Thanks

jcapde commented 7 years ago

Hi,

I think I've found the problem.

When I create the data files from text, I was running:

    java -cp HLTA.jar:HLTA-deps.jar tm.text.Convert data/sample 1000 1 data/extracted/

Unfortunately, this creates a sample.txt file that contains a line such as Name: data/sample. When the dataset is loaded from this file, I believe data/sample is tokenised, which creates an extra variable. That extra variable is what caused the error above.

For the time being, I'll manually edit the Name: data/sample line in sample.txt so that parsing it no longer creates a new variable.
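The manual edit described above could be automated along these lines. This is only a sketch, not part of HLTA: the class name DataNameFixer and its methods are hypothetical, and it assumes that the header line starts with "Name:" and that dropping every non-alphanumeric character from the name is enough to keep the loader from creating an extra variable.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HLTA.
final class DataNameFixer {
    // Assumption: the header line has the form "Name: <name>".
    private static final String PREFIX = "Name:";

    // Returns the line unchanged unless it starts with "Name:"; in that case
    // every non-alphanumeric character is dropped from the name part.
    public static String sanitizeNameLine(String line) {
        if (!line.startsWith(PREFIX)) {
            return line;
        }
        String name = line.substring(PREFIX.length()).trim();
        return PREFIX + " " + name.replaceAll("[^A-Za-z0-9]", "");
    }

    // Rewrites the file in place, sanitizing only the first "Name:" line found.
    public static void fixFile(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        List<String> fixed = new ArrayList<>(lines.size());
        boolean done = false;
        for (String line : lines) {
            if (!done && line.startsWith(PREFIX)) {
                fixed.add(sanitizeNameLine(line));
                done = true;
            } else {
                fixed.add(line);
            }
        }
        Files.write(file, fixed, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DataNameFixer.java <data file>");
            return;
        }
        fixFile(Paths.get(args[0]));
    }
}
```

With Java 11 or later this can be run directly as a single source file, passing the path of the generated sample.txt as the only argument. A line such as Name: data/sample would then become Name: datasample.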

Thanks, Joan

kmpoon commented 7 years ago

Thanks for your report.

This issue should be fixed in version 1.4.1. The name written to the generated data file should no longer contain any non-alphanumeric characters. However, I haven't properly tested the fix.