psal / jstylo

JStylo-Anonymouth - Authorship Attribution and Authorship Anonymization Framework

Build fails (missing directory, failed unit tests), analysis fails (tmp.xml access denied) #12

Open bwegge opened 8 years ago

bwegge commented 8 years ago

Hi all,

I am trying to build the current JStylo version from the git repo (readme says it's 2.9.0) since I could not find any recent version to download (the last binary package I found was for v1.2.0, which seems pretty old). I am on Windows 10, 64 bit.

I could not find any build instructions, so I just imported it as a Maven project in Eclipse Mars (64 bit) and ran Maven with goal=package. However, there seem to be a number of issues:

Firstly, Eclipse complains about a missing directory "src/test/resources". I simply re-created it, hoping that no files are missing from it as well. Secondly, after starting the build with the re-created directory, it runs for a while and then, at the end, reports 3 failed test cases:

Failed tests:   toString_Success(edu.drexel.psal.jstylo.generics.test.DocResultTest): expected:<...uspect is A with a 0[.]70 likelihood(..)
  getStatisticsString_Success(edu.drexel.psal.jstylo.generics.test.ExperimentResultsTest): expected:<...itive Percentage: 50[.]00(..)
  getAllDocumentResultsVerbose_Success(edu.drexel.psal.jstylo.generics.test.ExperimentResultsTest): expected:<...itle1         |    0[.10    |    0.20    |  >>0.70<<  |(..)
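
The bracketed characters in these messages mark where the expected and actual strings differ, and in all three cases the difference starts at the decimal separator. One plausible cause, not confirmed anywhere in this thread, is that the tested code formats numbers with the JVM's default locale, which would print 0,70 instead of 0.70 on a system with, for example, German regional settings. A minimal Java sketch of that effect (the class and method names are made up for illustration and are not part of JStylo):

```java
import java.util.Locale;

// Hypothetical demo class, not JStylo code: shows that "%.2f" output
// depends on the locale passed to (or defaulted by) String.format.
final class LocaleFormatDemo {

    // Formats a value with two decimals using an explicit locale.
    static String format(Locale locale, double value) {
        return String.format(locale, "%.2f", value);
    }

    public static void main(String[] args) {
        // Locale.US uses a decimal point.
        System.out.println(format(Locale.US, 0.70));      // prints 0.70
        // Locale.GERMANY uses a decimal comma.
        System.out.println(format(Locale.GERMANY, 0.70)); // prints 0,70
        // Without an explicit locale, the JVM default decides the separator.
        System.out.println(String.format("%.2f", 0.70));
    }
}
```

If this is indeed the cause, formatting with an explicit locale such as Locale.US or Locale.ROOT would make the output independent of the machine's regional settings.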

For lack of a better idea, I just commented out the 3 affected @Test methods, and now the build successfully creates two jar files in the target directory: jstylo-2.9.0.jar and jstylo-2.9.0-jar-with-dependencies.jar. Great, I thought, so I copied them to the project's root directory and tried to run them as stated in the README (replacing "jsan.jar" with the actual filename of one of the generated jars).

However, neither of them worked. I guess I need to use the version with dependencies, but it seems that Maven fails to include the separate library jgaap-5.2.0-lite.jar in the final jstylo-2.9.0-jar-with-dependencies.jar. Despite my limited experience with Maven and Java in general, I figured out that I can add the missing library to the classpath by running java -cp jstylo-2.9.0-jar-with-dependencies.jar;..\lib\jgaap-5.2.0-lite.jar edu.drexel.psal.jstylo.GUI.GUIMain instead of the documented java -jar jsan.jar command. With that, the JStylo GUI actually starts up.

So I loaded one of the included projects (amt_obfuscation.xml), chose the "WritePrints (Limited)" feature set, and picked the "SMO" classifier (because that was one of the few classifiers that actually worked in v1.2.0 without giving me the obscure error "Failed to build the statistics string!") but no verifier. I set "Classification Type" to "Train and classify on documents with known authors", "Analysis Type" to "Classify", and "Verification Type" to "Verify unknown text documents". All other options were left as-is. I then started the analysis, which seemed to work until it failed with a message box saying "Could not create instances from training corpus: tmp.xml (Access is denied) Aborting analysis." The console window showed the corresponding error:

ERROR AnalysisTabDriver - Could not create instances from training corpus!
java.io.FileNotFoundException: tmp.xml (Access is denied)
        at java.io.FileOutputStream.open0(Native Method)
        at java.io.FileOutputStream.open(Unknown Source)
        at java.io.FileOutputStream.<init>(Unknown Source)
        at java.io.FileOutputStream.<init>(Unknown Source)
        at java.io.PrintWriter.<init>(Unknown Source)
        at edu.drexel.psal.jstylo.featureProcessing.CumulativeFeatureDriver.<init>(CumulativeFeatureDriver.java:81)
        at edu.drexel.psal.jstylo.featureProcessing.LocalParallelFeatureExtractionAPI.createTrainingDataMapThreaded(LocalParallelFeatureExtractionAPI.java:251)
        at edu.drexel.psal.jstylo.GUI.AnalysisTabDriver$RunAnalysisThread.run(AnalysisTabDriver.java:789)
        at java.lang.Thread.run(Unknown Source)
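
The trace shows a PrintWriter being opened on the relative path tmp.xml, which Java resolves against the current working directory. If that directory is not writable, or the file is still held open from an earlier run, Windows reports "Access is denied". What CumulativeFeatureDriver actually does at line 81 is not shown in this thread, so the following is only a sketch of a common alternative: writing scratch files to the system temp directory and closing them reliably. The class name, method name, and file prefix are hypothetical.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch, not JStylo code: write scratch data to a uniquely
// named file in the system temp directory instead of "tmp.xml" in the
// working directory.
final class TempFileSketch {

    // Creates a fresh temp file, writes the content, and returns its path.
    static Path writeTemp(String content) throws IOException {
        // createTempFile picks a unique name under java.io.tmpdir.
        Path tmp = Files.createTempFile("jstylo-", ".xml");
        // Ask the JVM to remove the file on normal exit.
        tmp.toFile().deleteOnExit();
        // try-with-resources closes the writer even if printing fails,
        // so no handle stays open to block a later run.
        try (PrintWriter out = new PrintWriter(
                Files.newBufferedWriter(tmp, StandardCharsets.UTF_8))) {
            out.print(content);
        }
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        Path written = writeTemp("<feature-set/>");
        System.out.println("Wrote " + written.toAbsolutePath());
    }
}
```

Independently of any code change, starting the application from a directory the current user can write to may be enough to avoid the error, assuming the working directory is the problem.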

No matter what project I loaded or classifier I chose, it always ended up with some error.

Could you please advise what I might have done wrong, or how to get the last working version running?

Thanks, Ben

travis-crow commented 8 years ago

Good morning! Thanks for all of the details. Unfortunately, no one is actively developing this project at this time and I am on other projects in other positions, but I'll look into the issue if I have the time.

That said, I did add some basic build instructions to the readme. I use a Maven install, cleaning beforehand to ensure that there are no artifacts left over. In the future, you can add the "-DskipTests" flag to skip all of the tests during the build process, which should drastically decrease the build time.

As for the missing dependency, that too was missing from the readme. JStylo depends on JGAAP, which is our only dependency that is not mavenized, and it is also too large to be stored in a GitHub repository. Currently, the project builds using JGAAP's jar file as a system jar. I added the following note to the README to reflect this:

NOTE: JStylo depends on JGAAP, which is not hosted on the maven central repository. Additionally, due to github's policy on large files, it cannot be included with the project. JStylo branch 2.3.0, an older version from before this github restriction, has this dependency included. To build this version, please download "jgaap-5.2.0-lite.jar" from branch 2.3.0 and place it in /lib subdirectory of the project.

Now to address your UI concerns.

JStylo 2.9.0 was produced as a result of a temporary effort to improve the JStylo API/backend. While we did try to avoid breaking the GUI when possible, we did not do extensive testing on it. If you are planning on using the JStylo GUI, I recommend that you instead use branch 2.3.0. It is missing several major updates, but has a stable UI. Version 2.3.0 can be run programmatically by running GUIMain.java from your IDE. As far as I'm aware, there is no pre-compiled version of 2.3.0 available. It is not too different from the 1.2 version (which is indeed a few years old) in terms of the GUI, though it does have extra backend bug fixes and updates.

I have an idea regarding the GUI issue you are seeing in 2.9.0 and will look into it. I'll post again in this issue once I have duplicated the problem and have a potential fix implemented.

bwegge commented 8 years ago

Hi, thanks a lot for the quick reply and hints! As you indicated that v2.9.0 was about improving the API/backend, I guess you used the version without the GUI yourself. Is there documentation on how to use the backend standalone? This would probably also help me greatly, since I get frequent errors about "tmp.xml" even with v1.2 after running a couple of analyses in the same session, and it gets cumbersome to restart and click through the GUI again and again. (Otherwise, I will also try v2.3 and see how it goes.)

Best, Ben

jdistler commented 8 years ago

I am also very interested in using the backend API for a project of mine. Is there any chance that documentation for it could be created?

lessless commented 6 years ago

Would love to find out too