Yan-LCAS closed this issue 1 year ago.
First, we strongly discourage using the Porter stemmer: it is almost certainly not doing what you expect.
The --use-pipe-from flag expects its argument to be a serialized instance list (a .mallet file saved by an earlier import), so that you can repeat a complete import process with exactly the same pipe. It doesn't add a single extra pipe to the sequence, and you're giving it a compiled class file instead. To add stemming, you would need to make a copy of Csv2Vectors and add the stemmer class to its pipe sequence.
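The "CAFEBABE" in the error below is the giveaway: every compiled .class file starts with the magic number 0xCAFEBABE, whereas ObjectInputStream (which InstanceList.load uses, per the stack trace) expects a serialization header of 0xACED followed by version 0x0005. Here is a minimal, standard-library-only sketch for checking which kind of file you are passing. The class HeaderCheck and its methods are hypothetical helpers written for illustration; they are not part of MALLET.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;

public class HeaderCheck {
    // Magic number at the start of every compiled .class file.
    static final int CLASS_FILE_MAGIC = 0xCAFEBABE;
    // ObjectOutputStream header: STREAM_MAGIC (0xACED) followed by STREAM_VERSION (0x0005).
    static final int SERIALIZATION_HEADER = 0xACED0005;

    /** Classifies a file by its first four bytes. */
    public static String describe(byte[] head) {
        if (head.length < 4) {
            return "too short";
        }
        // Assemble the first four bytes as a big-endian int.
        int magic = ((head[0] & 0xFF) << 24) | ((head[1] & 0xFF) << 16)
                | ((head[2] & 0xFF) << 8) | (head[3] & 0xFF);
        if (magic == CLASS_FILE_MAGIC) {
            return "compiled class file";
        }
        if (magic == SERIALIZATION_HEADER) {
            return "serialized object stream";
        }
        return "unknown";
    }

    /** Serializes a small object in memory, producing the header ObjectInputStream expects. */
    public static byte[] serializedSample() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<String>());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("in-memory sample: " + describe(serializedSample()));
            return;
        }
        // Read at most the first four bytes of the file (readNBytes with offset needs Java 9+).
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            byte[] head = new byte[4];
            int n = in.readNBytes(head, 0, 4);
            System.out.println(args[0] + ": " + describe(Arrays.copyOf(head, n)));
        }
    }
}
```

Run as java HeaderCheck class\cc\mallet\pipe\TokenSequence2PorterStems.class and it reports a compiled class file, which is why ObjectInputStream rejects it. A .mallet file saved by an earlier import should instead report a serialized object stream, assuming MALLET writes it with a plain ObjectOutputStream.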
Thanks
This is my command:

mallet import-file --input myfile.mallet --output myfile-stemming.mallet --token-regex '[\p{L}\p{M}]+' --keep-sequence --use-pipe-from class\cc\mallet\pipe\TokenSequence2PorterStems.class
This is the output:

java.io.StreamCorruptedException: invalid stream header: CAFEBABE
    at java.base/java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:987)
    at java.base/java.io.ObjectInputStream.<init>(ObjectInputStream.java:414)
    at cc.mallet.types.InstanceList.load(InstanceList.java:821)
    at cc.mallet.classify.tui.Csv2Vectors.main(Csv2Vectors.java:146)
Exception in thread "main" java.lang.IllegalArgumentException: Couldn't read InstanceList from file class\cc\mallet\pipe\TokenSequence2PorterStems.class
    at cc.mallet.types.InstanceList.load(InstanceList.java:830)
    at cc.mallet.classify.tui.Csv2Vectors.main(Csv2Vectors.java:146)
How can it be fixed?