Anupam750 closed this issue 2 years ago
Do you mean Stanford.NLP.Segmenter?
Yes, Stanford.NLP.Segmenter. I will check it out and let you know.
The above code gives a compile-time error because it is written in Java. Do you have a C# version?
@Anupam750 not yet, sorry. But it should not be hard to convert the Java sample to a C# sample.
I ported the code to C#, but it throws an error related to the NLP DLL.
I cannot help without sample code and a stack trace. You can try the C# sample from the same SO answer:
using System.IO;
using edu.stanford.nlp.io;
using edu.stanford.nlp.process;
using ikvm.io;

public class NlpDemo
{
    // PTB tokenizer configured so the original text stays recoverable:
    // "invertible=true" preserves offsets, and parentheses/brackets are
    // left unnormalized.
    public static readonly TokenizerFactory TokenizerFactory = PTBTokenizer.factory(
        new CoreLabelTokenFactory(),
        "normalizeParentheses=false,normalizeOtherBrackets=false,invertible=true");

    public void ParseFile(string fileName)
    {
        using (var stream = File.OpenRead(fileName))
        {
            SplitSentences(stream);
        }
    }

    public void SplitSentences(Stream stream)
    {
        // Wrap the .NET stream as a Java InputStream for the IKVM-based API.
        var preProcessor = new DocumentPreprocessor(new UTF8Reader(new InputStreamWrapper(stream)));
        preProcessor.setTokenizerFactory(TokenizerFactory);
        foreach (java.util.List sentence in preProcessor)
        {
            ProcessSentence(sentence);
        }
    }

    // Print the sentence with its original spaces and punctuation.
    public void ProcessSentence(java.util.List sentence)
    {
        System.Console.WriteLine(edu.stanford.nlp.util.StringUtils.joinWithOriginalWhiteSpace(sentence));
    }
}
Alternatively, use the CoreNLP package and extract the list of sentences from the annotation: https://github.com/sergey-tihon/Stanford.NLP.NET/blob/master/samples/Stanford.NLP.CoreNLP.CSharp/StanfordCoreNlpClient.cs#L35-L36
But I have no idea how large your text is.
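A minimal sketch of that CoreNLP approach, for reference. This assumes the Stanford.NLP.CoreNLP NuGet package and the CoreNLP models jar are referenced, and that the models directory path is set to wherever you extracted them; it follows the pattern in the linked sample, but is untested here.

```csharp
// Sketch only: requires the Stanford.NLP.CoreNLP package and model files.
using java.util;
using edu.stanford.nlp.ling;
using edu.stanford.nlp.pipeline;
using Console = System.Console;

public static class SentenceSplitter
{
    public static void Run(string text)
    {
        // Run only the annotators needed for sentence splitting;
        // skipping pos/lemma/ner keeps the pipeline fast on large inputs.
        var props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit");

        var pipeline = new StanfordCoreNLP(props);

        var annotation = new Annotation(text);
        pipeline.annotate(annotation);

        // SentencesAnnotation holds one CoreMap per detected sentence.
        var sentences = annotation.get(
            new CoreAnnotations.SentencesAnnotation().getClass()) as ArrayList;
        foreach (var sentence in sentences)
        {
            Console.WriteLine(sentence);
        }
    }
}
```

Note that CoreNLP does not read PDFs: for a 300-page document you would first extract the plain text, and it may be worth feeding it to the pipeline in chunks (e.g. per page or per chapter) rather than as one huge string, which is a common cause of the pipeline appearing stuck.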
OK, I will try it out and let you know.
FYI, my text is a PDF file of around 300 pages.
Hello everyone,
If this issue is resolved, could you please post the solution here and explain how you resolved it?
Closing as an old issue.
Stanford NLP gets stuck when using large text with the segmenter. Please check it.