qwaider / heideltime

Automatically exported from code.google.com/p/heideltime

StanfordPOSTaggerWrapper model path #26

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Hi,

The initialize method of StanfordPOSTaggerWrapper class tests whether the file 
denoted by model_path exists, and then attempts to instantiate a MaxentTagger 
object with it. The Javadoc for the MaxentTagger constructor says that the 
modelFile parameter can be interpreted as a URL if it starts with "https?://" 
or can be loaded directly from the classpath as in 
"com/example/models/model.tagger".

I put my model file in my project's classpath (and I configured my config.props 
according to this resource path). HeidelTime fails because the wrapper first 
checks that the path exists as a file. If I remove this check, it works like a charm.
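
To illustrate why the existence check rejects a classpath resource, here is a minimal sketch using only the JDK. The class and method names are hypothetical and are not taken from HeidelTime or Stanford CoreNLP; `java/lang/Object.class` merely stands in for a model file that lives on the classpath rather than on disk.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class; not part of HeidelTime or Stanford CoreNLP.
class ClasspathVsFileDemo {

    // True if the name resolves to a file on the local file system
    // (relative names are resolved against the working directory).
    static boolean existsAsFile(String name) {
        return new File(name).exists();
    }

    // True if the class loader can open the name as a classpath resource.
    static boolean existsOnClasspath(String name) {
        ClassLoader loader = ClasspathVsFileDemo.class.getClassLoader();
        try (InputStream in = loader.getResourceAsStream(name)) {
            return in != null;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Stand-in for a resource such as "com/example/models/model.tagger".
        String resource = "java/lang/Object.class";
        System.out.println("as file: " + existsAsFile(resource));
        System.out.println("on classpath: " + existsOnClasspath(resource));
    }
}
```

A check based on `File.exists()` therefore only covers the file system case and says nothing about names that are valid on the classpath.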

It would be nice to reflect the MaxentTagger specification in 
StanfordPOSTaggerWrapper. I think StanfordPOSTaggerWrapper should only check 
that model_path is not null, and should leave the responsibility of the other 
checks to MaxentTagger. What do you think?
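
A rough sketch of what I mean, under stated assumptions: the class below is hypothetical and does not reproduce the real StanfordPOSTaggerWrapper, and `taggerFactory` is only a placeholder for the call to the MaxentTagger constructor so that the example runs without the Stanford library.

```java
import java.util.Objects;
import java.util.function.Function;

// Hypothetical sketch; the real wrapper and MaxentTagger are not reproduced here.
class TaggerInitSketch<T> {
    private T tagger;

    // taggerFactory is an assumed stand-in for `new MaxentTagger(modelPath)`.
    void initialize(String modelPath, Function<String, T> taggerFactory) {
        // Only reject a missing setting; no File.exists() test.
        Objects.requireNonNull(modelPath, "model_path must be set");
        // Resolving the path (URL, classpath or file system) is left to the factory.
        tagger = taggerFactory.apply(modelPath);
    }

    T tagger() {
        return tagger;
    }

    public static void main(String[] args) {
        TaggerInitSketch<String> sketch = new TaggerInitSketch<>();
        // The lambda fakes a tagger by echoing the path it was given.
        sketch.initialize("com/example/models/model.tagger", path -> "loaded:" + path);
        System.out.println(sketch.tagger());
    }
}
```

With this shape, a wrong path still fails, but the error comes from the component that knows all the places a model can be loaded from.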

Thank you for this great library and the work you have done so far!

Original issue reported on code.google.com by pascalgi...@gmail.com on 17 Jan 2015 at 1:56

GoogleCodeExporter commented 9 years ago
Hey,

thanks for the report. I suppose your suggestion makes sense.

That check was put in to catch incorrect configurations, and we've never dealt 
with systems where you put remote files into the tagger, or aren't able to 
specify absolute paths in the props file. I also think that in part, it's a 
silly way of doing things (I'm looking at you, 
"IOUtils.getInputStreamFromURLOrClasspathOrFileSystem"), but for the sake of 
compatibility, I've pushed r587a9ce12923.

If you find that this is indeed the correct fix for your use case, please let 
me know. The change should find its way into the next major release then.

Original comment by z...@informatik.uni-heidelberg.de on 17 Jan 2015 at 3:59

GoogleCodeExporter commented 9 years ago
That is exactly the correct fix for my use case. I had already checked it by 
overriding the class StanfordPOSTaggerWrapper with the same modification as in 
r587a9ce12923.

Thank you for the quick response.

Original comment by pascalgi...@gmail.com on 17 Jan 2015 at 6:47

GoogleCodeExporter commented 9 years ago
That's good news, thanks!

Original comment by z...@informatik.uni-heidelberg.de on 18 Jan 2015 at 3:22