idio / spotlight-model-editor

Tool for tweaking DBpedia Spotlight's models
Apache License 2.0

Exception when trying to add new entities #10

Closed nickvosk closed 6 years ago

nickvosk commented 9 years ago

I am trying to add new entities using the command: sh target/bin/model-editor file-update all path/to/en/model path_to_file/with/model/changes

At some point I get some warnings:

Parsing csv-file
Warning...this SF won't be able to be matched to an Uppercase SF
    "1stdibs"
    Size:1
Warning...this SF won't be able to be matched to an Uppercase SF
    "be2"
    Size:1
Warning...this SF won't be able to be matched to an Uppercase SF
    "onefinestay"
    Size:1
Finished parsing csv-file

The procedure fails with the following exception:

Exception in thread "main" java.lang.UnsupportedOperationException: empty.min
    at scala.collection.TraversableOnce$class.min(TraversableOnce.scala:194)
    at scala.collection.mutable.ArrayOps.min(ArrayOps.scala:38)
    at org.idio.dbpedia.spotlight.stores.CustomSurfaceFormStore.addLowerCaseSurfaceForm(CustomSurfaceFormStore.scala:273)
    at org.idio.dbpedia.spotlight.stores.CustomSurfaceFormStore$$anonfun$addMapOfLowerCaseSurfaceForms$1.apply(CustomSurfaceFormStore.scala:288)
    at org.idio.dbpedia.spotlight.stores.CustomSurfaceFormStore$$anonfun$addMapOfLowerCaseSurfaceForms$1.apply(CustomSurfaceFormStore.scala:286)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:93)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:93)
    at scala.collection.Iterator$class.foreach(Iterator.scala:660)
    at scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:157)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:190)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:43)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:93)
    at org.idio.dbpedia.spotlight.stores.CustomSurfaceFormStore.addMapOfLowerCaseSurfaceForms(CustomSurfaceFormStore.scala:286)
    at org.idio.dbpedia.spotlight.CustomSpotlightModel.addMapOfLowerCaseSurfaceForms(CustomSpotlightModel.scala:575)
    at org.idio.dbpedia.spotlight.utils.ModelUpdateFromFile.loadNewEntriesFromFile(ModelUpdateFromFile.scala:65)
    at org.idio.dbpedia.spotlight.Main$.runCommand(SpotlightModelReader.scala:187)
    at org.idio.dbpedia.spotlight.Main$.main(SpotlightModelReader.scala:40)
    at org.idio.dbpedia.spotlight.Main.main(SpotlightModelReader.scala)

Any idea why this happens? Does the exception have to do with the warnings about the SFs?

Thanks.

tgalery commented 9 years ago

I faced something similar a while ago. I think the code generates lowercase SFs on the basis of your input file, so if everything is lowercase, the .min() call throws an exception. I'm not sure whether we should document this or write more defensive code. I'm in favour of the latter, but @dav009 might disagree.
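
The behaviour described here is standard Scala: calling `min` on an empty collection throws `java.lang.UnsupportedOperationException` with the message `empty.min`, which matches the stack trace above. The sketch below is a minimal illustration, not the editor's code; `EmptyMinDemo`, `unsafeMin` and `safeMin` are hypothetical names. The guarded variant returns `None` instead of throwing and works on older Scala versions; on Scala 2.13+ the built-in `minOption` behaves the same way.

```scala
// Minimal illustration (not spotlight-model-editor code): Scala's .min
// throws on an empty collection instead of returning a value.
object EmptyMinDemo {
  // Throws java.lang.UnsupportedOperationException("empty.min") when xs is empty.
  def unsafeMin(xs: Array[String]): String = xs.min

  // Defensive variant: None for an empty input, Some(smallest) otherwise.
  def safeMin(xs: Array[String]): Option[String] =
    if (xs.isEmpty) None else Some(xs.min)
}
```
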

dav009 commented 9 years ago

Hello @nickvosk, are you trying to edit a model for Spotlight 0.6 or Spotlight 0.7?

nickvosk commented 9 years ago

Hey @dav009. I'm trying to edit a Spotlight 0.7 model.

dav009 commented 9 years ago

Just want to make sure you were using this branch:

https://github.com/idio/spotlight-model-editor/tree/feature/code-clean-up-0-7

The master branch works with 0.6 models.

tgalery commented 9 years ago

FYI, I think this bug might also be present in the 0.6 code.

dav009 commented 9 years ago

Can you share a file that generates this error?

dav009 commented 9 years ago

@nickvosk @tgalery :) It would be great if you could share some of the samples that generated this issue.

nickvosk commented 9 years ago

Hey, sorry for the delay @dav009. Two example lines:

1stdibs 1stdibs alibaba|beauti|object|earth|pham|dealer|furnitur|compani|karp|index|ross|angel|offer|fine|stdib|els|spark|bring|found|lead|laurenc|adam|startup|avail|richard|forcion|watch|sv|passion|commerc|york|websit|jewelri|cristina|special|david|amp|carmin|flea|inventori|exclus|rosenblatt|rare|unit|collect|ventur|bruno|benchmark|miller|capit|onlin|michael|pari|desir|paul|art|market|world|share|sourc|marketplac   1|1|1|1|1|1|1|1|1|1|1|1|1|1|2|1|1|2|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|2|1|1|1|2|1|1|1|1|1|1|2|1|1|1
navabi  navabi  retail|look|boutiqu|edg|media|compani|season|custom|index|www|unpreced|offer|via|dumont|plus|design|access|label|germani|recommend|editori|seventur|video|startup|bauer|product|strive|commerc|savvi|websit|european|style|tv|malin|week|aachen|present|collect|ventur|partner|size|fashion|express|posern|onlin|updat|page|worldwid|cut|navabi|deliveri|premier|featur|uniqu|world 2|1|1|1|1|1|1|1|1|1|1|1|1|1|3|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|3|1|1|2|1|1|1|2|1|3|4|1|1|2|1|1|1|1|3|1|1|1|1|1
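
The column layout of these lines is not spelled out in this thread. Judging from the samples, each line appears to hold a resource ID, `|`-separated surface forms, `|`-separated (stemmed) context words, and `|`-separated context counts, one count per word. The sketch below parses a line under those assumptions, additionally assuming the columns are tab-separated (they are rendered as spaces here). `EntityUpdate` and `EntityUpdateParser` are hypothetical names, not part of the editor.

```scala
// Hypothetical model of one input line; the column layout is inferred from
// the samples in this thread and is an assumption, not a documented format.
final case class EntityUpdate(
    resourceId: String,
    surfaceForms: Seq[String],
    contextWords: Seq[String],
    contextCounts: Seq[Int]
)

object EntityUpdateParser {
  // Assumes four tab-separated columns; list-valued columns are split on '|'.
  def parse(line: String): Either[String, EntityUpdate] =
    line.split('\t') match {
      case Array(id, sfs, words, counts) =>
        val ws = words.split('|').toSeq
        val cs = counts.split('|').toSeq
        // Assumed invariant: exactly one count per context word.
        if (ws.length != cs.length)
          Left(s"$id: ${ws.length} context words but ${cs.length} counts")
        else if (!cs.forall(c => c.nonEmpty && c.forall(_.isDigit)))
          Left(s"$id: non-numeric context count")
        else
          Right(EntityUpdate(id, sfs.split('|').toSeq, ws, cs.map(_.toInt)))
      case cols =>
        Left(s"expected 4 tab-separated columns, got ${cols.length}")
    }
}
```
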
dav009 commented 9 years ago

@nickvosk thanks! :) This exception occurs because I made a rather obscure assumption. Given the way the stores are associated, I assume that at least one surface form in the provided list contains an uppercase letter. That surface form is used to attach the lowercase surface forms as candidates.
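
A minimal sketch of the assumption described above. The names are hypothetical and the choice of anchor (the lexicographically smallest form with an uppercase letter) is made up for illustration, since the editor's actual selection logic is not shown in this thread. Unlike the original code path, it reports an explicit error instead of the opaque `empty.min` when no surface form contains an uppercase letter.

```scala
// Hypothetical sketch (not the editor's code): every lowercase-only surface
// form is attached to an "anchor" surface form with an uppercase letter.
object LowercaseAttachment {
  private def hasUpper(sf: String): Boolean = sf.exists(_.isUpper)

  // Returns lowercase SF -> anchor SF, or an explanatory error when no
  // anchor candidate exists (the case that currently ends in empty.min).
  def attach(sfs: Seq[String]): Either[String, Map[String, String]] = {
    val (upper, lower) = sfs.partition(hasUpper)
    if (upper.isEmpty)
      Left(s"no surface form with an uppercase letter among: ${sfs.mkString(", ")}")
    else {
      val anchor = upper.min // safe: upper is non-empty here
      Right(lower.map(sf => sf -> anchor).toMap)
    }
  }
}
```
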

dav009 commented 9 years ago

So if you add at least one uppercase surface form, the exception should not occur:

1stdibs 1stdibs|1Stdibs alibaba|beauti|object|earth|pham|dealer|furnitur|compani|karp|index|ross|angel|offer|fine|stdib|els|spark|bring|found|lead|laurenc|adam|startup|avail|richard|forcion|watch|sv|passion|commerc|york|websit|jewelri|cristina|special|david|amp|carmin|flea|inventori|exclus|rosenblatt|rare|unit|collect|ventur|bruno|benchmark|miller|capit|onlin|michael|pari|desir|paul|art|market|world|share|sourc|marketplac   1|1|1|1|1|1|1|1|1|1|1|1|1|1|2|1|1|2|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|2|1|1|1|2|1|1|1|1|1|1|2|1|1|1
navabi  navabi|Navabi retail|look|boutiqu|edg|media|compani|season|custom|index|www|unpreced|offer|via|dumont|plus|design|access|label|germani|recommend|editori|seventur|video|startup|bauer|product|strive|commerc|savvi|websit|european|style|tv|malin|week|aachen|present|collect|ventur|partner|size|fashion|express|posern|onlin|updat|page|worldwid|cut|navabi|deliveri|premier|featur|uniqu|world 2|1|1|1|1|1|1|1|1|1|1|1|1|1|3|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|3|1|1|2|1|1|1|2|1|3|4|1|1|2|1|1|1|1|3|1|1|1|1|1
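
The manual workaround above can be scripted. The sketch below, assuming tab-separated columns with the `|`-separated surface forms in the second column, appends a variant with its first letter uppercased (so `1stdibs` becomes `1Stdibs` and `navabi` becomes `Navabi`, mirroring the edit above) to any line whose surface forms are all lowercase. `AddUppercaseVariant` is a hypothetical helper, not part of spotlight-model-editor; it would be run over the input file before `file-update`.

```scala
// Hypothetical preprocessing helper (not part of spotlight-model-editor).
// Assumes tab-separated columns with '|'-separated surface forms in column 2.
object AddUppercaseVariant {
  // Uppercases the first letter, skipping leading digits or punctuation.
  def upperFirstLetter(sf: String): String = {
    val i = sf.indexWhere(_.isLetter)
    if (i < 0) sf
    else sf.substring(0, i) + sf.charAt(i).toUpper + sf.substring(i + 1)
  }

  // Leaves the line unchanged if some surface form already has an uppercase
  // letter; otherwise appends the uppercased variants to the SF column.
  def fixLine(line: String): String = {
    val cols = line.split("\t", -1) // -1 keeps trailing empty columns
    if (cols.length < 2) line
    else {
      val sfs = cols(1).split('|')
      if (sfs.exists(_.exists(_.isUpper))) line
      else {
        val extra = sfs.map(upperFirstLetter).filterNot(s => sfs.contains(s)).distinct
        cols(1) = (sfs ++ extra).mkString("|")
        cols.mkString("\t")
      }
    }
  }
}
```
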
tgalery commented 9 years ago

True dat, but shouldn't we handle this more defensively?

dav009 commented 9 years ago

Yeah, we should. So, in case no uppercase surface form is added, how about I create an artificial one with its first letter in uppercase and add it to the main store?

tgalery commented 9 years ago

Maybe it should be based on the title of the DBpedia resource rather than chosen arbitrarily? Or else we should generate uppercase variants of all the lowercase ones. Any thoughts?

dav009 commented 9 years ago

I would go for generating an uppercase one based on the DBpedia resource ID.
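
A sketch of the fallback discussed here, not code from the project. It assumes the synthetic surface form is derived from the DBpedia resource ID by replacing underscores with spaces (the usual DBpedia naming convention) and, if the result still has no uppercase letter (as with the lowercase IDs `1stdibs` and `navabi` in this thread), by uppercasing its first letter as proposed earlier. `AnchorFromResourceId` and `withAnchor` are hypothetical names.

```scala
// Hypothetical sketch (not spotlight-model-editor code): synthesize an
// uppercase surface form from the resource ID when none was provided.
object AnchorFromResourceId {
  private def hasUpper(s: String): Boolean = s.exists(_.isUpper)

  // DBpedia resource IDs use '_' where titles have spaces.
  def titleFromId(resourceId: String): String = resourceId.replace('_', ' ')

  // Uppercases the first letter, skipping leading digits or punctuation.
  private def upperFirstLetter(s: String): String = {
    val i = s.indexWhere(_.isLetter)
    if (i < 0) s else s.substring(0, i) + s.charAt(i).toUpper + s.substring(i + 1)
  }

  // Adds a synthetic anchor only when no provided form has an uppercase letter.
  def withAnchor(resourceId: String, sfs: Seq[String]): Seq[String] =
    if (sfs.exists(hasUpper)) sfs
    else {
      val title = titleFromId(resourceId)
      val anchor = if (hasUpper(title)) title else upperFirstLetter(title)
      // If the ID contains no letters at all, nothing useful can be added.
      if (hasUpper(anchor)) sfs :+ anchor else sfs
    }
}
```
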

mal commented 6 years ago

Closing as part of archiving process.