typedb / typedb-ml

TypeDB-ML is the Machine Learning integrations library for TypeDB
https://vaticle.com
Apache License 2.0

Potential bug or unwanted behaviour of kglib categorical attribute range\embedder function #140

Open kubpie opened 4 years ago

kubpie commented 4 years ago

Description

This issue most probably originates in the categorical attribute embedder of the kglib library. I have defined a categorical variable that can take three values: 'duct_type': ["NotDuct","SLD","DC"]. In some examples, however, both 'SLD' and 'DC' are present in the graph. For these examples Grakn produces the error below. It does not show up for any other case or combination of values, and after removing this variable from the list of variables and from the query, the pipeline runs fine. It seems that the combination of two categorical labels is being treated as a distinct value that is not in the specified range.

InvalidArgumentError: indices[0,0] = 5 is not in [0, 3)
     [[node KGCN_1/kg_encoder/node_model/sequential/ThingEmbedder/typewise_encoder/duct_type_cat_embedder_1/embed/embedding_lookup (defined at C:\Users\kubap\Anaconda3\envs\grakn-16\lib\site-packages\sonnet\python\modules\embed.py:182) ]]

Errors may have originated from an input operation.
Input Source operations connected to node KGCN_1/kg_encoder/node_model/sequential/ThingEmbedder/typewise_encoder/duct_type_cat_embedder_1/embed/embedding_lookup:
 KGCN_1/kg_encoder/node_model/sequential/ThingEmbedder/typewise_encoder/duct_type_cat_embedder_1/Cast (defined at C:\Users\kubap\Anaconda3\envs\grakn-16\lib\site-packages\kglib\kgcn\models\attribute.py:56)    
 ThingEmbedder/typewise_encoder/duct_type_cat_embedder_1/embed/embeddings/read (defined at C:\Users\kubap\Anaconda3\envs\grakn-16\lib\site-packages\sonnet\python\modules\util.py:963)
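
To make the message above easier to read: an embedding table for three categories has three rows, so only indices 0, 1 and 2 are valid, and an index of 5 falls outside [0, 3). The following is a minimal plain-Python sketch of that bounds check. It is not kglib, Sonnet or TensorFlow code; `make_embedding_table` and `embedding_lookup` are hypothetical names used only for illustration, and how the value 5 was produced in the real pipeline is not established in this report.

```python
# Hypothetical illustration only; this is not kglib, Sonnet or TensorFlow code.
CATEGORIES = ["NotDuct", "SLD", "DC"]


def make_embedding_table(num_categories, dim):
    """Build a toy table with one row per category (placeholder values)."""
    return [[float(i)] * dim for i in range(num_categories)]


def embedding_lookup(table, index):
    """Return the row for `index`, rejecting indices outside [0, len(table))."""
    if not 0 <= index < len(table):
        raise IndexError(f"index {index} is not in [0, {len(table)})")
    return table[index]


table = make_embedding_table(len(CATEGORIES), dim=4)

# A single known label maps to a valid row.
sld_row = embedding_lookup(table, CATEGORIES.index("SLD"))

# An index such as 5 has no row in a 3-row table, so the lookup fails.
try:
    embedding_lookup(table, 5)
    lookup_error = None
except IndexError as err:
    lookup_error = str(err)
```

In this sketch any index that is not the position of one of the three declared labels is rejected, which mirrors the range check reported in the error.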

Environment

OS (where Grakn server runs): Windows 10
Grakn version (and platform): Grakn Core 1.6.2
Grakn client: Python Client 1.6.1
Other environment details: Workbase 1.2.7, grakn-kglib 0.2.1

It's been pointed out to me that you don't officially support Anaconda installations, but the same error is produced in VS Code, and all packages in my conda env were installed through pip.

Reproducible Steps

My kglib project is available at: https://github.com/Qbbz/SSP with runtime instructions. Due to limited time I unfortunately can't produce an exact example right now, but I'm available to help with that in the future.

Expected Output

Multiple categorical labels are treated separately and each is assigned an integer value within the defined range, OR the range needs to be defined in terms of the possible combinations too.
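
As a rough sketch of the first option, each label could be mapped to its own integer index independently, so that 'SLD' and 'DC' appearing together yield two in-range indices rather than one out-of-range value. The function name `encode_labels` is hypothetical and this is not how kglib is known to encode attributes; it only illustrates the behaviour I expected.

```python
# Hypothetical sketch of the expected behaviour; not taken from kglib.
CATEGORIES = ["NotDuct", "SLD", "DC"]


def encode_labels(labels, categories):
    """Map each label to its own index in [0, len(categories)).

    Labels are encoded independently, so several labels in one graph
    produce several valid indices instead of one combined value.
    """
    index_of = {label: i for i, label in enumerate(categories)}
    unknown = [label for label in labels if label not in index_of]
    if unknown:
        # Fail early with a readable message instead of an out-of-range lookup.
        raise ValueError(f"labels not in the declared categories: {unknown}")
    return [index_of[label] for label in labels]


# Both 'SLD' and 'DC' present in the same example.
indices = encode_labels(["SLD", "DC"], CATEGORIES)
all_in_range = all(0 <= i < len(CATEGORIES) for i in indices)
```

With this approach the embedder range stays at the number of declared categories. The second option, declaring every possible combination as its own category, would instead enlarge the range.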

Actual Output

Training doesn't start because of the error above: learn.py crashes at create_feed_dicts.

flyingsilverfin commented 4 years ago

@jmsfltchr I think this should live in the kglib repo, right?

jmsfltchr commented 4 years ago

Yes that's right! @Qbbz could you copy this issue over to graknlabs/kglib please?

flyingsilverfin commented 4 years ago

Actually we can do it with the "transfer issue" these days :D on it

kubpie commented 4 years ago

Pardon, I didn't realise that I was posting to the wrong branch. I'm happy it's solved now!