mckinsey / causalnex

A Python library that helps data scientists to infer causation rather than observing correlation.
http://causalnex.readthedocs.io/

Can CausalNex support word embeddings, and could it be useful? #53

Closed ziyuwzf closed 3 years ago

ziyuwzf commented 4 years ago

qbphilip commented 4 years ago

Hello and thanks for your question.

Do you want to find the relations between embedding "variables", or between documents with different embeddings? The first should be possible by running the embedding as a pre-processing step. The latter means that you have multiple variables for a single statistical "entity". It's a similar issue to supporting categorical variables, and it is not trivial, because the DAG constraint should not take the relationships among the embeddings within a document into account.
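To make the second case concrete, here is a minimal sketch of one possible workaround, not something proposed by the maintainers: treat the dimensions of one document embedding as a group and forbid every edge inside that group. The column names (`emb_0`, ...) and the helper `intra_group_tabu_edges` are hypothetical, and the sketch assumes a structure learner that accepts a list of forbidden edges, such as the `tabu_edges` argument of `causalnex.structure.notears.from_pandas`.

```python
from itertools import permutations


def intra_group_tabu_edges(groups):
    """Return every directed edge (a, b) whose endpoints belong to the same group."""
    edges = []
    for columns in groups.values():
        # permutations(..., 2) yields both (a, b) and (b, a), never (a, a)
        edges.extend(permutations(columns, 2))
    return edges


# Hypothetical layout: one 3-dimensional document embedding
groups = {"doc_embedding": ["emb_0", "emb_1", "emb_2"]}
tabu = intra_group_tabu_edges(groups)
```

Edges between the embedding dimensions and any other variable stay allowed; only the edges within the group are excluded.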

We are working on a PyTorch implementation (for structure learning) that should make contributions easier. However, I do not know what a do-intervention on word embeddings would look like.

ziyuwzf commented 4 years ago

I see, thanks.

oentaryorj commented 3 years ago

The PyTorch implementation is now available at: https://github.com/quantumblacklabs/causalnex/tree/develop/causalnex/structure/pytorch

As discussed above, the embedding can be run as a pre-processing step, after which the relationships among the embedding variables can be identified. To keep things simple, implementing word embeddings within CausalNex is out of scope at this point. They can be built with PyTorch itself, as per https://pytorch.org/tutorials/beginner/nlp/word_embeddings_tutorial.html
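As a rough illustration of such a pre-processing step, the sketch below averages word vectors into one row of embedding "variables" per document. Everything in it is an assumption made for the example: the toy 2-dimensional `WORD_VECTORS` table stands in for a trained embedding model, and `embed_document` and `build_rows` are hypothetical helpers, not part of CausalNex.

```python
# Hypothetical 2-dimensional word vectors; in practice these would come from a trained model
WORD_VECTORS = {
    "causal": [1.0, 0.0],
    "graph": [0.0, 1.0],
    "model": [1.0, 1.0],
}


def embed_document(text, vectors, dim=2):
    """Average the vectors of known words; unknown words are skipped."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        # No known words: fall back to a zero vector
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def build_rows(documents, vectors, dim=2):
    """Turn each document into a dict with one column per embedding dimension."""
    rows = []
    for text in documents:
        emb = embed_document(text, vectors, dim)
        rows.append({f"emb_{i}": value for i, value in enumerate(emb)})
    return rows


rows = build_rows(["causal graph", "causal model", "unknown words"], WORD_VECTORS)
```

The resulting list of dicts can be loaded with `pandas.DataFrame(rows)`, joined with any other variables of interest, and then passed to a structure-learning routine such as `from_pandas`.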