GRAPL roadmap - GitHub issues

max-little / GRAPL

GRAPL: A computational library for nonparametric structural causal modelling, analysis and inference

GNU General Public License v3.0


GRAPL roadmap #18

Open AlxndrMlk opened 1 year ago

AlxndrMlk commented 1 year ago

Hi @max-little

Do we have a roadmap for the package?

I believe that a documentation page with some additional educational materials and literature references would be a great next step towards broadening the adoption of GRAPL in the community.

What are your thoughts on this?

Additionally, are there any algorithms that you have thought about adding to the next release?

Alex

max-little commented 1 year ago

Hi @AlxndrMlk

Thanks for raising this!

Yes, agreed, it would be valuable to have some educational materials. I suppose structural causal modelling and inference is not a "mainstream" topic in, e.g., data science, so I imagine the materials would have to cover some of the basics of the topic itself, with worked examples using GRAPL? Let me know what you think, and we can brainstorm a design.

Re: new algorithms for an updated release, there are a few obvious ones which have yet to be implemented and would definitely be worth adding to the list:

m-connection/separation (mconn, closely following the function dconn in admg.py)
test for m-separation (ismsep, closely following the function isdsep in admg.py)
latent projection (latproj in admg.py)
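As a rough illustration of what the first two items compute, here is a minimal standalone sketch of an m-separation test in plain Python. It is not GRAPL code: the edge-list encoding (directed `(parent, child)` pairs, bidirected `(a, b)` pairs) and the names `ancestors`, `districts` and `is_msep` are hypothetical, not the planned signatures of `mconn`/`ismsep` in admg.py. It assumes the ancestral-moralization characterization of m-separation: restrict the ADMG to the ancestors of X ∪ Y ∪ Z, join every pair of vertices linked by a collider path (for each district D, all of D together with the parents of D), then check ordinary graph separation by Z.

```python
from itertools import combinations


def ancestors(nodes, directed):
    """Return `nodes` plus all their ancestors via directed (parent, child) edges."""
    parents = {}
    for p, c in directed:
        parents.setdefault(c, set()).add(p)
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result


def districts(nodes, bidirected):
    """Connected components of the bidirected edges restricted to `nodes`."""
    adj = {v: set() for v in nodes}
    for a, b in bidirected:
        if a in adj and b in adj:
            adj[a].add(b)
            adj[b].add(a)
    comps, seen = [], set()
    for v in nodes:
        if v in seen:
            continue
        comp, stack = set(), [v]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def is_msep(X, Y, Z, directed, bidirected):
    """Hypothetical helper: True if X and Y are m-separated given Z in the ADMG."""
    X, Y, Z = set(X), set(Y), set(Z)
    anc = ancestors(X | Y | Z, directed)
    # Undirected augmented ("moral") graph on the ancestral set.
    und = {v: set() for v in anc}

    def link(a, b):
        und[a].add(b)
        und[b].add(a)

    for p, c in directed:
        if c in anc:  # then p is an ancestor too, so it is in anc
            link(p, c)
    # Any two vertices of a district D or its parents are joined by a collider path.
    for D in districts(anc, bidirected):
        pa_D = {p for p, c in directed if c in D}
        for a, b in combinations(D | pa_D, 2):
            link(a, b)
    # Plain graph search from X that never enters Z.
    frontier = list(X - Z)
    reached = set(frontier)
    while frontier:
        v = frontier.pop()
        if v in Y:
            return False
        for w in und[v] - reached - Z:
            reached.add(w)
            frontier.append(w)
    return True
```

For instance, in the front-door graph X → M → Y with X ↔ Y, `is_msep({"X"}, {"Y"}, {"M"}, [("X", "M"), ("M", "Y")], [("X", "Y")])` returns `False`, because the bidirected edge connects X and Y directly. On a DAG (no bidirected edges) each district is a single vertex, so the construction reduces to the familiar "marry the parents" moralization used for d-separation.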

For definitions, see Nested Markov Properties for Acyclic Directed Mixed Graphs. Let me know if you need advice on how to implement these.
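For latent projection, here is a minimal plain-Python sketch. It is not GRAPL's `latproj`; the edge-list encoding and the name `latent_projection` are hypothetical. It assumes the usual definition of projecting an ADMG onto its observed vertices: a → b whenever the original graph has a directed path from a to b whose intermediate vertices are all latent, and a ↔ b whenever it has a collider-free path between a and b, with arrowheads at both ends, whose intermediate vertices are all latent. Such a path either has a latent "top" vertex (a ← ... ← l → ... → b) or passes through exactly one existing bidirected edge (a ← ... ← x ↔ y → ... → b).

```python
from itertools import combinations


def latent_projection(directed, bidirected, latent):
    """Hypothetical helper: project an ADMG onto its non-latent vertices.

    Returns (directed_edges, bidirected_edges); bidirected edges are frozensets.
    """
    directed, bidirected, latent = list(directed), list(bidirected), set(latent)
    children = {}
    for p, c in directed:
        children.setdefault(p, set()).add(c)

    def observed_reach(start):
        """Observed vertices reachable from `start` by a directed path
        whose intermediate vertices are all latent."""
        found, visited, stack = set(), set(), [start]
        while stack:
            for c in children.get(stack.pop(), ()):
                if c not in latent:
                    found.add(c)  # stop at the first observed vertex
                elif c not in visited:
                    visited.add(c)
                    stack.append(c)
        return found

    nodes = {v for edge in directed + bidirected for v in edge}
    observed = nodes - latent
    reach = {v: observed_reach(v) for v in nodes}

    # a -> b if b is reachable from a through latent vertices only.
    new_directed = {(a, b) for a in observed for b in reach[a]}
    new_bidirected = set()
    # a <-> b via a latent common "top": a <- ... <- l -> ... -> b.
    for l in latent & nodes:
        for a, b in combinations(reach[l], 2):
            new_bidirected.add(frozenset((a, b)))
    # a <-> b via an existing bidirected edge x <-> y with latent-only arms.
    for x, y in bidirected:
        left = {x} if x in observed else reach[x]
        right = {y} if y in observed else reach[y]
        for a in left:
            for b in right:
                if a != b:
                    new_bidirected.add(frozenset((a, b)))
    return new_directed, new_bidirected
```

For example, hiding U in the DAG U → X, U → Y, X → Y yields X → Y together with X ↔ Y, the textbook confounded pair.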

Best Max

AlxndrMlk commented 1 year ago

Hey @max-little

Perhaps we could start with a documentation page, e.g. using Read the Docs (https://readthedocs.org/).

An example docs page: https://pandas-datareader.readthedocs.io/en/latest/

What are your thoughts?

max-little commented 1 year ago

Thanks for the suggestion @AlxndrMlk .

I think improved documentation is definitely valuable, but Read The Docs is far too heavyweight for such a small codebase (as yet). I know this system provides good templates, but without a lot of work the result ends up full of distracting boilerplate. I, for one, simply don't have the time to maintain something on that scale.

For now, I think a better approach is a few carefully targeted notebooks. What is needed here, in my opinion, are short tutorial notebooks on the basics of nonparametric causal inference (CI), showing how GRAPL can be used to learn the subject by supporting reasoning through computational experiments.

Given that, the main task would be designing the tutorials. How about the following scheme:

DAGs - what they are, what they are for, and their GRAPL language representation
Determining node relationships in DAGs (e.g. parent, child, ancestors, derived/extended relationships etc.)
Nonparametric distributions - relationship to DAGs
Manipulating DAGs (e.g. subgraphs, do-interventions)
Basic causal inference in DAGs (e.g. admissible sets, how to find these with GRAPL)
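To give a flavour of the second and fourth items of this outline as computational experiments, here is a tiny plain-Python sketch. The `DAG` class and its methods are hypothetical stand-ins for illustration, not GRAPL's representation or API. Parents, children, (proper) ancestors and descendants are read off an edge set, a subgraph keeps only the edges among chosen vertices, and a do-intervention is modelled in the usual graph-surgery way by deleting every edge into the intervened vertices.

```python
class DAG:
    """Hypothetical minimal DAG: a vertex set plus (parent, child) edge pairs."""

    def __init__(self, edges, nodes=()):
        self.edges = set(edges)
        # Keep explicitly passed vertices even if they end up with no edges.
        self.nodes = set(nodes) | {v for edge in self.edges for v in edge}

    def parents(self, v):
        return {p for p, c in self.edges if c == v}

    def children(self, v):
        return {c for p, c in self.edges if p == v}

    def _closure(self, v, step):
        """Vertices reachable from v by repeatedly applying `step`
        (v itself is excluded, since the graph is acyclic)."""
        found, stack = set(), [v]
        while stack:
            for u in step(stack.pop()):
                if u not in found:
                    found.add(u)
                    stack.append(u)
        return found

    def ancestors(self, v):
        return self._closure(v, self.parents)

    def descendants(self, v):
        return self._closure(v, self.children)

    def subgraph(self, keep):
        """Induced subgraph on the vertex set `keep`."""
        keep = set(keep)
        return DAG(((p, c) for p, c in self.edges if p in keep and c in keep), keep)

    def do(self, *targets):
        """Mutilated graph for do(targets): delete every edge into a target."""
        return DAG(((p, c) for p, c in self.edges if c not in targets), self.nodes)
```

For the confounded graph Z → X → Y, Z → Y, `DAG([("Z", "X"), ("X", "Y"), ("Z", "Y")]).do("X")` keeps X → Y and Z → Y but drops Z → X, which is the mutilated graph used when reasoning about P(y | do(x)).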

At least as a start. Call it "Chapter 1"?

Max

AlxndrMlk commented 9 months ago

Hi @max-little

Thank you for sharing the ideas.

I understand your concern regarding readthedocs.

I am wondering what would bring the most value for the users.

I believe that the very basics of graphical models and causal inference are already covered elsewhere (e.g. Brady Neal's YouTube series or my book).

I think describing GRAPL's representations would be very helpful.

Perhaps also describing the already-implemented algorithms, at least at a high level.

What are your thoughts?

Alex

max-little commented 9 months ago

Hi @AlxndrMlk

It's worth emphasising that all the algorithms implemented here are described in the relevant literature; for instance, the Tian factorization for ADMGs is described in Tian's papers on the topic. So the user could simply read these papers. However, the descriptions in these papers may be pitched at a fairly sophisticated technical level and could be made more accessible. In my opinion, this would be best done as tutorial-style notebooks, since they are inherently interactive.
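As an illustration of how such a factorization can become a small computational experiment, here is a hedged plain-Python sketch of the c-component (district) factorization as it is usually stated: with a topological order V_1 < ... < V_n and c-components S_1, ..., S_k, P(v) = ∏_j Q[S_j], where Q[S_j] = ∏_{V_i ∈ S_j} P(v_i | v^(i-1)) and v^(i-1) denotes the vertices preceding V_i. The function names and edge-list encoding are hypothetical, not GRAPL's; the code only lists which conditional factors make up each Q[S_j], it does not estimate anything. It uses `graphlib`, so it needs Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def c_components(nodes, bidirected):
    """Partition `nodes` into c-components: classes connected by bidirected edges."""
    adj = {v: set() for v in nodes}
    for a, b in bidirected:
        adj[a].add(b)
        adj[b].add(a)
    components, seen = [], set()
    for v in nodes:
        if v in seen:
            continue
        comp, stack = set(), [v]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        components.append(comp)
    return components


def tian_factorization(nodes, directed, bidirected):
    """Hypothetical helper: map each c-component S to the factors of
    Q[S] = prod over V_i in S of P(v_i | v^(i-1)), as (variable, predecessors)."""
    sorter = TopologicalSorter({v: set() for v in nodes})
    for parent, child in directed:
        sorter.add(child, parent)  # child must come after parent
    order = list(sorter.static_order())  # raises CycleError if not acyclic
    position = {v: i for i, v in enumerate(order)}
    return {
        frozenset(S): [(v, tuple(order[: position[v]])) for v in sorted(S, key=position.get)]
        for S in c_components(nodes, bidirected)
    }
```

For the front-door graph X → M → Y with X ↔ Y, the only topological order is X, M, Y, so the result encodes Q[{X, Y}] = P(x) P(y | x, m) and Q[{M}] = P(m | x).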

Best Max