hestiaAI / hestialabs-experiences

HestiaLabs Data Experiences & Digipower Academy
https://digipower.academy

high-level representation of experiences #863

Open emmanuel-hestia opened 2 years ago

emmanuel-hestia commented 2 years ago

Hello everybody,

I had a serious chat with Paul-Olivier on the higher-level representation of the experiences.

The aim of the entire pipeline is to deliver analysis and graphical representations of a variety of data. The data varies in the sense that it could come from multiple sources, different variants of the same source, span multiple participants, etc. Argonodes provides us with a dynamic parsing and accessing system, while François' and Thomas' systems have given us tools to visualise the data.

However, because we have faced a number of emergencies, and also because I tend to lose myself in abstract considerations that cannot easily connect to the actual processing pipeline, the pipeline tends to drift towards ad hoc solutions. While this allows Hestia to answer specific questions within limited time spans, I think it is unsustainable in the long term given the multiplicity of potential sources and things to be made with them. Furthermore, it puts an unreasonable burden on the shoulders of the development team.

To remedy this, I think we need a clearer separation between the technical side of the pipeline on the one hand (parsing and processing data, as well as outputting results), and the specifics of the various experiences on the other. Basically, experiences should be described in a high-level language that abstracts away the technical specifics, much like a Wiki document hides web design considerations from the user to allow quicker prototyping.

I would very much like it if I could take a bit of your time to set up a complete pipeline, from start to finish, starting with as simple data as necessary, that would utilise such an abstraction layer. I believe that once this starts to work, it can be quickly extended to the full range of our features, and maintained to encompass new ones in the future. This would not only accelerate the development of experiences, but most of all would remove these tasks from the development team and empower others, possibly even beyond Hestia itself.

andreaskundig commented 2 years ago

> I would very much like it if I could take a bit of your time to set up a complete pipeline, from start to finish, starting with as simple data as necessary, that would utilise such an abstraction layer

I think you're talking about an abstraction layer that doesn't exist yet? I imagine the pipeline would be defined in a language that we first need to specify, and then to implement.

andreaskundig commented 2 years ago

We also need to know how this fits into priorities. This is super interesting, and I bet many of us would love to get started on it. What I usually see happening is that our efforts in this direction tend to start, and stop, and resume in different incompatible ways.

emmanuel-hestia commented 2 years ago

Thank you Andreas for your kind words on the subject.

As to the language: it would ideally come down to a configuration file, really. I could imagine it being implemented as a JSON file, or even CSV. For maximum flexibility we could maybe imagine having a parallel Jupyter Notebook interface. In any case it would reuse as much existing technology as possible, of course.

As to priorities: I really think this is the way to go (indeed I should have been more proactive in pushing in this direction for quite some time), and at a minimum it should not be delayed indefinitely. It should of course not hinder the most urgent work, but I believe it will be an investment that will make later work much quicker, and indeed move a good chunk of it to myself and other members of Hestia. I defer to @pdehaye as to what it would entail in practice for our agenda.

Amustache commented 2 years ago

If anything, firstly because I love all things modular, and secondly because I think taking a little time to plan is always good, I agree with what you describe!

This is also a discussion we had in parallel with @pdehaye, and I think it's going to bear fruit in the future.

andreaskundig commented 2 years ago

The configuration file will surely be a JSON, but we need to specify tables and queries and whatever else we need in this JSON, and the notation for that is what I meant by a language.
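To make the idea of "tables and queries in a JSON" a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the key names (`tables`, `queries`, `group_by`, `aggregate`), the `rides` table, and the sample rows are all hypothetical and do not reflect any existing Hestia or Argonodes notation. It only shows that a declarative description can be validated and then executed by a small generic interpreter.

```python
import json

# Hypothetical experience description. Every key name below is an
# assumption for illustration, not an existing notation.
EXPERIENCE_JSON = """
{
  "name": "example-experience",
  "tables": {
    "rides": {"columns": ["driver", "city", "fare"]}
  },
  "queries": {
    "fares_by_city": {
      "table": "rides",
      "group_by": "city",
      "aggregate": {"column": "fare", "op": "sum"}
    }
  }
}
"""

# The only aggregation operations this sketch understands.
AGGREGATORS = {"sum": sum, "count": len, "min": min, "max": max}


def validate(config):
    """Check that every query refers to a declared table, columns and operation."""
    for name, query in config["queries"].items():
        table = config["tables"].get(query["table"])
        if table is None:
            raise ValueError(f"query {name!r}: unknown table {query['table']!r}")
        for column in (query["group_by"], query["aggregate"]["column"]):
            if column not in table["columns"]:
                raise ValueError(f"query {name!r}: unknown column {column!r}")
        if query["aggregate"]["op"] not in AGGREGATORS:
            raise ValueError(f"query {name!r}: unknown op {query['aggregate']['op']!r}")


def run_query(config, query_name, data):
    """Run one declared query on in-memory data (a list of row dicts per table)."""
    query = config["queries"][query_name]
    groups = {}
    for row in data[query["table"]]:
        key = row[query["group_by"]]
        groups.setdefault(key, []).append(row[query["aggregate"]["column"]])
    aggregate = AGGREGATORS[query["aggregate"]["op"]]
    return {key: aggregate(values) for key, values in groups.items()}


config = json.loads(EXPERIENCE_JSON)
validate(config)

# Made-up sample rows, only to exercise the query.
sample = {
    "rides": [
        {"driver": "a", "city": "Geneva", "fare": 10},
        {"driver": "b", "city": "Geneva", "fare": 5},
        {"driver": "a", "city": "Lausanne", "fare": 7},
    ]
}
result = run_query(config, "fares_by_city", sample)
print(result)  # {'Geneva': 15, 'Lausanne': 7}
```

The point of the sketch is the split: the JSON carries what is specific to an experience, while `validate` and `run_query` stay generic. The real notation would still need to be specified before anything like this could be implemented.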

For the priority, you are right to defer to @pdehaye, but we should also include it in the planning process, which I think means making sure that @alexbfree coordinates this and pushes and clarifies that this is indeed a long-term priority that we commit to. It's likely to be a serious amount of work.

pdehaye commented 2 years ago

I think we are all aligned on this vision; the tricky bit is how to invest our time wisely moving in this direction.

Many of us would agree that the engineering bottleneck now is with @fquellec , because he can only work 48 hours per day (this sentence obviously said with bienveillance, and appreciative of the formidable efforts he puts in).

So let's solve two problems with one stone: how can this "semanticization" work be most helpful immediately in order to relieve Francois?

My conversation with @emmanuel-hestia was followed by one with @Amustache , which has helped provide some clarity on that question.

Right now @Amustache, @emmanuel-hestia and @fquellec are working on a "concept explorer". This is precisely a good tool to help us all discuss the results of our semantics work: can we present the concepts to each other, to Jessica, to an Uber driver, etc. in a coherent way upfront, before building nifty visualizations? It would use as input the results of the Argonodes work with @emmanuel-hestia and @Amustache.

However, when Zooming with @Amustache, he couldn't show me the final result end-to-end: the Argonodes output serves as input to JavaScript bits that he doesn't master. He is still dependent on others for deployment and for making his work more widely useful.

This becomes the first priority to fix: everyone should be able to at least evoke/demo the end-to-end value of their work, where the value is judged by what it actually puts in the hands of an end-user. Here, @Amustache and @emmanuel-hestia share a common problem.

I can see two ways to do it:

I like the first one better, but before deciding I need some input from anyone who can confidently assess how hard it actually is.

andreaskundig commented 2 years ago

So in the pipeline described by @emmanuel-hestia, we focus on the first step, the concept explorer, and not on the additional things needed to specify the rest of the pipeline. We already have the notation for that step (thanks to @Amustache), and we'll soon have a first implementation in experiences. What we now want is to extract it from experiences into a separate tool. This sounds like extracting the concept explorer into a separate JS module, and then either writing an IPython wrapper for it, or writing another Vue app around it. My first impression is that the IPython solution would be simpler, and probably more useful?

It would be great if we could discuss this in person on Monday. (At the same time, it's obvious now that this is well underway and that I'm not that much involved.)

fquellec commented 2 years ago

I see two main subjects here:

pdehaye commented 2 years ago

Thanks for your input, @fquellec. Your breakdown makes sense, but calls for a specific response to each.

alexbfree commented 2 years ago

This is old. Passing back to Bizdev to review and decide if there is a current ask for dev.