high-level representation of experiences

hestiaAI / hestialabs-experiences

HestiaLabs Data Experiences & Digipower Academy

https://digipower.academy


high-level representation of experiences #863

Open emmanuel-hestia opened 2 years ago

emmanuel-hestia commented 2 years ago

Hello everybody,

I had a serious chat with Paul-Olivier on the higher-level representation of the experiences.

The aim of the entire pipeline is to deliver analysis and graphical representations of a variety of data. The data varies in the sense that it could come from multiple sources or from different variants of the same source, span multiple participants, etc. Argonodes provides us with a dynamic parsing and accessing system, while François' and Thomas' systems have given us tools to visualise the data.

However, because we have faced a number of emergencies, and also because I tend to lose myself in abstract considerations that cannot easily connect to the actual processing pipeline, the pipeline tends to drift towards ad hoc solutions. While this allows Hestia to answer specific questions within limited time spans, I think it is unsustainable in the long term given the multiplicity of potential sources and things to be made with them. Furthermore, it puts an unreasonable burden on the shoulders of the development team.

To remedy this, I think we need a clearer separation between the technical side of the pipeline (parsing and processing data, as well as outputting results) on the one hand, and the specifics of the various experiences on the other. Basically, experiences should be described in a high-level language that abstracts away the technical specifics, much like a wiki document spares the user web design considerations and so allows quicker prototyping.

I would very much like it if I could take a bit of your time to set up a complete pipeline, from start to finish, starting with as simple data as necessary, that would utilise such an abstraction layer. I believe that once this starts to work, it can be quickly extended to the full range of our features, and maintained to encompass new ones in the future. This would not only accelerate the development of experiences, but most of all would remove these tasks from the development team and empower others, possibly even beyond Hestia itself.

andreaskundig commented 2 years ago

> I would very much like it if I could take a bit of your time to set up a complete pipeline, from start to finish, starting with as simple data as necessary, that would utilise such an abstraction layer

I think you're talking about an abstraction layer that doesn't exist yet? I imagine the pipeline would be defined in a language that we first need to specify, and then to implement.

andreaskundig commented 2 years ago

We also need to know how this fits into priorities. This is super interesting, and I bet many of us would love to get started on it. What I usually see happening is that our efforts in this direction tend to start, and stop, and resume in different incompatible ways.

emmanuel-hestia commented 2 years ago

Thank you, Andreas, for your kind words on the subject.

As to the language: it would ideally come down to a configuration file, really. I could imagine it being implemented as a JSON file, or even CSV. For maximum flexibility we could maybe have a parallel Jupyter Notebook interface. In any case it would reuse as much existing technology as possible, of course.

As to priorities: I really think this is the way to go (indeed I should have pushed more proactively in this direction for quite some time), and at a minimum it should not be delayed indefinitely. It should of course not hinder the most urgent work, but I believe it will be an investment that will make later work much quicker, and indeed move a good chunk of it to myself and other members of Hestia. I defer to @pdehaye as to what it would entail in practice for our agenda.

Amustache commented 2 years ago

If anything, firstly because I love all things modular, and secondly because I think taking a little time to plan is always good, I agree with what you describe!

This is also a discussion we had in parallel with @pdehaye, and I think it's going to bear fruit in the future.

andreaskundig commented 2 years ago

The configuration file will surely be a JSON file, but we need to specify tables and queries and whatever else we need in this JSON, and the notation for that is what I meant by a language.
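
To make the idea of "tables and queries in a JSON" concrete, here is a minimal sketch of what such a notation could look like and how it could be executed. Every key name (`experience`, `tables`, `queries`) and the `run_experience` helper are assumptions made up for illustration, not an existing HestiaLabs schema; SQLite is used only because SQL queries are already part of the current setup.

```python
import json
import sqlite3

# Hypothetical experience configuration: every key name below is an
# illustrative assumption, not an existing HestiaLabs schema.
CONFIG_JSON = """
{
  "experience": "demo",
  "tables": {
    "views": {"columns": ["video", "watched_on"]}
  },
  "queries": {
    "views_per_day": "SELECT watched_on, COUNT(*) AS n FROM views GROUP BY watched_on ORDER BY watched_on"
  }
}
"""


def run_experience(config, rows_by_table):
    """Load rows into in-memory SQLite tables and run each configured query."""
    conn = sqlite3.connect(":memory:")
    for name, spec in config["tables"].items():
        columns = spec["columns"]
        column_list = ", ".join(columns)
        placeholders = ", ".join("?" for _ in columns)
        conn.execute(f'CREATE TABLE "{name}" ({column_list})')
        conn.executemany(
            f'INSERT INTO "{name}" VALUES ({placeholders})',
            rows_by_table.get(name, []),
        )
    # One result list per named query, in configuration order.
    results = {
        query_name: conn.execute(sql).fetchall()
        for query_name, sql in config["queries"].items()
    }
    conn.close()
    return results


config = json.loads(CONFIG_JSON)
sample_rows = {
    "views": [("a", "2022-01-01"), ("b", "2022-01-01"), ("c", "2022-01-02")]
}
results = run_experience(config, sample_rows)
```

With the sample rows above, `results["views_per_day"]` holds one `(day, count)` pair per day. The point of the sketch is only that the experience author edits the JSON, while the code that interprets it stays generic.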

For the priority, you are right to defer to @pdehaye, but we should also include it in the planning process, which I think means making sure that @alexbfree coordinates this and pushes and clarifies that this is indeed a long-term priority that we commit to. It's likely to be a serious amount of work.

pdehaye commented 2 years ago

I think we are all aligned on this vision; the tricky bit is how to invest our time wisely in moving in this direction.

Many of us would agree that the engineering bottleneck now is with @fquellec, because he can only work 48 hours per day (this sentence is obviously said with goodwill, and in appreciation of the formidable efforts he puts in).

So let's solve two problems with one stone: how can this "semanticization" work be most helpful immediately in order to relieve Francois?

My conversation with @emmanuel-hestia was followed by one with @Amustache , which has helped provide some clarity on that question.

Right now @Amustache, @emmanuel-hestia and @fquellec are working on a "concept explorer". This is precisely a good tool to help us all discuss the results of our semantics work: can we present the concepts to each other, to Jessica, to an Uber driver, etc., in a coherent way upfront, before building nifty visualizations? It would use as input the results of the Argonodes work with @emmanuel-hestia and @Amustache.

However, in Zoom-ing with @Amustache, he couldn't show me the final result end-to-end: the Argonodes output serves as input to JavaScript bits that he doesn't master. He is still dependent on others for deployment and for making his work more widely useful.

This becomes the first priority to fix: everyone should be able to at least evoke/demo the end-to-end value of their work, where the value is judged by what it actually puts in the hands of an end-user. Here, @Amustache and @emmanuel-hestia share a common problem.

I can see two ways to do it:

1. Embed experiences into Jupyter: Jupyter can run JavaScript cells, and one cell could be "import the Concept Explorer and run it on the file 'X' with configuration denoted by variable name 'Y' produced in the previous cell".
2. Deploy (alongside the timeline viewer and the date viewer) a concept viewer that makes it possible not only to upload a file but also to copy/paste a configuration.
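
As a rough illustration of the Jupyter option, the sketch below builds the JavaScript source that such a cell could run from a configuration produced in Python. The `conceptExplorer.run` entry point is hypothetical: it assumes the Concept Explorer has first been packaged as a loadable JS module, which is not confirmed anywhere in this thread. The file name `X` and the variable `Y` simply mirror the wording above.

```python
import json


def concept_explorer_cell(data_file, config):
    """Build the JavaScript source that a notebook cell could execute.

    `conceptExplorer.run` is a hypothetical entry point; the real Concept
    Explorer would first have to be packaged as a loadable JS module.
    """
    # json.dumps yields valid JavaScript literals for both arguments.
    return (
        "conceptExplorer.run("
        f"{json.dumps(data_file)}, {json.dumps(config)}"
        ");"
    )


# "Previous cell": the configuration variable (Y) and the input file (X).
# The content of Y is made up for illustration.
Y = {"concepts": ["location", "time"]}
js_source = concept_explorer_cell("X", Y)

# In a notebook, the string could then be executed in the browser with
# IPython.display.Javascript(js_source), which this sketch does not call.
```

How hard the remaining part is (loading the actual module into the notebook front end) is exactly the open question raised here; the sketch only shows that passing a Python-side configuration across is the easy half.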

I like the first one better, but before deciding I need some input from anyone who can confidently assess how hard it actually is.

andreaskundig commented 2 years ago

So in the pipeline described by @emmanuel-hestia, we focus on the first step, the concept explorer, and not on the additional things needed to specify the rest of the pipeline. We already have the notation for that step (thanks to @Amustache), and we'll soon have a first implementation in experiences. What we now want is to extract it from experiences into a separate tool. This sounds like extracting the concept explorer into a separate JS module, and then either writing an IPython wrapper for it or writing another Vue app around it. My first impression is that the IPython solution would be simpler, and probably more useful?

It would be great if we could discuss this in person on Monday. (At the same time, it's obvious now that this is well underway and that I'm not that much involved.)

fquellec commented 2 years ago

I see two main subjects here:

1. The first is how to offload the developers NOW, when we want to create new experiences. In our current setup, I think an easy solution might be to have some sort of guidelines on what we need to specify when we want a new experience. Currently, when someone asks for a new TikTok experience, for example, the developer has to dig through the data, find interesting things, think about how to display them, choose the colours, write the appropriate text, and then and only then deal with the coding/configuration and SQL queries. Believe me, this thought process takes time and can be done by anyone at Hestia (I love doing it personally, I'm just saying it takes time). So a first solution would be to prepare the work for the devs by following some sort of template guidelines at a higher level.
2. The second topic is how to create/explore data more easily and offload developers IN THE FUTURE. The way I see it, we are gradually building new features like the file/path/concept explorer, allowing us and others to explore data in different ways. Once we have these, it will be fairly easy to extract informative tables from any data source. From these tables, we can then let the user view them as charts with minimal configuration (I've already started a PR for this, a button to view any table as a chart, but it's far from finished). And once we have charts, we can export the configuration needed to achieve this from the raw source. That's how I think we're getting on. Now, if/when we have all this, I don't understand the need to integrate the experiences into Jupyter notebooks or Python: I just can't find a concrete example where this is useful, and it could be a nightmare to implement (in fact, I have no idea). To create a new experience, the pipeline would be as follows: run the @Amustache tool from a web interface and download the model; use the model and raw data in the generic experience explorer; browse the archive and easily identify some concepts you want to visualise; choose a graph to represent these concepts/tables; and export the configuration.
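
The "view any table as a chart, then export the configuration" step could look roughly like the sketch below. It is not the code from the PR mentioned above: the `suggest_chart_config` function, the output keys (`table`, `chart`, `x`, `y`) and the column-picking heuristic are all assumptions chosen for illustration.

```python
import json


def suggest_chart_config(table_name, rows):
    """Guess a minimal chart configuration from a table given as a list of dicts.

    Heuristic (an assumption of this sketch): the first text column becomes
    the x axis and the first numeric column becomes the y axis.
    """
    if not rows:
        raise ValueError("cannot suggest a chart for an empty table")
    first = rows[0]
    numeric = [
        key
        for key, value in first.items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    ]
    text = [key for key, value in first.items() if isinstance(value, str)]
    if not numeric or not text:
        raise ValueError("need at least one text and one numeric column")
    return {"table": table_name, "chart": "bar", "x": text[0], "y": numeric[0]}


# A small made-up table, as it might come out of the explorer.
table = [
    {"day": "2022-01-01", "views": 2},
    {"day": "2022-01-02", "views": 1},
]
chart_config = suggest_chart_config("views_per_day", table)

# What an "export configuration" button could save to disk.
exported = json.dumps(chart_config, indent=2)
```

The exported JSON is the kind of artefact that could then be handed back to the generic experience explorer, so that the user never has to write the configuration by hand.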

pdehaye commented 2 years ago

Thanks for your input, @fquellec. Your breakdown makes sense, but each part calls for a specific response.

1. For the NOW part, I don't think everyone at Hestia can do it, in great part because it requires mastery of tools that allow navigating complex files. In addition, the focus is on offloading you. Please stop doing stuff you think someone else can do. You are mission critical for many things; ask others or come to me in such situations.
2. I agree with a plan for IN THE FUTURE, around the path explorer and the like. However, what you suggest even here is still dependent on you (and select others). After these manipulations with the web tool etc., we are still dependent on you to deploy. This will always be the case, but unless I misunderstood something, we are still dependent on you to deploy and to assess what the result is. This will keep on overloading you. I am looking for quick ways to make us less dependent on you. Integration into Jupyter seems to me to be a way to achieve that, with less difficulty than you anticipate for integrating Vue.js into Jupyter. Provided we have figured out how to "make a webapp out of a Jupyter notebook" (which is actually easy through Binder and required in your plan as well), we can quickly achieve a better version of the workflow you suggest. To be discussed and clarified: we lack common background on how hard some of this stuff is.

alexbfree commented 2 years ago

This is old - pass back to Bizdev to review and decide if there is a current Ask for dev.