trustmaster / goflow

Flow-based and dataflow programming library for Go (golang)
MIT License
1.6k stars 125 forks

Feedback needed #48

Closed trustmaster closed 3 months ago

trustmaster commented 6 years ago

Hello fellow Gophers!

I apologise as this project slipped out of my scope for several years.

I still have some ideas and plans for maintaining it, but I need feedback from people who have actually tried it on how it is being used. So, I would really appreciate your answers to the following questions, in this thread or in any other form (e.g. via email, please find the address in my profile).

Questions:

  1. What kind of application did you use GoFlow for? (E.g. bioinformatics, ETL, web backend, IoT, etc.)
  2. Did you use it for personal, business, or academic purposes?
  3. Do you prefer working with graphs in visual or text form?
  4. Which visual tools have you used and which ones do you prefer?
  5. Do you prefer a Component to have main Loop(), or do you prefer setting up handler functions on each port (OnPortName())?
  6. Do you prefer processes to stay resident in memory or started and stopped on demand?
  7. Have you ever used State Locks, Synchronous Mode, Worker Pool, or other advanced features of GoFlow?
  8. Please tell me what you liked about GoFlow and what you would like to be added or changed.

Why this is important

As you might have noticed, this codebase is a bit dated. In fact, it was written in 2011 and hasn't changed much since. My own views on how an FBP library should work have changed over time. So, I think this library deserves a rewrite.

My views may be similar to yours or different from them, and I'm not building this library only for myself. That's why feedback is appreciated so much.

Thank you for participating!

trustmaster commented 6 years ago

@abferm @sascha-andres @lrgar @manadart @mtojek @btittelbach @davidkbainbridge @kortschak @samuell @josi-asae @seanward @phiros @lovromazgon Your feedback as a contributor is especially appreciated!

manadart commented 6 years ago

First of all, a big thank-you for contributing this to the community.

  1. We used this in a data-processing pipeline, receiving data parsed by a Python application. It did (does) validation, cache maintenance, matching against multiple other databases and resulting updates.

  2. Business.

  3. We did not use visual tools for working with graphs.

  4. (See above)

  5. We almost always used OnPortName() (see 7. below for more info).

  6. For our specific case, we wanted 24/7 uptime - no specific case for stopping/starting on demand. For maintenance, we closed the input channel and waited for the last data to be processed. Our cache maintained enough state to spin back up in good order.

  7. Our use case was very large volumes of data, so we sometimes used synchronous mode or a worker pool type component just to limit the data coming through, for observation.

  8. GoFlow made it easy for us to write logic steps and control flow decisions as simple components and then have them wired up in a single place, so it is nice both in the modular sense and in the system overview sense. It is also very nice to delegate the concurrency/async mechanics to the library and just think in terms of steps, decisions and an input channel. Lastly, performance. We had powerful servers to run this on, so it just scales out (per machine) based on hardware limitations. One might choose something different (like https://github.com/AsynkronIT/protoactor-go) for a distributed pipeline, but for a single machine this library is very convenient.

I have actually left the company where I worked on this, but it was with @tonygallagher and @lrgar. They might have more to add. I seem to recall an issue connecting multiple components to an exit port. We got around this by adding an aggregation component before the graph exit.
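For readers unfamiliar with that workaround, here is a minimal sketch of an aggregation (fan-in) step written with plain Go channels. It does not use GoFlow's API, and the names `merge` and `emit` are hypothetical, chosen only for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// merge is a hypothetical aggregation component (not part of GoFlow).
// It forwards every value received on any input to a single output
// channel and closes the output once all inputs have been closed.
func merge(inputs ...<-chan int) <-chan int {
	out := make(chan int)
	var wg sync.WaitGroup
	wg.Add(len(inputs))
	for _, in := range inputs {
		go func(in <-chan int) {
			defer wg.Done()
			for v := range in {
				out <- v
			}
		}(in)
	}
	// Close the output only after every forwarding goroutine has finished.
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// emit is a hypothetical upstream component: it sends the given values
// and then closes its channel.
func emit(vals ...int) <-chan int {
	ch := make(chan int)
	go func() {
		defer close(ch)
		for _, v := range vals {
			ch <- v
		}
	}()
	return ch
}

func main() {
	sum := 0
	// Two upstream components feed one exit channel through merge.
	for v := range merge(emit(1, 2), emit(3, 4)) {
		sum += v
	}
	fmt.Println("sum:", sum)
}
```

The order in which values arrive on the merged channel is not deterministic, so a step like this only suits cases where ordering across inputs does not matter.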

samuell commented 6 years ago

Hi @trustmaster, I want to also thank you so much for contributing this!

I've had a lot of fun playing with GoFlow for bioinformatics use cases, and also learned a lot about both Go and FBP by studying it.

To answer the questions one at a time:

  1. What kind of application did you use GoFlow for? (E.g. bioinformatics, ETL, web backend, IoT, etc.)

I did some experimentation with bioinformatics components using it (like this). I eventually got worried about the performance of reflection, though, and have since explored a way to build FBP-like programs using only plain channels (this is ongoing work in my scipipe and flowbase libraries).

  2. Did you use it for personal, business, or academic purposes?

Academic.

  3. Do you prefer working with graphs in visual or text form?

I prefer to work mainly in text form. I feel graphs can be a very useful addition for sketching at design time and for presentation, though.

  4. Which visual tools have you used and which ones do you prefer?

I've used JPM's fbpdraw a bit for presentation purposes. I like that it is simple and just works. Haven't had time / patience for setting up any more complicated tools.

  5. Do you prefer a Component to have main Loop(), or do you prefer setting up handler functions on each port (OnPortName())?

I prefer a central Loop(), as bioinformatics tools often need to gather data from multiple inputs for each operation.

  6. Do you prefer processes to stay resident in memory or started and stopped on demand?

In our work, pipeline runs have had a clear start and finish time, so on-demand stop and start has not been something we saw a need for.

  7. Have you ever used State Locks, Synchronous Mode, Worker Pool, or other advanced features of GoFlow?

No.

  8. Please tell me what you liked about GoFlow and what you would like to be added or changed.

I like that it is Go code, as that enables re-use of Go tooling. I also found the API natural and easy to understand and work with.

I'm worried about the performance hit from reflection, though, for data-intensive pipelines. If there is any way to avoid reflection on each data read, e.g. by using reflection only to set up channels, which are then used for the data communication, that would be great.
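To make the idea concrete, here is a rough sketch of what "reflection only at setup" could look like. This is not how GoFlow is implemented: the `Doubler` component and the `connect` helper are hypothetical, and the sketch assumes a recent Go version in which `reflect` allows assigning a bidirectional channel to a directional channel field.

```go
package main

import (
	"fmt"
	"reflect"
)

// Doubler is a hypothetical component whose ports are typed channels.
type Doubler struct {
	In  <-chan int
	Out chan<- int
}

// Run moves data with ordinary channel operations; no reflection is
// involved on the per-message path.
func (d *Doubler) Run() {
	defer close(d.Out)
	for v := range d.In {
		d.Out <- v * 2
	}
}

// connect is a hypothetical wiring helper. It uses reflection once, at
// setup time, to assign channel ch to the exported field named port.
func connect(comp interface{}, port string, ch interface{}) error {
	v := reflect.ValueOf(comp)
	if v.Kind() != reflect.Ptr || v.Elem().Kind() != reflect.Struct {
		return fmt.Errorf("component must be a pointer to a struct")
	}
	f := v.Elem().FieldByName(port)
	if !f.IsValid() || !f.CanSet() {
		return fmt.Errorf("no settable port %q", port)
	}
	c := reflect.ValueOf(ch)
	if !c.IsValid() || !c.Type().AssignableTo(f.Type()) {
		return fmt.Errorf("channel does not fit port %q of type %s", port, f.Type())
	}
	f.Set(c)
	return nil
}

func main() {
	in, out := make(chan int), make(chan int)
	d := &Doubler{}
	if err := connect(d, "In", in); err != nil {
		panic(err)
	}
	if err := connect(d, "Out", out); err != nil {
		panic(err)
	}
	go d.Run()
	go func() {
		defer close(in)
		for i := 1; i <= 3; i++ {
			in <- i
		}
	}()
	for v := range out {
		fmt.Println(v)
	}
}
```

The trade-off in a design like this is that wiring mistakes (a wrong port name or channel type) are only caught at run time, when `connect` is called, rather than by the compiler.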

Keep up the great work!

trustmaster commented 6 years ago

@manadart @samuell thank you for your feedback! After collecting some more responses, I'm going to sum them up and make a proposal for a new version of GoFlow.

@samuell regarding your concern about the use of reflection, in the latest version it's mostly used to wire up the channels. The only reflection used on each data read is passing the arbitrary data as reflect.Value, which is done to allow handler functions to have precise data types in their signature. The alternative is using interface{} as input in all handler functions, which underneath is very similar to how reflection works. On the other hand, I'm currently favouring more bare-bones components which decide how to read from their channels themselves (e.g. this).
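As an illustration of the two options (this is a simplified sketch, not GoFlow's actual code; `callTyped`, `sumTyped` and `sumUntyped` are hypothetical names), a typed handler invoked through reflect.Value can be compared with an interface{} handler that does its own type assertion:

```go
package main

import (
	"fmt"
	"reflect"
)

// callTyped invokes handler, assumed to be a func with exactly one
// parameter, through reflection. This lets the handler declare a
// precise parameter type such as int.
func callTyped(handler interface{}, data interface{}) {
	reflect.ValueOf(handler).Call([]reflect.Value{reflect.ValueOf(data)})
}

// sumTyped feeds each value to a handler with a precise int parameter.
func sumTyped(values []interface{}) int {
	total := 0
	onValue := func(n int) { total += n }
	for _, v := range values {
		callTyped(onValue, v)
	}
	return total
}

// sumUntyped feeds each value to a handler that accepts interface{}
// and recovers the concrete type with a type assertion.
func sumUntyped(values []interface{}) int {
	total := 0
	onValue := func(v interface{}) {
		if n, ok := v.(int); ok {
			total += n
		}
	}
	for _, v := range values {
		onValue(v)
	}
	return total
}

func main() {
	values := []interface{}{1, 2, 3}
	fmt.Println(sumTyped(values), sumUntyped(values))
}
```

In this sketch `callTyped` panics if a value does not match the handler's parameter type, whereas the interface{} handler can simply skip it; that difference in failure behaviour is part of the trade-off.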

samuell commented 6 years ago

Ah, interesting, I should have another deep look at the code!

erdelmaero commented 6 years ago

First of all: Great work! It's a lot of fun to use the package!

What kind of application did you use GoFlow for? (E.g. bioinformatics, ETL, web backend, IoT, etc.) IoT

Did you use it for personal, business, or academic purposes? I want to use it for business.

Do you prefer working with graphs in visual or text form? I prefer working with visual graphs.

Which visual tools have you used and which ones do you prefer? Till now, I've only used text form.

Do you prefer a Component to have main Loop(), or do you prefer setting up handler functions on each port (OnPortName())? I prefer handler functions.

Do you prefer processes to stay resident in memory or started and stopped on demand? Resident in memory.

Have you ever used State Locks, Synchronous Mode, Worker Pool, or other advanced features of GoFlow? I have only tested the library so far, but I will try these modes soon.

Please tell me what you liked about GoFlow and what you would like to be added or changed. I really like the way flow-based programs are structured, and I would really like to see this package maintained in the future, so we can start using it in production!

roscopecoltran commented 5 years ago

Hi guys,

For my part, it would mostly be for an advanced ETL/node-based pipeline. It would get lots of traction with DevOps, as it could make it easier to aggregate data (APIs, logs) or to process data for import with dynamic/composable pipelines, chained by event triggers or conditions. It would also be awesome for building smart/refined datasets for AI and Deep Learning.

Cheers,

dahvid commented 3 years ago

Hi Trustmaster, I just started a project in Go, so I was looking for what had been done already in terms of DataFlow. Here are my answers to your questions, clearly two years later than the others!

What kind of application did you use GoFlow for? (E.g. bioinformatics, ETL, web backend, IoT, etc.)

At this point it would be for running a reactive systemd daemon that schedules and alters resource allocations for health-care applications sharing resources on a single host. I also have a similar product, not yet open-sourced, called "coolflow", which is being used in-house for AI applications. It is written in Python, so it is not suitable for real-time applications.

Did you use it for personal, business, or academic purposes?

This would be for business.

Do you prefer working with graphs in visual or text form?

Text form. In previous projects I've used graph manipulation and generation to prepare complicated graphs for final execution. As such, having graphs in Python format was a big win, even though the execution was in C++/MPI or CUDA.

Do you prefer a Component to have main Loop(), or do you prefer setting up handler functions on each port (OnPortName())?

I prefer a main loop with AND semantics, that is, a component does not Process() until data exists on all input channels. I realize that OR semantics for components is more general, so I think creating AND components out of OR components should be easy.

Do you prefer processes to stay resident in memory or started and stopped on demand?

Resident in memory. This is because the processes I use often call into GPUs, so they need to store state between executions.

Have you ever used State Locks, Synchronous Mode, Worker Pool, or other advanced features of GoFlow?

I'm a newbie, but I'm sure I would end up using all of these.

Please tell me what you liked about GoFlow and what you would like to be added or changed.

One thing that breaks the graph paradigm is error handling. If a component enters an error state, it should be able to send a message, and the graph should stop processing in a known state. Unfortunately, in pure graph semantics this means every component has a connection to an "error" component.
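To illustrate both points, here is a small sketch with plain Go channels. It is not GoFlow code; `joinAnd`, `feed` and `safeDiv` are hypothetical names. The component only processes once a value has arrived on both inputs (AND semantics), and it reports an error on a dedicated channel and stops.

```go
package main

import (
	"errors"
	"fmt"
)

// joinAnd is a hypothetical AND-semantics component: it calls process
// only after receiving one value from a and one from b. It stops when
// either input is closed or when process fails; a failure is reported
// on the returned error channel.
func joinAnd(a, b <-chan int, process func(x, y int) (int, error)) (<-chan int, <-chan error) {
	out := make(chan int)
	errs := make(chan error, 1) // buffered so reporting never blocks
	go func() {
		defer close(out)
		defer close(errs)
		for {
			x, ok := <-a
			if !ok {
				return
			}
			y, ok := <-b
			if !ok {
				return
			}
			r, err := process(x, y)
			if err != nil {
				errs <- err
				return
			}
			out <- r
		}
	}()
	return out, errs
}

// feed returns a pre-filled, closed channel holding vals.
func feed(vals ...int) <-chan int {
	ch := make(chan int, len(vals))
	for _, v := range vals {
		ch <- v
	}
	close(ch)
	return ch
}

// safeDiv is an example process function that can fail.
func safeDiv(x, y int) (int, error) {
	if y == 0 {
		return 0, errors.New("division by zero")
	}
	return x / y, nil
}

func main() {
	out, errs := joinAnd(feed(10, 9, 8), feed(2, 0, 4), safeDiv)
	for r := range out {
		fmt.Println("result:", r)
	}
	if err := <-errs; err != nil {
		fmt.Println("stopped:", err)
	}
}
```

Here the error channel plays the role of the connection to an "error" component: each component that can fail needs one, which is exactly the wiring overhead described above.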

dahvid commented 3 years ago

One other thing I forgot: a plug-in architecture where pre-compiled components can be read in along with graph definitions.