mediative / eigenflow

ETL orchestration platform with recoverability and process monitoring features
https://mediative.github.io/eigenflow/
Apache License 2.0

Flesh out README #2

Closed by suhailshergill 8 years ago

suhailshergill commented 8 years ago

Motivation

To better communicate what eigenflow is intended for and good at, we need more documentation that provides example use cases and highlights some comparable tools.

Input

Current README.md

Output

Updated README.md

Test

jonas commented 8 years ago

BTW, the release process is documented here in case we want to include it somehow: https://github.com/ypg-data/sbt-mediative#releasing

suhailshergill commented 8 years ago

that's a good idea @jonas. @dmitri-carpov are you intending to assign this issue to yourself? let us know if you'd like us to assist in any manner

yawaramin commented 8 years ago

@suhailshergill I vote to close this as the readme now seems pretty fleshed-out.

suhailshergill commented 8 years ago

@yawaramin please check mark all the things in the test section which are covered. if all of them are (i didn't think they all were), then you are welcome to close this issue

dmitri-carpov commented 8 years ago

@suhailshergill @yawaramin @jonas could you please QA this? We discussed the second point and decided to leave it out of the official documentation.

jonas commented 8 years ago

@dmitri-carpov If you want to have stuff reviewed or QAed I suggest you create PRs. By adding the text Fixes #2 the ticket would then automatically be closed when the PR is merged.

dmitri-carpov commented 8 years ago

@jonas agree, my bad, I should have done it from the very beginning. This issue affects multiple commits now, all related to the README file. If I do a PR now it is going to cover just a part of it. What I'd like is a review for the whole documentation. Will do PRs in the future.

jonas commented 8 years ago

@dmitri-carpov Some notes:

Eigenflow is an orchestration platform which allows to build resilient and scalable data pipelines.

Eigenflow is an orchestration platform for building resilient and scalable data pipelines.

It is created for periodic long running ETL processes where restarting from the beginning in case of failures is critical.

Isn't restarting an optional feature rather than a critical one?

Eigenflow encourages process developers to split processes in stages which can be persisted and monitored automatically.

Pipelines can be split into multiple process stages which are persisted, resumed and monitored automatically.

Platform limitations:

Should be moved under Main Features

resolvers += Resolver.url("YPG-Data SBT Plugins", url("https://dl.bintray.com/ypg-data/sbt-plugins"))(Resolver.ivyStylePatterns)

addSbtPlugin("com.mediative.sbt" % "sbt-mediative-core" % "0.1.1")

addSbtPlugin("com.mediative.sbt" % "sbt-mediative-oss" % "0.1.1")

This is not needed

akka

Should be spelled Akka

Eigenflow, eigenflow

Pick one

jonas commented 8 years ago

I think it could also use a good read-through to fix some grammatical errors

jonas commented 8 years ago

Other than that :+1:

suhailshergill commented 8 years ago

(ETL) processes, (ETL) jobs ... Eigenflow encourages process developers to split processes in stages which can be persisted and monitored automatically. ... Hardly pays off for simple atomic jobs (one stage process).

let's limit confusion and refer to these in a standard way. this is a minor nitpick.

Does not provide connectors to 3rd party systems.

i would not call this a "platform limitation". it's out of scope. we may release helper libraries for various backends/connections

It is not a replacement for ESB or BPM systems, in cases when a very complex workflow involved and there is a need for UI to draw the processes it's better to consider another products.

i don't understand this, please elaborate and explain why.

Supports scala language only.

not that i particularly care, but couldn't other JVM languages wrap around our libraries? if so, should we clarify that?

String => A

shouldn't this be String => Option[A]? it may not yet be implemented as such today, but in that case please open up an issue. let me know if the motivation etc for the issue aren't clear and i can add details. once created, please link the issue from the README
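A minimal sketch of the difference between the two signatures discussed above. The names `Order`, `parseUnsafe` and `parseSafe` are hypothetical and are not part of Eigenflow's API; the sketch only shows how returning `Option[A]` makes a failed deserialization visible in the type instead of surfacing as an exception.

```scala
import scala.util.Try

// Hypothetical stage payload, used only for illustration.
final case class Order(id: Int, item: String)

object DeserializeSketch {
  // String => A: malformed input throws, so the failure is invisible in the type.
  val parseUnsafe: String => Order = { s =>
    s.split(",", 2) match {
      case Array(id, item) => Order(id.trim.toInt, item.trim)
      case _               => throw new IllegalArgumentException(s"malformed input: $s")
    }
  }

  // String => Option[A]: malformed input becomes None instead of an exception.
  val parseSafe: String => Option[Order] =
    s => Try(parseUnsafe(s)).toOption
}
```

With this shape, a caller has to handle the `None` case explicitly rather than rely on catching exceptions.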

you currently have code examples in the README. how are you ensuring that they compile? should we have an examples module with code which compiles and link to it? should we use tut? does this go with "TODO: examples"?

we may want to be more explicit about the OSes we currently support and what the distinction is. specifically, a portion would work wherever you can get scala to work, but we also have some devops scripts specific to Mac OS (btw which Mac OS version do they support?). what's the distinction?

do we have travis builds? if so, could you add a badge for it to the README?

dmitri-carpov commented 8 years ago

@suhailshergill

It is not a replacement for ESB or BPM systems, in cases when a very complex workflow involved and there is a need for UI to draw the processes it's better to consider another products.

i don't understand this, please elaborate and explain why.

BPM assumes human tasks during the process (usually interactive forms); we obviously do not provide anything like that. All our steps (stages) are "system tasks", which makes Eigenflow closer to ESB systems. With an ESB, if most of the components are built following SOA, the ESB software usually provides high-level tools for integration and orchestration, whereas in our case all services would have to be called and integrated programmatically. I see Eigenflow somewhere between simple cron jobs and complex enterprise processes (where an ESB is usually used).

Supports scala language only.

not that i particularly care, but couldn't other JVM languages wrap around our libraries? if so, should we clarify that?

In theory yes, but our DSL is written in Scala; I think it wouldn't be nearly as elegant in Java. By "support" I mean an API and clear documentation.

suhailshergill commented 8 years ago

@dmitri-carpov i realize i was parsing the sentence incorrectly, which led to some of the confusion. makes more sense now. personally i see the fact that everything is programmatically called and integrated as a strength, but i'm biased (coz i can code).

i do wonder how complex a workflow we would be able to handle and to what extent being able to visualize workflows would suffice.

dmitri-carpov commented 8 years ago

@suhailshergill I'm not an ESB or BPM expert, but from what I've seen very few of them support extremely complex workflows, for example conditional branches with "AND joins": some steps can be executed in parallel, but at some point the process waits for all of them to finish (and has to decide what to do if one never ends).
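As an illustration of the "AND join" problem, here is a minimal sketch using plain Scala futures. It is not Eigenflow's DSL; the `andJoin` helper is hypothetical, and the timeout is just one assumed way to answer the "what if one never ends" question.

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._
import scala.util.Try

object AndJoinSketch {
  // AND join: the process continues only once both parallel branches are done.
  // The timeout bounds the wait, so a branch that never ends turns the join
  // into a Failure instead of blocking the process forever.
  def andJoin[A, B](a: Future[A], b: Future[B], timeout: FiniteDuration): Try[(A, B)] =
    Try(Await.result(a.zip(b), timeout))
}
```

Calling `AndJoinSketch.andJoin` with two completed futures yields a `Success` holding both results, while passing a future that never completes (for example `Promise[String]().future`) yields a `Failure` once the timeout elapses.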

Our current plan is to handle conditional branching, where a workflow may have different paths and the path is chosen by some logic. The skipRun functionality should also appear soon.
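A rough sketch of what "the path is chosen by some logic" could look like, written as plain Scala rather than Eigenflow's actual DSL. The stage names and the `rowCount` threshold are made up for illustration.

```scala
object BranchSketch {
  // Hypothetical stages of a small pipeline.
  sealed trait Stage
  case object Extract extends Stage
  case object FullLoad extends Stage
  case object IncrementalLoad extends Stage
  case object Done extends Stage

  // The next stage is picked by ordinary logic over what the process has seen so far.
  def next(current: Stage, rowCount: Int): Stage = current match {
    case Extract if rowCount > 1000 => FullLoad
    case Extract                    => IncrementalLoad
    case FullLoad | IncrementalLoad => Done
    case Done                       => Done
  }
}
```

Here a large extract takes the `FullLoad` path and a small one takes `IncrementalLoad`; both paths converge on `Done`.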

Regarding visualization, I think it makes sense when you have an infrastructure full of different web services that you can reuse to build different processes. We still target lower-level integration, so I think the benefits would hardly outweigh the effort.

dmitri-carpov commented 8 years ago

Does not provide connectors to 3rd party systems.

i would not call this a "platform limitation". it's out of scope. we may release helper libraries for various backends/connections

Yes, "platform limitations" is probably not the best place for it. I just wanted to make it clear because most other ETL platforms provide built-in connectors, but I agree connectors would be better shipped as a separate dependency.