src-d / blog

source{d} blog
https://blog.sourced.tech/
GNU General Public License v3.0

[PROPOSAL] Data Retrieval pipeline at source{d} #209

Closed by bzz 5 years ago

bzz commented 6 years ago

Table of contents

Data collection is less sexy than Machine Learning, but source{d} has invested a lot in it, and it has been highlighted only in some talks, so it's time for a blog post! A story of the Write part of the source{d} DR pipeline.

ToC

This can also be a good segue to the blog post about PGA, as something that relies on this work.

It will intentionally not include much on the Read part of the pipeline, leaving the Engine/Gitbase story for future posts.

Management

Social Media

eiso commented 6 years ago

I like the idea a lot

vmarkovtsev commented 6 years ago

I am +100500 this. We discussed it with Egor multiple times, and it is our pink dream.

ajnavarro commented 6 years ago

@bzz How can I help with that? Maybe we can do a draft of what we want to tell in the blog post and develop from there.

bzz commented 6 years ago

Thank you ✨ team ✨ for the support and enthusiasm!

Friendly ping @campoy, is there any feedback or suggestions on the ToC / story structure from your side?

Otherwise, if there are no objections, I'm going to move forward and start preparing the draft so @ajnavarro and I can work on it together.

campoy commented 6 years ago

Very much accepted!

vcoisne commented 5 years ago

@bzz Did you get a chance to start a draft?

vcoisne commented 5 years ago

@bzz this would be a great topic for an online meetup. Would you be willing to cover that topic together with @ajnavarro? Once we have the video & slides it will be easier to write a blog post on the topic. Thoughts?

ajnavarro commented 5 years ago

Right now the Data retrieval team is managed by @jfontan . I don't feel comfortable giving a talk, but I can help with any slides or videos that can be required.

bzz commented 5 years ago

> Right now the Data retrieval team is managed by @jfontan . I don't feel comfortable giving a talk, but I can help with any slides or videos that can be required.

I also do not feel comfortable giving a talk, since it's been a while since I worked on that part of the stack, but if @creachadair does not object, I could try to kick off a preliminary blog post draft next week, if @jfontan would be interested in such help.

vcoisne commented 5 years ago

@bzz sounds good - @jfontan do you want to lead an online meetup on this topic?

jfontan commented 5 years ago

@bzz, @ajnavarro thanks for your offers to help. @vcoisne What does it mean to lead an online meetup? Preparing a presentation with the contents proposed by @bzz?

vcoisne commented 5 years ago

@jfontan yes, preparing a presentation with the content from the blog post, in collaboration with @bzz and @ajnavarro.

bzz commented 5 years ago

Only while preparing the initial proposal draft did I realize that none of this is part of the Engine yet - https://github.com/src-d/engine/issues/52

It would be so cool, though, if in the call to action at the end of the blog post we could just point to Engine as an easy way to try all this goodness that Antonio, Javi and the rest of the team have built!

jfontan commented 5 years ago

First, a public thanks to @bzz for the effort of kicking off the draft. I'll take a look and aim to push it forward on Monday during the OSD.

vcoisne commented 5 years ago

Thanks @jfontan

@bzz Rover and Borges are not part of the Engine yet?

jfontan commented 5 years ago

@vcoisne, nope, Rovers and Borges are not part of the Engine. Moreover, we are going to do a rewrite of Borges, so I don't think it's going to be part of Engine soon.

jfontan commented 5 years ago

I've added technical details and a couple of diagrams. Can you take a look and tell me where more information is needed? I believe that the premises and requirements have changed quite a lot since the project was started. What I've written describes the current status.

vcoisne commented 5 years ago

@jfontan @bzz great to see the draft.

FYI we have an updated blog process

I have added the header to the gdoc. @bzz can you move it to the blog post folder under DevRel in the team drive?

@jfontan did you get a chance to think about whether or not you'd like to give a talk about this topic as an online meetup?

jfontan commented 5 years ago

I would do a meetup, but not in the immediate future. Let's create the blog post and then think about the meetup afterwards. Preparing / giving presentations is quite draining for me and I still remember GitMerge.

vcoisne commented 5 years ago

Understood. Thanks @jfontan

bzz commented 5 years ago

> I have added the header to the gdoc. @bzz can you move it to the blog post folder under DevRel in the team drive?

Done, the draft is now in DevRel/Blog posts.

bzz commented 5 years ago

@vcoisne, on the draft comment:

> Draft looks good, it's very long though. Could we make this a blog series?

Although I understand the temptation to either make it shorter or break it down into multiple pieces so that it is easier for the reader to consume, I would advise against that.

My reasons: this is an architectural blog post for a tech-savvy engineering audience. The big picture plus all the necessary details in a single place is exactly its "bread and butter", and is the main appeal for people at the "head of architecture"/CTO level (or those who want to be).

Breaking it down would make each individual piece less valuable, as it would be harder to do both: get the big picture AND learn from the architectural decisions that we made.

Right now it's about 2k words and we have most of the content in place, which also sounds like around the average size of the blog posts we had before.

But please let me know if you feel strongly about that, and we could keep an eye on it/discuss this further.

vcoisne commented 5 years ago

@bzz I'm ok with that. Another option would be to turn it into a white paper. Just sent you a DM on Slack.

vcoisne commented 5 years ago

@bzz @jfontan @ajnavarro is this ready for publication? Looks like there are a few open comments on the gdoc?

As discussed previously, I would love to schedule a webinar on this topic. Please let me know if you're up for it.

bzz commented 5 years ago

I did a pass over the draft and think it's almost ready 🎉 Will make a final push first thing on Monday.

@jfontan there are a few unanswered comments in the text - could you also please take a look after your vacation?

@ricardobaeta I have tagged you in a comment in there, would you be so kind as to take a look and see if that is of any interest to you? Thanks in advance!

bzz commented 5 years ago

Did another pass of 🤖 spell/punctuation checks over the draft.

@ajnavarro @vmarkovtsev would you be so kind as to proofread it, if you have some spare cycles? And please, feel free to suggest any changes using the "Suggest" tool in the gdoc!

bzz commented 5 years ago

The blog post is out https://blog.sourced.tech/post/data-retrieval-pipeline-at-source-d/ - 🙇 @jfontan @vcoisne and everyone who helped to make this happen!