Closed bzz closed 5 years ago
I like the idea a lot
I am +100500 on this. We discussed it with Egor multiple times, and it is our pink dream.
@bzz How can I help with that? Maybe we can do a draft of what we want to say in the blog post and develop from there.
Thank you ✨ team ✨ for the support and enthusiasm!
Friendly ping @campoy, do you have any feedback or suggestions on the ToC / story structure?
Otherwise, if there are no objections, I am going to move forward and start preparing the draft so @ajnavarro and I can work on it together.
very much accepted!
@bzz Did you get a chance to start a draft?
@bzz this would be a great topic for an online meetup. Would you be willing to cover that topic together with @ajnavarro? Once we have the video & slides it will be easier to write a blog post on the topic. Thoughts?
Right now the Data retrieval team is managed by @jfontan. I don't feel comfortable giving a talk, but I can help with any slides or videos that may be required.
I do not feel comfortable giving a talk since it's been a while since I worked on that part of the stack, but if @creachadair does not object I could try to kick off a preliminary blog post draft next week, if @jfontan would be interested in such help.
@bzz sounds good. @jfontan, do you want to lead an online meetup on this topic?
@bzz, @ajnavarro thanks for offering to help. @vcoisne What does it mean to lead an online meetup? Preparing a presentation with the contents proposed by @bzz?
@jfontan yes, preparing a presentation with the content from the blog post in collaboration with @bzz and @ajnavarro.
Only while preparing the initial proposal draft did I realize that none of this is part of the Engine yet - https://github.com/src-d/engine/issues/52
It would be so cool though if, in the call to action part at the end of the blog post, we could just point to Engine as an easy way to try all this goodness that Antonio, Javi and the rest of the team have built!
First, a public thanks to @bzz for the effort of kicking off the draft. I'll take a look and aim to push forward on Monday during the OSD.
Thanks @jfontan
@bzz Are Rovers and Borges not part of the Engine yet?
@vcoisne, nope, Rovers and Borges are not part of the Engine. Moreover, we are going to do a rewrite of Borges, so I don't think it's going to be part of Engine soon.
I've added technical details and a couple of diagrams. Can you take a look and tell me where more information is needed? I believe that the premises and requirements changed quite a lot since the project was started. What I've written describes the current status.
@jfontan @bzz great to see the draft.
FYI we have an updated blog process
I have added the header to the gdoc. @bzz can you move it to the blog post folder under DevRel in the team drive?
@jfontan did you get a chance to think about whether or not you'd like to give a talk about this topic as an online meetup?
I would do a meetup, but not in the immediate future. Let's create the blog post and then think about the meetup afterwards. Preparing / giving presentations is quite draining for me and I still remember GitMerge.
understood. Thanks @jfontan
Done, the draft is now in DevRel/Blog posts
@vcoisne on the draft comment
Draft looks good, it's very long though. Could we make this a blog series?
Although I understand the temptation to either make it shorter or break it down into multiple pieces so that it is easier for the reader to consume, I would advise against that.
My reasons: this is an architectural blog post for a tech-savvy engineering audience. The big picture plus all the necessary details in a single place are exactly its "bread and butter", and the main appeal for people at the "head of architecture"/CTO level (or those who want to be).
Breaking it down would make each individual piece less valuable, as it would be harder to do both: get the big picture AND learn from the architectural decisions that we made.
Right now it's about 2k words, with most of the content in place, which is also around the average size of our previous blog posts.
But please let me know if you feel strongly about that, and we could keep an eye on it or discuss this further.
@bzz I'm ok with that. Another option would be to turn it into a white paper. Just sent you a DM on Slack.
@bzz @jfontan @ajnavarro is this ready for publication? Looks like there are a few open comments on the gdoc?
As discussed previously, I would love to schedule a webinar on this topic. Please let me know if you're up for it.
I did a pass over the draft and think it's almost ready. Will make a final push first thing on Monday.
@jfontan there are a few unanswered comments in the text. Could you also please take a look after your vacation?
@ricardobaeta I have tagged you in a comment in there, would you be so kind as to take a look and see if that is of any interest to you? Thanks in advance!
Did another pass of spell/punctuation checks over the draft.
@ajnavarro @vmarkovtsev would you be so kind as to proofread it, if you have some spare cycles? And please, feel free to suggest any changes using the "Suggest" tool in the gdoc!
The blog post is out https://blog.sourced.tech/post/data-retrieval-pipeline-at-source-d/ - thanks to @jfontan, @vcoisne and everyone who helped to make this happen!
Table of contents
Data collection is less sexy than Machine Learning, but it is something that source{d} has invested a lot into, and so far it has been highlighted only in some talks, so it's time for a blog post! A story of the Write part of the source{d} DR pipeline.
ToC
Motivation (why)
  Started as a company with a single data-driven product and ad-hoc data collection
  After pivoting to ML on Code, required more generic tools for ML experiments and applications
Architecture (what)
  RDBMS for URL storage, distributed crawlers with RabbitMQ, git native protocol and format, storing forks together
  Go language: go-git, ORM
Implementation (how)
  Rovers, Borges, Siva
This can also be a good segue to the blog post about PGA, as something that relies on this work.
It will intentionally not include much on the Read part of the pipeline, leaving the Engine/Gitbase story for future posts.
Management
Social Media