Q3 Goals discussion

mattkime commented 7 years ago

Proposed Q3 goals

- Improved production reliability
  - Blue / green deployment
  - Improved alerting / monitoring
  - Needs more detail
- Modularization of platform
  - Need to specify which modules will be broken out
- API Caching
  - Will require work with...
- Product team support
  - To be defined...

Objective

Deploy faster with higher quality

Key Results

Maybe

- Performance improvements
  - browser preload
- Better modularization of platform and tools
- Live UI updating
- Offline first

Failed to make the cut

HTTP/2 push (browsers not ready)

mmcgahan commented 7 years ago

Thanks for putting this up. I'd like to know more about some of the goals to get a better sense of how they relate to what we're currently doing and what you're thinking we would have to do to achieve them.

Offline-first

This seems like a good goal. I'd be cautious about setting it as a Q3 goal rather than an H2 goal, given that there's a fair amount of experimental work involved.

API Caching:

On the Hapi server or in the browser?
Caching the REST API calls on the server should probably wait until we have upstream support for time-to-live headers (Cache-Control/Expires), which aren't currently implemented. I think there is going to be a caching effort in H2, but I think we should avoid setting up a dependency that we don't have a lot of control over.
There are many different ways of approaching this problem, and I would hope that we could do a full survey of options in the course of a few months, but implementation would likely take longer.
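If the REST API does eventually send TTL headers, honoring them in the Node proxy could be fairly small. A minimal TypeScript sketch, assuming upstream responses carry `Cache-Control: max-age` or `Expires` (they don't today); `TtlCache` and `ttlFromHeaders` are illustrative names, not existing meetup-web-platform code:

```typescript
// Response headers as Node exposes them: lowercase names.
type ResponseHeaders = Record<string, string | undefined>;

interface CacheEntry {
  body: string;
  expiresAt: number; // epoch milliseconds
}

// Freshness lifetime in ms, or 0 when the response should not be cached.
// Simplified: s-maxage and revalidation are ignored, and no-cache is treated as uncacheable.
function ttlFromHeaders(headers: ResponseHeaders, now: number = Date.now()): number {
  const cacheControl = headers["cache-control"];
  if (cacheControl !== undefined) {
    const directives = cacheControl.split(",").map((d) => d.trim().toLowerCase());
    // A shared cache like the Node proxy must not store private responses.
    if (directives.some((d) => d === "no-store" || d === "no-cache" || d === "private")) {
      return 0;
    }
    for (const directive of directives) {
      const match = /^max-age=(\d+)$/.exec(directive);
      if (match) return Number(match[1]) * 1000; // max-age takes precedence over Expires
    }
  }
  const expires = headers["expires"];
  if (expires !== undefined) {
    const expiresAt = Date.parse(expires);
    if (!Number.isNaN(expiresAt)) return Math.max(0, expiresAt - now);
  }
  return 0; // no TTL information: always go upstream
}

// Hypothetical in-memory cache for the Node API proxy, keyed by upstream URL.
class TtlCache {
  private readonly entries = new Map<string, CacheEntry>();
  private readonly clock: () => number;

  constructor(clock: () => number = Date.now) {
    this.clock = clock;
  }

  get(key: string): string | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.clock() >= entry.expiresAt) {
      this.entries.delete(key); // stale: evict so the caller refetches upstream
      return undefined;
    }
    return entry.body;
  }

  set(key: string, body: string, headers: ResponseHeaders): void {
    const now = this.clock();
    const ttl = ttlFromHeaders(headers, now);
    if (ttl > 0) this.entries.set(key, { body, expiresAt: now + ttl });
  }
}
```

The proxy handler would call `get` before forwarding a request and `set` after an upstream response arrives. Anything without TTL information passes straight through, so nothing is cached until the REST API opts in.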

Live UI updating

I think this refers to part of the discussion in the PWA doc, but as I mentioned there, we'd probably want to do this with Server-Sent Events (SSE), which work best over HTTP/2 and are not fully supported across browsers. We could potentially implement it with websockets, but my experience from Coco suggests that a websocket implementation would be months of work that would then most likely be scrapped for SSE eventually anyway.

The larger immediate obstacle is REST API support: there would need to be a very large BE project dedicated to streaming data for each endpoint that we wanted to update live, and that's probably not going to happen this year. Also, my experience from Coco with streaming servers makes me very cautious about putting a timeframe around this support. I suspect it will need to be product-team driven through a coordinated BE + Web eng effort.

We could do another pass at alerting / monitoring

I'd like to know more specifics about this before putting a timeframe on it. We can and should put together a document describing our ideal alerting/monitoring behavior, but it's one of those things where it could take anywhere from 1 to 5 sprints to assemble the different tools.

Blue / green deployment

My impression at the moment is that this will be partially dependent on other teams, which makes it hard to timebox. Definitely something we could work on, but I'm cautious about marking it as a goal, particularly if it ended up requiring significant changes to our deployment infrastructure.

Given that we've got the core features running reasonably well in production at the moment, the thing that I am most interested in doing now is consolidating the platform into a more coherent and well-defined library of modules - mainly described in #281. Essentially, I'd like to take the time to address the tech debt accumulated over the past year+ of experimentation and MVP-oriented releases. Now that we're in production, our standards for releases need to be even higher and we really should be polishing the system to be robust enough to adapt to the needs of the next 10 years of development. It's not currently in a state where I would be comfortable passing it on to a new developer and expecting them to use it effectively without a lot of guidance from the web platform team, but I think we need to get it to that state in order for it to be a long-term success.

This means slowing down the focus on 'features' like PWA, which will add complexity.

Implementing the platform as a well-defined set of modules will make all future development simpler, as well as give us the opportunity to genuinely open-source some of the more polished modules that we write - we can take the time to properly document them and set them up as packages in npm, and make a genuine effort to share what we've learned with the wider community.

If we ended up at the end of the year with a handful of OSS packages that were well documented, tested, and maintained, I think we could consider it a successful H2 - caching and offline-first support don't seem as high-priority to me given the current prod metrics.

mattkime commented 7 years ago

This seems like a good goal. I'd be cautious about setting it as a Q3 goal rather than an H2 goal, given that there's a fair amount of experimental work involved.

Can you define a Q3 goal or goals?

Caching - On the Hapi server or in the browser?

Seems like there's overlap here with offline first.

Caching the REST API calls on the server should probably wait until we have upstream support for time-to-live headers

Since we're creating the UI I think we have a chance to drive the conversation here. How fresh does info need to be for a given view?

I agree that server-level caching is likely out for Q3, at least the sort that's our responsibility. Other API consumers will want server-side caching. If we imagine having a highly cached API server, how much time would we save by having a mup-web local cache?

I think this refers to part of the discussion in the PWA doc, but as I mentioned there, we'd probably want to do this with Server-Sent Events

Then we're back to doing it with API polling. I'm not sure I understand your reasons against this. I think my worry at this point would be that we're trying to reduce server load and this wouldn't do that.

mmcgahan commented 7 years ago

Can you define a Q3 goal or goals?

Sure - I did a writeup earlier this week that I'll share as a separate comment

Caching - On the Hapi server or in the browser?

Seems like there's overlap here with offline first.

Yes there is - we should be more specific about what is part of the offline-first caching strategy and what other strategies we would like to implement. Offline-first always uses a service worker, whereas API requests can be cached in lots of different places. In the browser, we're currently caching API requests and responses in IndexedDB using Redux middleware, but that doesn't simulate network responses the way a service worker would.
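To make the distinction concrete, here is a Redux-style middleware that caches API responses at the action level. The action types (`API_REQUEST`, `API_SUCCESS`, `API_CACHE_HIT`), the `KeyValueStore` interface, and the in-memory store standing in for IndexedDB are all assumptions for illustration, not the platform's actual code:

```typescript
// Hypothetical action shapes; the platform's real action types may differ.
interface ApiRequestAction { type: "API_REQUEST"; meta: { cacheKey: string } }
interface ApiSuccessAction { type: "API_SUCCESS"; payload: unknown; meta: { cacheKey: string } }
interface ApiCacheHitAction { type: "API_CACHE_HIT"; payload: unknown; meta: { cacheKey: string } }
type Action = ApiRequestAction | ApiSuccessAction | ApiCacheHitAction | { type: string };

// Async key-value interface; in the browser this would wrap IndexedDB.
interface KeyValueStore {
  get(key: string): Promise<unknown>;
  set(key: string, value: unknown): Promise<void>;
}

// In-memory stand-in so the sketch runs outside a browser.
class MemoryStore implements KeyValueStore {
  private readonly data = new Map<string, unknown>();
  async get(key: string): Promise<unknown> { return this.data.get(key); }
  async set(key: string, value: unknown): Promise<void> { this.data.set(key, value); }
}

type Dispatch = (action: Action) => unknown;
interface MiddlewareAPI { dispatch: Dispatch }

// Redux-style middleware: store => next => action.
function createCacheMiddleware(cache: KeyValueStore) {
  return (store: MiddlewareAPI) => (next: Dispatch) => (action: Action): unknown => {
    if (action.type === "API_REQUEST") {
      const { cacheKey } = (action as ApiRequestAction).meta;
      // Render cached data as soon as it's read; the real request still goes out via next().
      cache
        .get(cacheKey)
        .then((cached) => {
          if (cached !== undefined) {
            store.dispatch({ type: "API_CACHE_HIT", payload: cached, meta: { cacheKey } });
          }
        })
        .catch(() => undefined); // storage failures are swallowed in this sketch
    } else if (action.type === "API_SUCCESS") {
      const { payload, meta } = action as ApiSuccessAction;
      // Persist the fresh response for the next visit.
      cache.set(meta.cacheKey, payload).catch(() => undefined);
    }
    return next(action);
  };
}
```

Unlike a service worker, nothing here intercepts the network: every `API_REQUEST` still reaches `next()` and goes out, and offline the UI only sees cached data because the middleware dispatched it as a separate action.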

Caching the REST API calls on the server should probably wait until we have upstream support for time-to-live headers

Since we're creating the UI I think we have a chance to drive the conversation here. How fresh does info need to be for a given view?

We should definitely be part of the conversation - I expect that conversation will be ongoing until late in the year, and although we could do some preliminary development to support the expected BE implementation, I don't think we can count on having anything complete in Q3 and maybe not even H2.

I agree that server-level caching is likely out for Q3, at least the sort that's our responsibility. Other API consumers will want server-side caching. If we imagine having a highly cached API server, how much time would we save by having a mup-web local cache?

So, there are 3 broad caching layers we're discussing:

1. the REST API
2. the Node API proxy endpoint
3. the browser

We don't have to worry about (1), and we would need to evaluate different options for (2). I think we will eventually end up with some API caching in the Node layer because it would eliminate a ton of HTTP requests between GCP and AWS, which has both performance and cost benefits, but it's complex and dependent on a lot of upstream work, so it's probably not going to be viable in the near term.

I think this refers to part of the discussion in the PWA doc, but as I mentioned there, we'd probably want to do this with Server-Sent Events

Then we're back to doing it with API polling. I'm not sure I understand your reasons against this. I think my worry at this point would be that we're trying to reduce server load and this wouldn't do that.

I don't think there is any live-data implementation that reduces server load. That's probably the simplest blocker to implementing it in the near term.
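A back-of-the-envelope model makes the load point concrete. With N active users each polling every t seconds, the request rate R that the Node proxy and REST API must absorb is shown below, using hypothetical numbers (10,000 concurrent users polling every 5 seconds) purely for illustration:

```latex
R = \frac{N}{t}, \qquad \text{e.g. } N = 10\,000,\ t = 5\ \text{s} \;\Rightarrow\; R = \frac{10\,000}{5\ \text{s}} = 2\,000\ \text{requests/s}
```

That load exists whether or not any data changed. Push-based approaches (WebSockets, SSE) trade the request rate for N concurrently open connections, which moves the cost rather than removing it.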

To expand on the discussion about what tech should be used to get the data: there are 4 main approaches to getting live data, all of which would require REST API work to support:

1. Regular polling - this strategy creates a lot of network traffic for every active user. Generally not considered a good approach for performance reasons.
2. Long polling - has a number of weaknesses, some of which are summarized well here: https://stackoverflow.com/a/23108363. Essentially, we would have to tell the Node server to keep multiple open requests to the API server for every member using the site.
3. WebSockets - when we were working on member-to-member messages a few years ago, WebSockets were the best option, and if a product team wanted live data in a new feature, I would recommend they look into setting it up that way.
4. Server-Sent Events (ideally over HTTP/2) - doesn't have great browser support yet, and would require major server work on the REST API and Node server.
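The SSE wire format itself is just text on a long-lived HTTP response, and it works over HTTP/1.1 as well; the expensive parts are the per-connection state and the REST API work described above. A minimal TypeScript sketch of the server side, where `SseChannel`, `EventSink`, and the `rsvp` event name are hypothetical:

```typescript
// Minimal writable sink; Node's http.ServerResponse satisfies this shape.
interface EventSink {
  write(chunk: string): unknown;
}

// Response headers an SSE endpoint sends before writing any events.
const SSE_HEADERS = {
  "Content-Type": "text/event-stream",
  "Cache-Control": "no-cache",
};

// Serializes one event in the text/event-stream format:
// optional id/event fields, one "data:" line per payload line, then a blank line.
function formatSseEvent(data: string, options: { id?: string; event?: string } = {}): string {
  let out = "";
  if (options.id !== undefined) out += `id: ${options.id}\n`;
  if (options.event !== undefined) out += `event: ${options.event}\n`;
  for (const line of data.split("\n")) out += `data: ${line}\n`;
  return out + "\n";
}

// Hypothetical broadcaster: every subscribed client holds one open connection.
class SseChannel {
  private readonly sinks = new Set<EventSink>();
  private nextId = 1;

  // Returns an unsubscribe function to call when the client disconnects.
  subscribe(sink: EventSink): () => void {
    this.sinks.add(sink);
    return () => {
      this.sinks.delete(sink);
    };
  }

  publish(event: string, payload: unknown): void {
    const chunk = formatSseEvent(JSON.stringify(payload), { id: String(this.nextId++), event });
    this.sinks.forEach((sink) => sink.write(chunk));
  }

  get connectionCount(): number {
    return this.sinks.size;
  }
}
```

A browser would consume this with `new EventSource(url)`, which reconnects automatically and sends the last seen `id` in a `Last-Event-ID` header. `connectionCount` grows with every open tab, which is the server-load concern raised earlier in the thread.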

However, I don't get the sense that live UI updating is a big product requirement for any current work. If and when it is, I think it will need to involve a full stack effort beyond the scope of the web platform, requiring significant REST API server work. Live data support is very complex and we're not going to be able to do it this year in a general way.

h-will-h commented 7 years ago

I agree that server-level caching is likely out for Q3, at least the sort that's our responsibility. Other API consumers will want server-side caching. If we imagine having a highly cached API server, how much time would we save by having a mup-web local cache?

@mattkime @mmcgahan I think API-caching will be a thing that we want to figure out in Q3, but maybe not implement. I think the web platform team definitely needs to be involved in figuring that out, but we will need to coordinate with the rest of the platform teams. There's also a tracking component here. If we want to track all the things, but we're not talking to the server... how do we want to do it?

Implementing the platform as a well-defined set of modules will make all future development simpler, as well as give us the opportunity to genuinely open-source some of the more polished modules that we write - we can take the time to properly document them and set them up as packages in npm, and make a genuine effort to share what we've learned with the wider community.

Yeah, I really like this and think it's a good goal. What we do need is a metric to measure success against, and a really clear story to tell about why we think this is the way to get to that.

We could do another pass at alerting / monitoring

I'd like to know more specifics about this before putting a timeframe on it. We can and should put together a document describing our ideal alerting/monitoring behavior, but it's one of those things where it could take anywhere from 1 to 5 sprints to assemble different tools

Agree that we need to define this better and come up with what done looks like (or, what we think it might look like)

Blue / green deployment

My impression at the moment is that this will be partially dependent on other teams, which makes it hard to timebox. Definitely something we could work on, but I'm cautious about marking it as a goal, particularly if it ended up requiring significant changes to our deployment infrastructure.

I think we need to do some investigation here, but I don't think that coordination with other teams disqualifies something from being a goal. We do need to talk to teams and plan, but that needs to happen anyway.

mmcgahan commented 7 years ago

Here's the H2 plan that I would propose.

Edit: in response to @willh-meetup -

What we do need is a metric to measure success against, and a really clear story to tell about why we think this is the way to get to that

Let me know if you think this provides clear enough metrics - I think the target module definitions are relatively well-defined in #281, and I've tried to be clear about test coverage, although the 'document and release as OSS packages' is a little more ambiguous. Relatively easy to flesh out, though.

H2 2017 dev plan

- 3-4 sprints: Clearly define & implement the public interface of all meetup-web-platform modules (#281)
- 2-3 sprints: Retire the web-platform-starter repo and fully document shared application patterns
- 2-3 sprints: Complete unit testing coverage (90+%) and improve static typing coverage, including exportable type definitions
- 1-2 sprints: Document and publicly release OSS packages
- remainder: Product support and planning for Phase Three
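To make the 90+% coverage target enforceable rather than aspirational, the test runner can fail the build below a threshold. A config fragment assuming Jest (the plan doesn't name a test runner, so this is illustrative only):

```typescript
// jest.config.ts (hypothetical): fail the run if global coverage drops below 90%.
// Jest only checks coverageThreshold when coverage is being collected.
const config = {
  collectCoverage: true,
  coverageThreshold: {
    global: { branches: 90, functions: 90, lines: 90, statements: 90 },
  },
};

export default config;
```

Jest also accepts path or glob keys under `coverageThreshold`, which could hold individual modules to stricter targets as they're extracted.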

Expected benefit

These goals will support a clearer, shared mental model of the application, make it more predictable that the right tool is applied to the right problem at the right time, and make mistakes easier to identify earlier in the dev cycle. It will make us faster.

The engineering benefit naturally supports better product development, shorter launch cycles, and reduced product resources dedicated to maintenance.

The more intangible benefit of creating better open source software should be felt in recruitment, where higher-quality engineers will be more likely to see and be interested in the technical opportunities at Meetup, as well as providing a strong professional development opportunity for current engineers.

Future work (2018)

- Major caching update based on upstream API caching work done in H2 2017
- Offline-first support and documentation
- App server HTTP/2 implementation and feature planning
- HTTP/2 Server Push implementation
- Server-Sent Events implementation (over HTTP/2)
- Build optimization
- Documentation and public releases of OSS packages

mmcgahan commented 7 years ago

Blue / green deployment

I think we need to do some investigation here, but I don't think that coordination with other teams disqualifies something from being a goal

Definitely. I'm cautious about making the actual implementation of blue/green deployment a Q3 goal because of how slow it can be to coordinate infrastructure/deployment work with other teams, but we'll definitely continue working on it. Making it an H2 goal seems more reasonable, or we could use "coordinate with other teams to determine infrastructure requirements for blue/green deployments" as a more controllable Q3 goal that is less contingent on another team's availability.

h-will-h commented 7 years ago

Definitely. I'm cautious about making the actual implementation of blue/green deployment a Q3 goal because of how slow it can be to coordinate infrastructure/deployment work with other teams, but we'll definitely continue working on it - making it an H2 goal seems more reasonable, or "coordinate with other teams to determine infrastructure requirements for blue/green deployments" as a more controllable Q3 goal that is less contingent on another team's availability

Yep, understood. I'm talking about coordinating with teams in the next few weeks so we can try and get a size and scope before we promise anything :).

mattkime commented 7 years ago

@mmcgahan I like the direction and plan but would like to see it broken down into smaller pieces to fit within Q3 and alongside other work.

Is there a priority to which modules get broken out first?

mmcgahan commented 7 years ago

The details of the module breakdown are in #281. If we need a Q3-specific plan as opposed to H2, we can aim to complete the module refactoring and eliminate the starter repo - that seems like a good, ambitious goal for 3 months of work alongside product support.
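As a rough sanity check on "ambitious", assuming two-week sprints (the thread doesn't state a sprint length), the two items from the H2 plan add up to about a full quarter of capacity:

```latex
\underbrace{(3 \text{ to } 4)}_{\text{module interfaces}} + \underbrace{(2 \text{ to } 3)}_{\text{retire starter repo}} = 5 \text{ to } 7 \text{ sprints},
\qquad
\frac{13 \text{ weeks}}{2 \text{ weeks/sprint}} = 6.5 \text{ sprints per quarter}
```

So the upper estimate already exceeds one quarter before any product support is counted.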

There are a few module dependencies that would determine some dev priorities, but most of those are related to the Hapi plugin work, and are detailed in the Google Doc linked from #281.

mmcgahan commented 6 years ago

'Future work' for 2018 will remain in the mix for 2018 planning - the remainder has mostly been sorted out.

meetuparchive / meetup-web-platform

Q3 Goals discussion #302
