ipfs / notes

IPFS Collaborative Notebook for Research

SpaceX's Starlink #432

Open RubenKelevra opened 4 years ago

RubenKelevra commented 4 years ago

I was wondering if we could work with SpaceX on the integration of IPFS into their Starlink constellation.

On one hand, they need a way to distribute their IPv6 addresses geographically, to allow for easy routing without large routing tables. So if we could get the topology from them, we could use it for cost and latency estimation, to optimize fetching from nodes with lower costs/latency.

On the other hand, SpaceX could integrate IPFS into their ground stations/satellites in some way. For example, the satellites could offer a way to register nodes reachable through the satellite, so that nodes can query this information to find nearby peers for bitswap requests.
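
Just to make that concrete, here is a rough sketch of what such a satellite-side registry might look like (all names here are hypothetical; none of this is an existing IPFS/libp2p API):

```python
import time

# Hypothetical sketch of a satellite-side peer registry.
# The names are made up purely to illustrate the idea.
class SatellitePeerRegistry:
    def __init__(self):
        self.peers = {}  # peer ID -> last time the peer announced itself

    def register(self, peer_id):
        """A ground node announces itself over its uplink to this satellite."""
        self.peers[peer_id] = time.time()

    def nearby_peers(self, max_age_s=600):
        """Nodes query this to find peers currently under the same satellite,
        i.e. peers that are one satellite hop away for bitswap requests."""
        now = time.time()
        return [p for p, t in self.peers.items() if now - t <= max_age_s]
```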

Has someone already contacted SpaceX about this?

IPFS has been used in space projects before, by ESA:

http://blog.klaehn.org/2018/06/10/efficient-telemetry-storage-on-ipfs/

RubenKelevra commented 4 years ago

SpaceX will do an AMA on Reddit next week dedicated to the Starlink project. Maybe worth throwing in some questions?

@momack2 @Stebalien @hsanjuan ?

bertrandfalguiere commented 4 years ago

http://blog.klaehn.org/2018/06/10/efficient-telemetry-storage-on-ipfs/

Unfortunately, I don't think the ESA used IPFS. The author of the blog worked on data compression at the ESA, then later worked with IPFS at Actyx.

The use-case for Starlink is not clear to me.

we could use it for cost and latency estimation, to optimize fetching from nodes with lower costs/latency.

I think this would add a dependency on an external actor. Cost and latency would be measured as the latency to the satellites, and I'm not sure that's relevant for the ground-to-ground latency between libp2p nodes. Additionally, it would mean propagating SpaceX's insight through the network, assuming they are OK with sharing the topology with us (i.e., the whole world).

On the other hand, SpaceX could integrate IPFS into their ground stations/satellites in some way.

The use case I see is a swarm of satellites and ground stations providing fault tolerance for a node and load balancing of traffic. Also, software updates within the constellation, like Netflix did. With the future Episub (specs), they could minimize the redundancy of messages between satellites. I don't know which is cheaper for them, though: sat-sat connections or ground-sat connections? If it is ground-sat, they will push updates from the ground and there's no real benefit to using P2P for software updates. I suspect sat-sat connections are cheaper and more reliable, though, as messages don't have to go through the atmosphere, clouds, or thunderstorms, and the electromagnetic noise is probably low in space since other satellites talk to the ground with laser-focused beams.

For example, the satellites could offer a way to register nodes reachable through the satellite, so that nodes can query this information to find nearby peers for bitswap requests.

This could be useful as a complementary peer discovery mechanism, but it also centralizes on SpaceX, giving them the opportunity to bias the network in favor of their nodes. It could also enable "sky-bootstrapping" from their nodes, but this carries the same caveats, and invites eclipse (no pun intended) attacks.

I think Starlink could use IPFS, I don't think IPFS should use Starlink.

SpaceX could use IPFS for Mars colonization though (Martian nodes fetching data from Martian nodes first, and the rest only once, via 0-RTT QUIC queries to Earth with delegated routing to some Earth nodes). IPFS needs to bake in very high compression and erasure coding for these ultra-high-latency and very unreliable connections, though. (Also, set the timeout to 2020 seconds...)

@RubenKelevra, do you mind sharing where you found the info about the Starlink Ask Me Anything session? :)

RubenKelevra commented 4 years ago

Thanks for the in-depth analysis. Quite interesting.

Starlink is using lasers for satellite-to-satellite communication, while communication with ground stations and users will use Ku- and Ka-band radio.

At the pinning summit 3 weeks ago there were a lot of questions about accessing and caching Filecoin content with IPFS, and about enabling fast access with a specific class of Filecoin nodes near the end-users.

I feel like the nearest ground station, or the satellite itself, would be a good place for a Filecoin access node, or an IPFS node that end-users can use and that just caches popular content to avoid some traffic.

It could also be interesting for content providers to rent storage for fast access on the satellites which offer internet access in a region, like Netflix and other content providers do with regular ISPs.

This would require moving the data from satellite to satellite as they move around the Earth, since data that is interesting for Europe might not be interesting for the US, and vice versa.

This way a lot of uplink bandwidth to the satellites could be avoided, as the data is just moved inside the satellite network via laser links.

IPFS cooperation with Starlink would be great, since Starlink is about getting internet access to the whole world. It might be a wild wish, but I like to think of it also as a way to connect many parts of the world which had very little or no internet access before. Empowering those people not only with internet access but also with software that lets them host and share their data without having to rent or buy expensive servers would be great!

The Starlink AMA will be announced on their social media accounts. They announced the AMA (with Starlink software developers) on their last webcast of a Starlink satellite launch :)

RubenKelevra commented 4 years ago

Here's the link to the AMA:

https://twitter.com/SpaceX/status/1268991039190130689?s=19

autonome commented 4 years ago

Maybe related: I've been talking with the libre.space team for some time about a grant to prototype an embedded IPFS (a subset of the protocol) for use in open source sat comms, and embedded devices generally.

bertrandfalguiere commented 4 years ago

@RubenKelevra

I feel like the nearest ground station, or the satellite itself, would be a good place for a Filecoin access node, or an IPFS node that end-users can use and that just caches popular content to avoid some traffic.

Satellites may be able to quickly fetch things, but they won't cache much. I doubt there is a lot of disk space on board the satellites. They will be coordinating and relaying data flows; they won't be used to store anything relevant. I really like the idea of them being Filecoin Retrieval Market Nodes, though :).

It could also be interesting for content providers to rent storage for fast access on the satellites which offer internet access in a region, like Netflix and other content providers do with regular ISPs.

Again, I don't think satellites will be able to store much. Maybe some ground stations will. Anyway, I'm personally very uncomfortable with Starlink being paid for faster access. I see the constellation as infrastructure, much like cables, and I hope they will treat all content as equal in the name of Net Neutrality. I hope they will charge for Internet access, and nothing more. Of course they will cache things to provide better access and optimize latency, but I see that as something different from being paid to cache particular content, because the former doesn't bias the network.

@autonome

I've been talking with the libre.space team for some time about a grant to prototype an embedded IPFS (a subset of the protocol) for use in open source sat comms, and embedded devices generally.

Wow, this is rad. O_o Let us know as soon as there is something new in that... space.

RubenKelevra commented 4 years ago

@RubenKelevra

I feel like the nearest ground station, or the satellite itself, would be a good place for a Filecoin access node, or an IPFS node that end-users can use and that just caches popular content to avoid some traffic.

Satellites may be able to quickly fetch things, but they won't cache much. I doubt there is a lot of disk space on board the satellites. They will be coordinating and relaying data flows; they won't be used to store anything relevant. I really like the idea of them being Filecoin Retrieval Market Nodes, though :).

Well, SSDs are cheap, small, and light, and compared to the usual communication satellites in geosynchronous orbit you could use off-the-shelf parts, since the radiation there is still pretty low.

You could just add 10 SSDs with 1 TB each, to have some redundancy if one fails, and the cost would still be below 1000 bucks.

With an expected real total bandwidth of 10 GBit/s, the cache could hold more than 2 hours of full-bandwidth use.
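
A quick sanity check of that number (raw capacity, ignoring the redundancy):

```python
# Sanity check: 10 x 1 TB of raw SSD capacity vs. 10 GBit/s of bandwidth.
capacity_bytes = 10 * 10**12              # 10 SSDs x 1 TB
bandwidth_bytes_per_s = 10 * 10**9 / 8    # 10 GBit/s = 1.25 GB/s
hours = capacity_bytes / bandwidth_bytes_per_s / 3600
print(f"{hours:.1f} h")                   # -> 2.2 h of full-bandwidth use
```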

It could also be interesting for content providers to rent storage for fast access on the satellites which offer internet access in a region, like Netflix and other content providers do with regular ISPs.

Again, I don't think satellites will be able to store much. Maybe some ground stations will. Anyway, I'm personally very uncomfortable with Starlink being paid for faster access. I see the constellation as infrastructure, much like cables, and I hope they will treat all content as equal in the name of Net Neutrality. I hope they will charge for Internet access, and nothing more. Of course they will cache things to provide better access and optimize latency, but I see that as something different from being paid to cache particular content, because the former doesn't bias the network.

Well, compare it to DNS: you can use your provider's DNS servers or not. They offer the service because they can store the data nearer to you.

The same is true for content providers like Netflix. They rent rack space in internet exchange points and at the locations of internet service providers, to avoid having to send the traffic through the regular links up to the providers.

It's a win-win situation for both sides, since it reduces traffic costs.

The idea is a distributed approach: if you like, you can ask the satellites for the data you want to download, trusting SpaceX with your IPFS node.

This way the content can be delivered faster, with lower latency.

And the satellites would just track the most requested blocks (misses) and fetch them from the network to increase the hit rate, while dropping the blocks clients are least interested in.
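
A minimal sketch of that eviction idea (purely illustrative; `fetch_block` is a hypothetical stand-in for a bitswap/network fetch):

```python
from collections import Counter

# Illustrative sketch of the popularity-based cache described above.
class PopularityCache:
    def __init__(self, capacity, fetch_block):
        self.capacity = capacity
        self.fetch_block = fetch_block
        self.requests = Counter()   # CID -> request count (hits and misses)
        self.store = {}             # CID -> block data

    def get(self, cid):
        self.requests[cid] += 1
        if cid not in self.store:   # miss: fetch from the network
            self.store[cid] = self.fetch_block(cid)
            self._evict_if_needed()
        return self.store[cid]

    def _evict_if_needed(self):
        while len(self.store) > self.capacity:
            # Drop the cached block with the lowest client interest.
            coldest = min(self.store, key=lambda c: self.requests[c])
            del self.store[coldest]
```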

Content providers could also rent storage if they expect, for example, the launch of a popular series or a new patch for gamers, allowing the data to be distributed before release for a fast response.

I don't see an issue with that, since it's optional.

@autonome

I've been talking with the libre.space team for some time about a grant to prototype an embedded IPFS (a subset of the protocol) for use in open source sat comms, and embedded devices generally.

Wow, this is rad. O_o Let us know as soon as there is something new in that... space.

Sounds cool!

bertrandfalguiere commented 4 years ago

You could just add 10 SSDs with 1 TB each, to have some redundancy if one fails, and the cost would still be below 1000 bucks.

No, the cost would be much higher, because drives and shielding cost weight and physical space, which are of the essence in space. Even if you packed 1 PB per satellite, you wouldn't cache Netflix anytime soon. If Starlink sells (space) caching space, it will be crazy expensive for these reasons. And I don't think Netflix will be a client. They need channels with a low price per kB, not crazy-low latency. Some subscribers will access Netflix via Starlink as their ISP, but Netflix won't pay crazy prices for their users to have 0.2 seconds less delay when starting a 2-hour movie.

The same is true for content providers like Netflix. They rent rack space in internet exchange points and at the locations of internet service providers, to avoid having to send the traffic through the regular links up to the providers.

I know they do, but it doesn't feel right to me. But this is a philosophical debate, we don't need to agree.

It's a win-win situation for both sides, since it reduces traffic costs.

Netflix wins, the ISP wins. Overall traffic cost rises because the caching is not solely based on popularity but is biased by the market. Netflix is paying for that, but the overall network is still under-optimized. But again, we don't need to agree on whether this is good or bad.

The idea is a distributed approach: if you like, you can ask the satellites for the data you want to download, trusting SpaceX with your IPFS node.

If they use IPFS, luckily you don't have to trust them thanks to content addressing.

And the satellites would just track the most requested blocks (misses) and fetch them from the network to increase the hit rate, while dropping the blocks clients are least interested in.

I'm all in favor of that, as it happens organically. No one has paid to bias the network in their favor.

Content providers could also rent storage if they expect, for example, the launch of a popular series or a new patch for gamers, allowing the data to be distributed before release for a fast response.

With vanilla IPFS, new popular content will be replicated as soon as it is popular. With paid space, you over-optimize for the paying providers, taking more resources than needed, and under-optimize for non-paying popular content.

To sum up, use-cases are:

Looks nice :)

MatthewSteeples commented 4 years ago

Just a quick point on Netflix: they cache things at the ISP level because of the amount of data they shift, and when they shift it. If a regional ISP has 10,000 customers who watch an episode of a particular show, then that's 20 TB of bandwidth that the ISP is serving in its internal network (rather than coming in from another network). Netflix then configures their caches to update overnight with new episodes (when the traffic links are quieter).
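
For scale, those numbers imply roughly 2 GB per customer, which is plausible for one HD episode:

```python
# The per-customer volume implied by the numbers above.
customers = 10_000
total_bytes = 20 * 10**12                  # 20 TB served inside the ISP
print(total_bytes / customers / 10**9)     # -> 2.0 (GB per customer)
```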

The same logic doesn't apply for Starlink, because the same satellite is only in your field of view for a matter of minutes before you end up being served by another one.

RubenKelevra commented 3 years ago

@MatthewSteeples that's true. The idea is that the satellites use their interconnection links to fetch the content from other satellites, instead of from the nearest ground station or from the internet.

This way the satellites can utilize their interconnections evenly and reduce the amount of traffic that needs to be transmitted from the ground stations.

The satellites could also hand the cache data over when they move on to the next region, allowing the cache to stay in the same geographic position.

Say videos in Chinese stay over China and videos in German stay over Germany.
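
A toy sketch of that handoff, assuming caches are keyed by geographic region rather than by satellite (all names hypothetical):

```python
# Toy sketch of region-pinned caches: the cache follows the region,
# not the satellite, and is handed over as the constellation moves.
class Satellite:
    def __init__(self, name):
        self.name = name
        self.caches = {}  # region -> {CID: block bytes}

def hand_off_region_cache(leaving, arriving, region):
    """Move a region's cache over the inter-satellite laser link when
    `leaving` exits the region and `arriving` enters it."""
    blocks = leaving.caches.pop(region, {})
    arriving.caches.setdefault(region, {}).update(blocks)

# e.g. the cache of Chinese-language content stays "over China":
sat_a, sat_b = Satellite("A"), Satellite("B")
sat_a.caches["china"] = {"bafy-example": b"<block>"}
hand_off_region_cache(sat_a, sat_b, "china")
```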

mitra42 commented 3 years ago

Really, @RubenKelevra? Those small satellites have really tiny power availability (7 watts, I believe), which limits the power of their transmitters, which in turn limits the bandwidth they have available.

I would have thought the challenges in using IPFS would be:

1. increased bandwidth requirements;
2. making it work in a partially connected world, for example finding something when you might not be able to reach the relevant DHT node;
3. error responsiveness: AFAIK IPFS still has the problem of not telling you when it fails, so it's not possible to gracefully fall back to some other method (like HTTP). WebTorrent, for example, solves that particular problem well.

I could imagine them doing satellite-satellite comms to get around bottlenecks where they aren't over a connected ground station, but for pre-emptive caching I find it unlikely (though not impossible).

Do you have a reference that confirms they are doing inter-satellite comms?

(Twenty years ago I consulted on a system designed by a Russian satellite company. Round one would have used store-and-forward between the remote user and a ground station, and round two was going to use inter-satellite forwarding, which was horrendously complicated given the ever-changing routing tables as satellites zipped past each other in differently oriented orbits.)

RubenKelevra commented 3 years ago

Each of these satellites has more than 67 Linux computers on board. I doubt it's possible to run those with your quoted power budget. 🤔

They currently plan to use lasers for the interconnections, but I don't think those are ready yet.

Stebalien commented 3 years ago

I doubt storing data in space will be an attractive option any time soon. The benefit of LEO satellites is low-latency communication in remote places. It's much cheaper to just drop a bunch of large datacenters around the world and then transfer data on demand. A 2x bandwidth cost for not storing anything on the satellite is worth it.

However, one could use satellites to broadcast popular data then use content addressing to find nearby satellites/frequencies providing the data. But this would require new protocols.

bertrandfalguiere commented 3 years ago

I guess that is basically what they do for TV, where a large number of clients request the exact same bits at the exact same time. Another use case would be firmware updates for the antennas themselves. Other than that, I don't see why many clients would request the exact same bits at the exact same time. And they don't need IPFS for that.

I think satellite ISPs can be useful for having a one-hop connection to a ground node near you (a direct connection as seen from IPFS, of course). But sat ISPs don't need to run sky IPFS nodes for that.

Having them run a smart local IPFS DHT listing only the nodes under a satellite would help, though, as people would be able to quickly find nodes holding the requested data one sat hop away from them. But it requires very large IPFS adoption before Starlink would even consider maintaining such a large DHT on every sat (with the additional problem that you have to constantly pass the info to the next satellite, as it is a geographically bounded DHT).

Stebalien commented 3 years ago

Yeah, I'd see this mostly being useful for very large, very popular downloads. For example, Game of Thrones on release day, live TV, etc.

The benefit of building this into IPFS is that you'd be able to fall back on bitswap/graphsync if no satellite link is available, you'd be able to find available satellites through content addressing instead of needing to trust some centralized service, and you'd be able to verify the data. You could always build a custom protocol and/or proprietary system to do the same thing, but integrating into an existing ecosystem is probably simpler in the long run.

bertrandfalguiere commented 3 years ago

I don't even see the use case for the release of GOT. By "broadcasting" do you mean that the satellite emits a data stream to its whole geographic area, and clients (antennas) are free to pick it up or not? Just like satellites broadcasting TV, where antennas are free to pick up channel A, channel B, or none? If this is what you mean, then GOT lovers are either streaming or downloading the movie for later. If they are streaming, they won't start at the exact same time, so how would that work? If they download it, how do they know when a broadcast will occur? The download-for-later use case will get rarer and rarer anyway, as people like the convenience of streaming (start watching as soon as you decide, with the first frames on your screen instantly).

For streaming, maybe there would be 2 broadcasts: one of the whole movie, sent in a cycle, and one of just the first minute, also sent in a cycle. That way, a client is guaranteed to start watching soon, as the second broadcast is always about to reset, and they can download the rest of the movie from the first broadcast while watching the beginning.
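
Putting rough numbers on this (assuming a 2-hour movie and a 1-minute intro loop):

```python
# Rough numbers for the two-loop broadcast idea (assumptions: a 2-hour
# movie looped on one stream, its first minute looped on a second one).
intro_loop_s = 60          # the 1-minute intro restarts every 60 s
movie_loop_s = 2 * 3600    # the full movie restarts every 2 h

max_wait = intro_loop_s        # worst case: you just missed a restart
avg_wait = intro_loop_s / 2    # on average, half a cycle
print(f"start delay: avg {avg_wait:.0f} s, max {max_wait} s")

# While the intro plays, the client records the main loop from wherever
# it currently is; a complete copy of the movie has arrived after at
# most one full movie_loop_s cycle (2 h) at the broadcast bitrate.
```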

Anyway, I see a big problem with baking broadcasting in at the protocol level: non-interactivity between clients and broadcaster, and heterogeneous clients. TV sat operators are able to broadcast because they can use a bandwidth tailored to their clients. Let's say I launch a Starlink competitor called SatStream, using cheaper antennas that can take in 1 Mbps. If Starlink streams at 5 Mbps, it has no way to know there are slower clients below from a competitor. If it did, and adapted to them, that would open an attack where you could simulate a slow client to slow down the connection for everyone.

A broadcast is one-size-fits-all in my understanding, so you have to have homogeneous clients below, or you will either overwhelm slow clients or underwhelm fast ones.

(Right? 🤔)

Stebalien commented 3 years ago

You're correct, and I agree with your looping approach. In terms of bandwidth, that should be pretty easy to overcome with a reputation system (easy compared to everything else here...), but I do agree this is way out there and not likely to be useful in the near future, if ever (given that bandwidth costs should go down over time).

bertrandfalguiere commented 3 years ago

Actually, to solve the different capabilities of clients, there could be 5 such double streams at different bitrates, and clients choose the best one they can receive.