project-chip / connectedhomeip

Matter (formerly Project CHIP) creates more connections between more objects, simplifying development for manufacturers and increasing compatibility for consumers, guided by the Connectivity Standards Alliance.
https://buildwithmatter.com
Apache License 2.0

[Feature] Matter Casting support for "Audio Player Architecture" with a new "Casting Audio Player device type" and "Audio Player endpoint" ("Casting Audio Player" and adding "Basic Audio Player") + look into adding multi-room music streaming? #31389

Open Hedda opened 5 months ago

Hedda commented 5 months ago

Feature description

I hope it is OK to submit this large feature request even though I do not have the skills or resources to implement it myself. As such, this is just a feature suggestion meant as an open letter to the Matter members for discussion, and not a feature proposal from me.

Anyway, the feature request is that others consider creating a separate Matter "Audio Player Architecture" and adding a new "Casting Audio Player device type" and "Audio Player endpoint" ("Casting Audio Player" and "Basic Audio Player"). That is, a universal audio-only casting standard for music services and other streaming audio sources, cast to speaker-only devices (i.e. devices that are not designed to handle video playback, such as smart speakers).

There is also a need for an example similar to tv-casting-app but for audio-only casting of music, perhaps a speaker-casting-app?

Perhaps also an example audio player app similar to tv-app but for music playback (preferably with multi-room synchronization)?

Such an "Audio Player Architecture" could be based on the existing "Video Player Architecture", and the new "Casting Audio Player device type" could be based on the existing "Casting Video Player device type". However, I think there need to be differences. The aim should be an architecture designed primarily for music playback that works for a combination of "smart speaker", "home audio", and "high fidelity" products.

While some video-specific features could be removed if basing it on the existing "Video Player Architecture", I think it would be preferable to also extend a dedicated "Audio Player Architecture" with some audio-specific features to optimize for home audio setups with Hi-Fi quality amplifiers and speakers designed for music playback, and not solely for embedded smart speakers.

An alternative could be to redesign and rename the "Video Player Architecture" into a more generic "Media Player Architecture"?

An example feature is real-time audio synchronization between different smart speakers to allow synchronized multi-room playback of music on several Matter Audio Casting enabled speakers installed in the same home (also known as a distributed audio system). This needs support for an "Audio Group", usually named a "Speaker Group", and perhaps also an "Audio Zone" ("Speaker Zone" or area). It should preferably also have separate volume controls for each speaker and/or zone to compensate for differences in apparent volume due to room size and shape, as well as the speaker products used in different rooms.
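To make the speaker group, zone, and per-speaker volume idea more concrete, here is a minimal sketch under stated assumptions. Nothing in it comes from the Matter specification: `Speaker`, `SpeakerGroup`, `zone_trim_db`, and the clock-offset fields are hypothetical names. It assumes volume is expressed in dB, and that synchronized playback works by having a group leader announce a start time on its own clock, which each speaker converts to its local clock using an NTP-style offset estimate (the NTP offset formula is swapped in here purely for illustration).

```python
from dataclasses import dataclass, field

# Hypothetical model for illustration only; these names are NOT from the Matter spec.

@dataclass
class Speaker:
    name: str
    zone: str
    trim_db: float = 0.0       # per-speaker offset (room size/shape, speaker model)
    clock_offset_us: int = 0   # estimated (leader clock - local clock), microseconds

@dataclass
class SpeakerGroup:
    name: str
    volume_db: float                                   # group-wide volume set by the user
    speakers: list[Speaker] = field(default_factory=list)
    zone_trim_db: dict[str, float] = field(default_factory=dict)

    def effective_volume_db(self, speaker: Speaker) -> float:
        # Group volume, then the zone's trim, then the individual speaker's trim.
        return self.volume_db + self.zone_trim_db.get(speaker.zone, 0.0) + speaker.trim_db

def estimate_offset_us(t0: int, t1: int, t2: int, t3: int) -> int:
    # NTP-style offset estimate: t0/t3 are the speaker's send/receive times,
    # t1/t2 are the leader's receive/send times for one request/response exchange.
    return ((t1 - t0) + (t2 - t3)) // 2

def local_play_time_us(group_play_time_us: int, speaker: Speaker) -> int:
    # Convert a start time expressed on the leader's clock into the speaker's local clock.
    return group_play_time_us - speaker.clock_offset_us
```

For example, a group at -20 dB with a +2 dB "kitchen" zone trim and a kitchen speaker trimmed by -3 dB plays that speaker at -21 dB, while every speaker in the group starts the same track at the same leader-clock instant.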

https://en.wikipedia.org/wiki/Multi-room_audio


Another argument for a separate audio-only architecture is that an audio player specialized for music playback could run with even fewer resources on constrained devices.

Background: The existing Matter specification does feature a "Video Player Architecture" with a "Casting Video Player device type" and "Video Player endpoint" ("Casting Video Player" and "Basic Video Player"). What currently appears to be missing, but is directly related, is a "pure" audio architecture with a Matter Casting Audio Player device type, and maybe an Audio Input cluster as well.

Product use case: A client/server design that works well for music/audio apps and smart speaker products, plus products with audio line-in (i.e. devices designed only for pure audio output and/or input that are normally used just for music playback, including multi-room sound systems). These sometimes, but not always, include microphone input for a voice assistant. The point is that this covers products that lack any kind of video output, such as the screens on televisions and/or smart displays.

The main problem to solve: There are many different audio streaming protocols for commercial use, ranging from basic to audiophile-class audio quality, and there are plenty more music streaming services around today that do not support all of them. Having many proprietary, closed-source solutions from different commercial vendors means fragmentation: audio players/receivers and music services that do not communicate with one another, and no way for users to control all of their music from a single interface or to stream audio to different ecosystems at the same time.

I think that, other than the obvious smart speakers with voice assistants, another real-world market and use case for pure audio players is high-quality speakers and Hi-Fi grade sound systems for music playback. Whether they are used as single-point or multi-room sound systems, their primary audience would probably be users streaming music from commercial music service apps like Amazon Music, Spotify, SiriusXM, Pandora, Tidal, Qobuz, Deezer, YouTube Music, and Apple Music, as well as additional audio-streaming services for other types of content (for example Amazon Audible for audiobooks), if and when they add support for the "Matter Casting" streaming protocol for audio to their apps.

If implemented, please be sure to include support for the concept of so-called "speakerless devices", meaning audio-output dongles (with TOSLINK and Phono AUX-out or line-out ports for external speakers and sound systems from third parties), such as Google's original "Chromecast Audio" product, which adds Google Cast audio player capability to any third-party speaker or sound system, as well as the "Amazon Echo Input", "Amazon Echo Link", and "Echo Link Amp", which similarly add AUX output to a third-party speaker or sound system (though these Amazon Echo products also have an embedded voice assistant via built-in microphones).

https://en.wikipedia.org/wiki/Amazon_Echo#Speakerless_devices

https://en.wikipedia.org/wiki/Chromecast#Chromecast_Audio

A popular example of Hi-Fi audio streamer products without a built-in voice assistant is the WiiM series from Linkplay Technology.

A new product idea that a new audio-only architecture should also accommodate is the concept of audio-input dongles. That is, audio-streaming server dongles with "line-in" and/or "microphone" input ports that basically work as stand-alone sound cards on the network, acting as embedded audio digitizer appliances. Each would be an audio-only "Matter Casting Client" that can stream to any chosen "Audio Player endpoint", either a single endpoint or a group of endpoints (audio group) for multi-room music playback. This would allow a user to connect any legacy audio source, like an LP record player (phonograph turntable), cassette deck, or CD player (for Audio CDs), to such an audio-input dongle and stream that audio to any "Matter Casting" enabled audio player.
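For a sense of scale, an audio-input dongle streaming uncompressed line-in audio has to move every PCM sample over the network. The sketch below computes the raw bitrate and splits a captured buffer into fixed-duration, sequence-numbered, timestamped frames that a receiver (or every member of an audio group) could schedule for playback. The framing is a hypothetical illustration, not a Matter or Matter Casting packet format; `packetize` and its parameters are assumed names.

```python
from typing import Iterator, Tuple

def pcm_bitrate_bps(sample_rate_hz: int, channels: int, bits_per_sample: int) -> int:
    # Uncompressed PCM: every sample of every channel goes over the network.
    return sample_rate_hz * channels * bits_per_sample

def frame_size_bytes(sample_rate_hz: int, channels: int, bits_per_sample: int,
                     frame_ms: int) -> int:
    # Bytes in one frame of frame_ms milliseconds (interleaved channels).
    samples_per_channel = sample_rate_hz * frame_ms // 1000
    return samples_per_channel * channels * bits_per_sample // 8

def packetize(pcm: bytes, sample_rate_hz: int, channels: int, bits_per_sample: int,
              frame_ms: int, start_us: int = 0) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (sequence number, presentation timestamp in us, payload) for each full frame.

    Hypothetical framing for illustration. A trailing partial frame is not yielded;
    a real capture loop would keep it and prepend it to the next captured buffer.
    """
    size = frame_size_bytes(sample_rate_hz, channels, bits_per_sample, frame_ms)
    for seq, offset in enumerate(range(0, len(pcm) - size + 1, size)):
        yield seq, start_us + seq * frame_ms * 1000, pcm[offset:offset + size]
```

CD-quality audio (44.1 kHz, 16-bit, stereo) works out to 1,411,200 bit/s, and a 10 ms frame of it is 1,764 bytes, which is modest for Wi-Fi or Ethernet but is one reason a real design might also offer compressed codecs for constrained devices.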

As far as I know, there are no such commercial products on the market, but the "Vinyl Cast" app is worth checking out as a proof of concept.

Platform

all

Platform Version(s)

No response

Anything else?

Perhaps an existing Matter group member would be willing to contribute their existing technology solutions as a base for audio grouping and synchronized multi-room audio support? If not the whole thing, then perhaps parts of their specifications, or patents for related technologies.

Amazon Alexa features multi-room music support.

Google has "Google Cast", which supports multi-room audio with grouping of speakers and synchronized multi-room playback, so maybe they could be convinced to contribute components?

Apple supports multi-room audio streaming with AirPlay 2.

Espressif ESP-ADF (Espressif Audio Development Framework) does support ESP Multi-Room Music, but perhaps not synchronized playback on its own?

Sonos, perhaps the largest vendor of multi-room audio speaker systems, is at least a member of the CSA today.

IKEA of Sweden AB currently has a partnership with Sonos to make Wi-Fi speakers with multi-room audio support.

Yamaha MusicCast: Yamaha is not yet a member of the CSA, but MusicCast proves the need for high-fidelity quality.

Roon Ready (Roon's RAAT streaming technology by Roon Labs): not a CSA member, but it proves that interoperability is needed.

There are also other open-source and closed-source solutions for multi-room audio synchronization, for example:

Snapcast

SlimProto & SliMP3 protocols for Logitech Squeezebox players (for Logitech Media Server, a.k.a. LMS/SlimServer, SqueezeCenter)

Strobe audio

Music Player Daemon (MPD)

PS: Possibly relevant: last year Google prevailed over Sonos in a patent infringement lawsuit about multi-room audio groups:

https://www.engadget.com/google-brings-back-smart-speaker-grouping-after-sonos-lawsuit-victory-081200931.html

bzbarsky-apple commented 5 months ago

@decenzo please take a look.

chrisdecenzo commented 5 months ago

Thanks for the suggestion. Please join Matter so you can volunteer to lead this effort!

Hedda commented 5 months ago

Sorry, I do not have the capacity for that myself. I would think that Amazon and/or Google might be best suited to look into this?

Again, both Amazon and Google have competing smart speakers with their own technology already implementing these features.

Apple also has the technology and use case with AirPlay and their HomePod series, but I am less sure they would lead such a project.

Is there someone from Amazon who leads Matter's matching "Video Player Architecture" + "Casting Video Player" development?

@pgregorr-amazon or @sharadb-amazon, could you perhaps refer this feature request to the relevant Amazon leads for consideration?

Could this audio-only casting track perhaps be tackled there as an extension and continued part of that video casting project?

For reference, Amazon is driving "Matter Casting" for its video playback devices and also has many smart speakers:

https://www.aboutamazon.com/news/devices/amazon-ces-2024-announcements

https://www.theverge.com/2021/12/9/22824559/matter-tv-streaming-devices-smart-home-casting-protocol-support

https://www.theverge.com/2024/1/9/24030324/amazon-matter-casting-echo-show-fire-tv-prime-video

https://www.streamtvinsider.com/video/amazon-drops-matter-casting-capabilities-panasonic-and-fire-tv-os-partnership-ces

https://9to5google.com/2024/01/09/amazon-fire-tv-matter-casting/

https://www.aftvnews.com/matter-casting-is-coming-to-fire-tvs-and-the-echo-show-15-an-industry-first-by-amazon/

https://www.aftvnews.com/new-matter-casting-video-player-for-fire-tv-gets-certified/

Hedda commented 1 month ago

Please join Matter so you can volunteer to lead this effort!

@marcelveldt, as "the Matter guy at Nabu Casa" (and Home Assistant), perhaps this is something that you and the ESPHome developers at Nabu Casa would be interested in helping to architect and develop for the Matter project? I would think this functionality is very relevant to your roadmap, given Home Assistant's recent announcements about "Music Assistant", the Open Home Foundation, and Home Assistant's voice assistant work, which are separate things that I believe all align in spirit with this concept at a high level?

PS: I also read that Nabu Casa is developing its own ESPHome-based smart speaker (and/or smart display) hardware. For such devices to also work as music streaming and audio player devices (A/V receiver endpoints) without your "Music Assistant" integration acting as middleware, I am guessing you will eventually want to add cross-ecosystem support for some kind of standardized audio streaming protocol, like Matter Casting with audio player endpoint features?

Hedda commented 2 weeks ago

Any feedback or input on these feature request ideas about Matter Casting support for audio-only streaming and multi-room support for synchronized music playback in multiple rooms?

https://community.home-assistant.io/t/matter-casting-matter-casting-client-support-in-home-assistant-cast-new-upcoming-open-protocol-standard-for-local-video-and-audio-streaming/671645

chrisdecenzo commented 2 weeks ago

There is an effort within Matter right now to define use cases for audio players / smart speakers. Please join us!

nagyrobi commented 2 weeks ago

A new product idea to consider accommodation for in a new audio-only architecture would also include support for the concept of audio-input dongles.

Regarding audio inputs, there is a need for something much simpler: the ability to play local audio sources through the smart device. You don't want separate speakers for your TV and your various players, do you? Television sets' built-in speakers generally suck; you'd love to hear TV sound on your new speakers, but they lack a TosLink / ARC input or some RCA line-ins. You don't necessarily need to stream the TV sound through the network to the other rooms, but you do need the sound from the speakers to stay in sync with the picture, so a delay has to be added. This is actually pretty simple to accomplish in hardware:
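The delay mentioned above is typically a fixed delay line: the speaker buffers incoming audio for a set time so it plays in step with a TV picture that lags behind because of video processing. A minimal sketch, assuming mono integer samples and a user-tuned `delay_ms` (both hypothetical choices for illustration; a real device would do this in DSP hardware or firmware):

```python
from collections import deque

class DelayLine:
    """Fixed audio delay: output lags input by delay_ms (rounded to whole samples)."""

    def __init__(self, delay_ms: float, sample_rate_hz: int):
        self.delay_samples = round(delay_ms * sample_rate_hz / 1000)
        # Pre-fill with silence so the first delay_samples outputs are zeros.
        self._buf = deque([0] * self.delay_samples)

    def process(self, samples):
        out = []
        for s in samples:
            self._buf.append(s)               # newest sample in
            out.append(self._buf.popleft())   # sample from delay_samples ago out
        return out
```

At 48 kHz, a 100 ms lip-sync correction is a 4,800-sample buffer, which is small enough for an ESP32-class device.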

https://forum.raspiaudio.com/t/suggestion-for-espmuse-multiple-analog-inputs/401

https://github.com/esphome/feature-requests/issues/1750

https://github.com/esphome/feature-requests/issues/1751

https://github.com/sle118/squeezelite-esp32/issues/227

Product designers have to start thinking about hardware too, not just a software-only approach.

Sonos soundbars have this: they can learn the Volume Up/Down IR commands of the TV remote (already feasible with ESPHome). Nowadays TVs can be set to output audio only through TosLink / HDMI ARC, and the volume can then be adjusted from the TV's remote.

The selling point here would be to have this speaker/preamp/player dongle integrated with the HA system and have, e.g., announcements mute/dim the TV sound and restore it afterwards. It should also support multiple sources, like acting as a USB sound card, aux line-ins, and stream sources, and handle them all the same way. When you turn on the TV, it should switch to the TV sound source automatically.
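The announcement ducking and automatic source switching described above can be sketched as a small state object. This is a hypothetical model, not Home Assistant or ESPHome code: `SpeakerState`, `duck`, `restore`, and `on_tv_power` are assumed names, volume is assumed to be a 0-100 integer, and the TV power signal is assumed to arrive from elsewhere (for example via CEC or an integration).

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SpeakerState:
    source: str = "stream"
    volume: int = 40                                           # assumed 0-100 scale
    _saved_volume: Optional[int] = field(default=None, repr=False)
    _source_before_tv: str = field(default="stream", repr=False)

    def duck(self, level: int) -> None:
        # Remember the user's volume only on the first duck, then lower it (never raise it).
        if self._saved_volume is None:
            self._saved_volume = self.volume
        self.volume = min(self.volume, level)

    def restore(self) -> None:
        # Return to the volume saved by the first duck() call, if there is one.
        if self._saved_volume is not None:
            self.volume = self._saved_volume
            self._saved_volume = None

    def on_tv_power(self, is_on: bool) -> None:
        # Switch to the TV input when the TV turns on; go back to the previous source when it turns off.
        if is_on and self.source != "tv":
            self._source_before_tv = self.source
            self.source = "tv"
        elif not is_on and self.source == "tv":
            self.source = self._source_before_tv
```

An announcement would then be `duck(10)`, play the announcement, `restore()`; saving the volume only on the first duck means overlapping announcements still restore the user's original level.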

Check the links above with POCs and use cases explained.

jonsmirl commented 2 days ago

There is an effort within Matter right now to define use cases for audio players / smart speakers. Please join us!

@chrisdecenzo There are groups of developers (myself included) who work on Matter projects but can't participate in CHIP because our employers won't join the CSA for various reasons. CHIP would benefit from having an 'Invited Expert' membership (like the W3C has) allowing individuals caught in this situation to participate. 'Invited Expert' would allow access to Slack and the draft spec with no voting ability.