watfordjc / csharp-stream-controller

My WIP stream controller for live streaming
MIT License

Add Tweet Sync Support #36

watfordjc opened 3 years ago

watfordjc commented 3 years ago

Feature Branch

Current feature branch for this issue: feature/issue-36/obs-1.

Progress

OBS Project

Twitter API Project

UI Project

Windows Audio Project

External Repositories

watfordjc/csharp-message-to-image-library

watfordjc/obs-shm-image-source


Background

With the work done on synchronising clock, weather, and slideshow changes, the only non-synced objects in OBS are the following:

My Tweet panel could be changed in a number of ways, and it uses a number of interconnected parts that I haven't published the code for. This issue is going to be a tad complex and require making decisions about whether or not to change my current backend.

watfordjc commented 3 years ago

Twitter Panel

I'm going to cover the panel itself and the data visualisation that updates the panel in this comment.

Panel Visual Quality

One of the reasons (other than synchronised changes) that I decided to make the clock/weather panel native in OBS rather than continuing to use a Web source is visual quality.

The Tweet panel quality has bigger issues. First, mIRC custom windows have limitations when it comes to /drawpic. It is the reason why my script to download Twitter profile images used the "bigger" quality rather than the original quality.

A picture showing mIRC versus OBS native is the best way of illustrating the constraints, so a screenshot from OBS full screen projector follows. On the left side is the OBS "Tweets" Windows Graphics Capture source. On the right are native OBS sources.

Side by side picture

On the left you can clearly see the image quality issues. The Twitter logo has aliasing and the edge of the profile picture isn't the roundest circle. The profile picture on the right, however, has a rather defined circle edge.

The profile image on the left is 400x400 pixels resized within mIRC to 48x48 pixels. The profile image on the right is 396x396 pixels resized within OBS to 60x60 pixels. The extra pixels are not the reason for the quality difference… I increased the display size to 60x60 pixels because I had a reason to download the higher/original quality profile images, and the higher quality allowed me to consider a slight redesign.

After using Inkscape to convert the Twitter logo from brand resources from SVG to a 512x512 PNG (the brand resources PNG wasn't high resolution enough for me), it looks like this as a native OBS source resized to 60x60 pixels (it retains the Brand Guidelines content-free border area):

Blue Twitter logo with clean edges on a black background with the following white text to the right: Today, 19:10 UTC+1

mIRC Custom Windows, OBS Sources, and WPF .NET Core

This is perhaps going to be a strange thing to say, but mIRC, OBS, and WPF aren't all that dissimilar when it comes to displaying text.

WPF has layout options, such as XAML panels, that make dynamic content repositioning possible. Both mIRC and OBS Studio could have similar (to use an HTML+CSS phrase) block-level elements if there were demand for such a feature and someone wrote the code, but since mIRC custom windows aren't widely used and people using OBS tend to use Web sources, such features are unlikely.

That leaves us with either hard-coding transform positions in OBS (such as for my new clock/weather group), or counting pixels. It is weird to be talking about doing layouts by pixels in 2020, but as the WPF MSDN page on System.Windows.Media.FormattedText puts it, we're talking about "low-level control for drawing text".

The following code block is extracted from part of my /wintweet alias in mIRC. Although OBS text extents are different to drawing text line by line, you still need to know the pixels.

  ; How many lines does %Stext wrap to at the panel width (minus margins)?
  var %lines = $wrap( $+ %Stext, Segoe UI Emoji, 20, $calc(%winWidth - 30), 1, 0)
  var %i = 1

  inc %startHeight %lineHeight

  ; Draw each wrapped line individually at an absolute y position.
  while (%i <= %lines) {
    drawtext -pn @Tweets 0 "Segoe UI Emoji" 20 10 %startHeight  $+ $strip($wrap( $+ %Stext, Segoe UI Emoji, 20, $calc(%winWidth - 30), 1, %i))
    inc %i
    inc %startHeight %lineHeight
  }

The %startHeight variable is equal to the y position of where we want to start drawing. The %lines variable is set to how many lines of text the value of %Stext would take up if it were bold 20 point text in the font Segoe UI Emoji and was limited to a line length of the window width minus 30 pixels.

As for %lineHeight, that is an annoyance caused by Twitter's preferred Windows font being Segoe UI (the brand guidelines' Tweet Treatments say to use Helvetica Neue, but even Twitter.com uses Segoe UI on Windows). It has awful line spacing that results in wasted space between lines, and we can't control it unless we draw each line individually, giving each line an absolute y position for the top of the text. OBS has the same issue as it has no line spacing options.

  ; Segoe UI has dodgy line heights.
  var %textHeight = $height(The quick brown fox jumped over the lazy dog, Segoe UI, 20)
  var %lineHeight = %textHeight * 1.0

This wouldn't be an issue if I were using a "normal" font like Open Sans, but this leaves a dilemma. I can either accept the Segoe UI line spacing for the Twitter display name and username if they line wrap (Segoe UI Emoji doesn't seem to have the issue, but it doesn't have bold), or I can ditch the idea of using native OBS sources and do something like creating an image for each Tweet to be displayed instead.

In either case, the pixels need counting.
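
To illustrate in WPF terms, here is a minimal sketch (not from the repository; variable names assumed) doing the equivalent of the $wrap/$height measuring with FormattedText:

using System.Globalization;
using System.Windows;
using System.Windows.Media;

// Measure how tall a Tweet's text will be when wrapped at the panel width,
// using WPF's low-level FormattedText. Values mirror the mIRC example.
string tweetText = "Example Tweet text…";
double winWidth = 320;
var formatted = new FormattedText(
    tweetText,
    CultureInfo.CurrentCulture,
    FlowDirection.LeftToRight,
    new Typeface("Segoe UI Emoji"),
    20 * (96.0 / 72.0),   // 20pt converted to device-independent pixels
    Brushes.White,
    1.0);                 // pixelsPerDip at 100% scaling
formatted.MaxTextWidth = winWidth - 30; // same margin as the $wrap call
double textBlockHeight = formatted.Height; // total height of the wrapped text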

Looking at the brand guidelines and Tweet treatments, I'm not sure if the font is an actual requirement. I am complaining that their chosen font is hard to read and even breaks their brand guidelines on spacing if the username has to word wrap, so I think I'll switch to Open Sans.

Profile Image Downloading

Profile images are downloaded in a rather convoluted way. mIRC checks whether the (round) image exists; if it doesn't, it calls an alias that uses COM to run a WScript, which silently calls (WSL) Ubuntu.exe with a -c parameter for a bash script, passing the Twitter username as a command line parameter.

Ubuntu then calls my modified version of tweet.sh (I added a get-profile-image-from-screen-name command, don't think I've committed that to my fork yet), and after a bit of regex on the URL and a curl request it uses ImageMagick's convert command to turn the image into a circle.

Both the original and processed images are then stored in mIRC's folder, the command returns, and mIRC adds the Tweet to the display queue.

Tweets

This is, perhaps, the biggest question that needs deciding upon. At present, my Raspberry Pi 4B has a long-running Python script (using TwythonStreamer from twython) connected to the Twitter Streaming API. As it streams Tweets related to UK emergency advice, it has been dubbed UKEM.

After some processing, it spits out a Tweet to a number of UNIX Domain Sockets, each of which relays the Tweet to a different service.

# reply_text, retweet, text, url, data/data2 (Tweet JSON), and
# irc_translation_table are set earlier in the script.
if reply_text:
    xmpp_message = "%s (@%s) %s:\n\n%s\n\nView Tweet: %s" % (
        data['user']['name'], data['user']['screen_name'],
        reply_text, text, url)
elif not retweet:
    xmpp_message = "%s (@%s) Tweeted:\n\n%s\n\nView Tweet: %s" % (
        data['user']['name'], data['user']['screen_name'],
        text, url)
    irc_message = "\x02\x0311,01 %s \x03\x02 (\x02@%s\x02) Tweeted:\x0305 %s \x03| %s" % (
        data['user']['name'], data['user']['screen_name'],
        text.translate(irc_translation_table), data2['id'])
    twitch_message = "Link to Tweet from %s (@%s): %s" % (
        data['user']['name'], data['user']['screen_name'], url)
else:
    xmpp_message = "%s (@%s) Retweeted %s (@%s):\n\n%s\n\nView Tweet: %s" % (
        data['user']['name'], data['user']['screen_name'],
        data2['user']['name'], data2['user']['screen_name'],
        text, url)
    irc_message = "\x02\x0309,01 %s \x03\x02 (\x02@%s\x02) Retweeted \x02\x0311,01 %s \x03\x02 (\x02@%s\x02):\x0305 %s \x03| %s" % (
        data['user']['name'], data['user']['screen_name'],
        data2['user']['name'], data2['user']['screen_name'],
        text.translate(irc_translation_table), data2['id'])
    twitch_message = "Link to Tweet from %s (@%s): %s" % (
        data2['user']['name'], data2['user']['screen_name'], url)

twitch_message is what gets relayed to Twitch chat via IRC. irc_message is what gets relayed to a DALnet and Freenode channel, with my mIRC bot connected to DALnet. xmpp_message is what gets relayed to XMPP.

One of the problems with IRC is the character limit. My mIRC bot was originally also posting the links to Tweets to Twitch chat, but if the combination of display name, username, and Tweet text was too long, the ID of the Tweet got truncated.

There are usage limits for the Twitter API, and it would be rather wasteful for me to have two Streaming API connections for receiving the same data. These are the options I'm looking at:

  1. Create a server on my Pi 4B that can relay the Tweets over a TCP socket.
  2. Start implementing the Twitter API in the application.
  3. Connect to one of my IRC or XMPP relay points much like my mIRC bot.

To give the Twitter bit of this application's original network design diagram a textual representation:

In my original design plan I pencilled in option 1 and retained UKEM. The whole reason the Tweets received by UKEM are relayed over several services is resilience: if there were an emergency in the UK (such as a pandemic) and Twitter went down, how resilient would the resilience people's communication channels to the public be?

If I brought all the functionality of UKEM into this application, whenever the program isn't open or I reboot my laptop all those Tweets stop getting relayed. As well as keeping existing relaying working, I also need to consider that others might use the application and want to use a different Twitter list.

I'm not yet sure how I'm going to get the Tweets into the app, and the reason I'm using twython is because I didn't want to do the Twitter API stuff from scratch.

watfordjc commented 3 years ago

Initial Decision: Image Generation

Given most of my code is for the command line, the typical process for me when programming is getting something akin to command line output before even thinking about the GUI.

That is most evident with the first code commit of this repository, commit fd56f72. It had two windows: one for displaying audio interfaces, and one for displaying raw obs-websocket JSON. The window that shows on application launch at present wasn't added until later.

Connecting to obs-websocket came before processing obs-websocket messages and errors, which came before acting on those messages and errors, which came before graphically representing that information.

There is a reason I am now looking at working from the opposite direction (GUI first), however: colour emojis.

Colour Emojis

Tweets can contain emojis. I am displaying Tweets. Continuing to use single colour emojis after spending time improving the visual quality of the Tweets panel just doesn't make much sense.

There is, however, a problem: software support. OBS uses GDI+, and GDI+ doesn't support colour emojis. In the last comment I wrote about processing text line by line when updating the mIRC Tweets window, and how both OBS and WPF aren't that dissimilar in that respect if working with dynamically sized content.

When I was looking at the create-an-image route, however, I considered the colour emoji issue. There are only two real options available:

  1. Using the UWP Glyph control with XAML islands.
  2. Using Direct2D.

Option 1 looks a bit too involved (and possibly too heavy) just to add some properties missing from the WPF Glyph control, so I'm leaning towards option 2.

The closest I have been to Direct2D is back when I dabbled with XNA, but IIRC even then I was only doing higher level C# stuff.

Direct2D

Direct2D is a 2D DirectX thing:

Direct2D is a hardware-accelerated, immediate-mode, 2-D graphics API that provides high performance and high-quality rendering for 2-D geometry, bitmaps, and text. The Direct2D API is designed to interoperate well with GDI, GDI+, and Direct3D. —Purpose, Direct2D, microsoft.com

This application is already using APIs that are only in Windows 10, and I'm unlikely to make it cross-platform, so using something that requires DirectX isn't an issue.

As I have been using a mIRC custom window, the position of generated Tweet images is simple. In 1080p, the Tweet panel is 320x900 pixels and is positioned at (0, 0). If the image doesn't include the Tweet panel heading and sub-heading, the position moves down a bit on the y axis and isn't as tall.

Recordings in OBS are configured at 1080p, with Twitch streaming currently configured at 720p. Despite having a 1440p monitor, I mostly game at 1080p, although I might switch to 1080p60 on Twitch if I reach the point of having server-side transcoding available to me.

Anyway, in terms of pixels on one axis, 1080p is 1.5x 720p, and 1440p is 2x 720p (or 1.333x 1080p). I believe 4320p is the smallest resolution that can be divided by 720p, 1080p, 1440p, and 2160p to give a whole number. As 4320p is 4x 1080p, that'd make the 'skyscraper' image size (with header) 1280x3600 pixels.
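
The 4320p claim checks out if you take the least common multiple of the four heights (my working, not from the original):

720  = 2^4 × 3^2 × 5
1080 = 2^3 × 3^3 × 5
1440 = 2^5 × 3^2 × 5
2160 = 2^4 × 3^3 × 5
lcm  = 2^5 × 3^3 × 5 = 4320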

I'm not sure if evenly divisible pixel counts matter when downscaling, but as I am using two different resolutions in OBS I am going to start with a "canvas size" of 1280x3600 for Direct2D. I can certainly say that downscaling some NHS pandemic banner images from PDF source files results in better quality in OBS than just using the PNG banner images.

Using a Web source would be a lot simpler, but over the last few years browsers have become increasingly RAM hungry and I doubt obs-browser is going to be as efficient as generating images. My laptop has a strange hardware configuration when it comes to graphics, so I'm not sure if I'll be able to use my internal NVIDIA GPU whilst using an eGPU, or even if the Intel graphics are available. Using a different GPU to the one being used for gaming and streaming could have a minuscule performance benefit, if possible.

First Steps

I want to create an image. I am effectively going to be doing graphic design in C++. I need some design goals based on what I need to do to use Direct2D and what I want to eventually create using Direct2D.

To start, I need to work out how to do Direct2D stuff in a WPF .NET project… for that I have created a new repository to do some experimentation: csharp-message-to-image-library.

Tweet Panel and Colour Emoji

On the left, a vertically displayed Tweet with white text on black background - the profile image and Twitter logo have visible aliasing. The Tweet text also contains a monochrome emoji. On the right, a big colour emoji inside a white square with a dark grey border, on top of a white background.

I have made some progress, and am gradually learning how to do what I want in Direct2D and DirectWrite.

Displaying colour emojis, as well as displaying my square profile image with a round border, are the most recent things I've worked out the code for.

Completing the Design

Looking at my current Tweet panel, I can now list the likely things I need to implement on the Direct2D/DirectWrite side of things:

Design is Getting There

A Tweet from Herts Fire Control, with a clearer profile image than my current Twitter panel, no aliasing on the Twitter logo, and Tweet text containing colour emojis (fire engines and a grinning face).

There are two remaining things on the visual design side of things:

Vertically aligning text to an image is something I have never found simple, whether in Word, HTML/CSS, Visual Studio, or GIMP. What I'm going to need to learn, however, will potentially help with the matter.

Images have transparent areas which can throw off measurements. Fonts have different attributes and some of them lie about those attributes, making programmatic alignment buggy.

Repositioning Text

In commit watfordjc/csharp-message-to-image-library@d8afb34 I switched from a return type of HRESULT for DrawTextFromString() to a return type of int, squashing all HRESULT errors into -1. The int return type was the height of the drawn text.

As return types have to be blittable, there are two options:

  1. Return an array of a single type, such as int.
  2. Pass a struct as a parameter and have the C++ code populate it.

At first I was going to do it the first way, but then I decided that returning HRESULT, so appropriate exceptions can be raised, makes more sense. I can populate the struct as the values are being created in C++, and split the function into CreateTextLayout() and DrawTextLayout() (is using the same names as DirectWrite functions a problem?).

My current struct, which hasn't been optimised for data type size yet, is as follows. I have adjusted some of my code to start using it. The variables lineHeight and lineHeightEm are currently unused (never set).

struct TextLayoutResult {
    IDWriteTextLayout* pDWriteTextLayout;
    int top; // Top edge of text block from edge of canvas
    int left; // Left edge of text block from edge of canvas
    double height; // Height of text block
    double width; // Width of text block
    float lineSpacing; // Line-spacing for font used
    float baseline; // Baseline for font used
    double lineHeight;
    double lineHeightEm;
};
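
On the C# side of the P/Invoke boundary, I'd expect the struct to be mirrored something like this (a sketch; the field names come from the C++ struct above, everything else is assumed):

using System;
using System.Runtime.InteropServices;

// C# mirror of the C++ TextLayoutResult for marshalling; the COM pointer
// crosses the boundary as an opaque IntPtr.
[StructLayout(LayoutKind.Sequential)]
internal struct TextLayoutResult
{
    public IntPtr pDWriteTextLayout;
    public int top;
    public int left;
    public double height;
    public double width;
    public float lineSpacing;
    public float baseline;
    public double lineHeight;
    public double lineHeightEm;
}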

Looking at my current C++ code, the only place I use the (startX, startY) position to position the text is in the final DrawTextLayout() call:

pD2D1DeviceContext->DrawTextLayout(
    D2D1::Point2F(startX, startY),
    textLayoutResult->pDWriteTextLayout,
    pD2D1SolidColorBrush,
    D2D1_DRAW_TEXT_OPTIONS_ENABLE_COLOR_FONT
);

Although the call to CreateTextLayout() uses the width and height of the maximum area the text can take up from the top left pixel's coordinate, actual positioning of the text isn't done until the call to draw it.

hr = pDWriteFactory->CreateTextLayout(
    text,
    len,
    pDWriteTextFormat2,
    width,
    height,
    &textLayoutResult->pDWriteTextLayout
);

I think I know how I'm going to tackle this.

watfordjc commented 3 years ago

Fonts

This is the closest I have ever worked with fonts, and when I do work with fonts (such as in Web design) I spend ages adjusting CSS until it looks close to how I want.

Things are different this time, though. I don't want to keep adjusting things until I get something approximate to how I want it to look; I want to lay out my text programmatically and deterministically, so that I can say, for example, "100 pixels" and, no matter what font, what text, or any attribute of the text or font, I can zoom right in, count some pixels, and after a bit of maths get the answer "100 pixels".

As this is literally going to be a tiny bit of code in a tiny library that, once written, I am unlikely to touch again except to tweak something or use a new API, it makes sense to make a record of the process regardless of my memory issues. So, fonts…

Text as Maths

Maths is good. Computers speak maths. What is the maths of text?

The em

The em is something I know from Web design, particularly given my use of ASCII even in UTF-8. An em is the size of an M, and an en is the size of an N, right?

No, not these days.

These days 1 em equals the point size. So, for some 14 point text, 1 em is 14 points long (a 1 em square would have a width of 14 points and a height of 14 points).

The Inch

The Pixel

Physical Pixels

A pixel is a physical thing or a collection of physical things. On a typical LED or LCD display, a pixel is a group of 3 things: a red thing, a green thing, and a blue thing. There might also be a white thing, and the 3 or 4 things might be replaced by a single RGB or RGBW (red, green, blue, white) thing. How they are aligned varies by the technology and components used.

In digital images, a pixel is the digital representation of that physical thing. A pixel can only be one colour and it cannot be subdivided. The number of colours it can be depends on how much space is given to the storage of its component values.

R, G, and B are typically given 8 bits of space each (although HDR has started making 10 bits more widely used) and A (alpha/transparency) is typically given the remaining 8 bits of a 32-bit word (although whether A comes before or after RGB varies). Some formats squeeze a pixel into 12 bits (4 bits for each of R, G, and B).

As I'm not dealing with HDR, the pixels in the images my software creates are going to be 32-bit (24-bit RGB + 8-bit A) sRGB PNGs.

The pixel's paper equivalent is the dot. In some places we are going to talk about DPI (dots per inch) when we should really be talking about PPI (pixels per inch), but that is just a language thing we're going to have to deal with.

Astral Projection Pixels

On Android they are called DPs (pronounced dips) and on Windows they are called DIPs.

Display-Independent Pixels (the D can also be Device or Density) are an equivalent to pixels that takes a screen's pixel density into account, so that an inch of pixels on one display equals an inch of pixels on another.

On Windows, 100% scaling has one pixel equal one DIP. 100% scaling means 96.0 DPI.

On Android, the default (medium-density, or mdpi) screen is 160.0 DPI. Android tends to use SPs (scalable pixels) over DPs for text sizes, with 1 SP equalling 1 DP at standard settings (i.e. default font size, no zooming).

As they are independent of display, and displays are what contain the pixels, these are pixels that aren't really there. You need a DPI to convert between pixels and DIPs, even if you're just using the default.

The Point

A (desktop publishing) point, ignoring DPI and pixels, is 1/72".

To convert from points to pixels, you can either take pixel density into account or ignore it. But we're now at the point where we can divide DIPs, PPIs, DPIs, and DPPs (points) by inches. Goodbye distances, hello fractions.

If we use the default pixel density on Windows, then a point is 96/72 pixels.

Ergo, at standard pixel density, 1 em at 14 points equals (96/72)*14 pixels = 18 ⅔ pixels = 19 pixels (mandatory rounding as we're talking pixels).

My current Windows settings are at 125% scaling, so depending on the software and what it does with that information (if anything) my display is 120 DPI.

At 120 DPI, 1 em at 14 points equals (120/72)*14 pixels = 23 ⅓ pixels = 23 pixels.

As my software isn't going to be drawing to a screen, it doesn't need to care about pixel density. Therefore the pixel to point multiplier is 1 ⅓.
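
Boiled down to code, the conversion is a one-liner (a quick sketch, not from the repository):

using System;

// Points to pixels: multiply by DPI/72. At the default 96 DPI the
// multiplier is 96/72 = 1 1/3.
static double PointsToPixels(double points, double dpi = 96.0) => points * dpi / 72.0;

// 14pt at the default 96 DPI: 18.67, rounding to 19 px.
Console.WriteLine(Math.Round(PointsToPixels(14.0)));      // 19
// 14pt at 120 DPI (125% scaling): 23.33, rounding to 23 px.
Console.WriteLine(Math.Round(PointsToPixels(14.0, 120))); // 23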

The em Revisited

With the formulae boiled down to a multiplier, we can now convert 1 em into pixels:

The em is used for various things typographical.

The Imaginary Lines

There are a number of lines used in typography:

Leading

Leading is pronounced the same as those leading strips for windows, as both typesetters and leadlighters used lead, AKA the chemical element Pb.

Leading is the difference between one baseline and the next. Time for another formula:

leading = baseline + descender + gap 1 + gap 2 + ascender

Gap 2 is the gap above the ascender. Gap 1 is the gap below the descender. Obtaining these values requires getting the metrics for the font being used, but I haven't spent much time trying to work out how to get a font face from the name of a font.

To Be Revisited Later

For now I have decided to ignore uneven leading and line spacing. A combination of DWRITE_PARAGRAPH_ALIGNMENT_CENTER and some maths on the height of the text compared to the height of the rectangle I want it vertically centred within are OK approximations for now.
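
In code, that approximation is just (a sketch, not the library's actual implementation):

// Vertically centre a text block of measured height within a rectangle;
// textHeight would come from the layout metrics (e.g. TextLayoutResult.height).
static double CentredTop(double rectTop, double rectHeight, double textHeight)
    => rectTop + (rectHeight - textHeight) / 2.0;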

watfordjc commented 3 years ago

TextFormatter Renamed to MessageToImageLibrary

MessageToImageLibrary is the new name of TextFormatter, and its repository has likewise been renamed from csharp-text-formatter to csharp-message-to-image-library.

The rename is because I am now at the point where I am thinking about abstraction, and at its core there isn't a single Tweet or Twitter in Direct2DWrapper.

The current class TweetPanel.cs, whilst being designed specifically for Tweets, is going to undergo a bit of a rename process so that it is more generic. Its purpose will still be to generate images of Tweets for my OBS streams, but calling something TwitterLogoFilename when it could just as easily be called a more generic NetworkLogoFilename doesn't make sense.

As an example of possible renaming decisions, there is an enum in TweetPanel.cs for all the elements that currently make up an image:

public enum CanvasElement
{
    CANVAS = 0,
    HEADER = 1,
    SUBHEADER = 2,
    HEADING_SEPARATOR = 3,
    TWEET = 4,
    PROFILE_IMAGE = 5,
    DISPLAY_NAME = 6,
    USERNAME = 7,
    TEXT = 8,
    TWITTER_LOGO = 9,
    TIME = 10,
    RETWEET_LOGO = 11,
    RETWEETER_DISPLAY_NAME = 12,
    RETWEETER_USERNAME = 13
};

To make things more generic, I'll probably rename as follows:

It is a bit more involved than that though because there are a lot of variables and properties that will need renaming.

watfordjc commented 3 years ago

Turning MessageToImage into a Library

Right now MessageToImage is a WPF .NET Core application project.

The next step is to further refactor the code so that it can be a WPF .NET Core library instead.

I've had trouble going in the opposite direction (turning a library into an application) before, so I expect this is going to give me a bit of grief.

Perhaps the biggest issue is going to be working out how to reference the libraries and packaging them.

There is also the matter that all calls to UnsafeNativeMethods are calls to internal extern methods. Such direct calls won't work when including the C# library at the current visibility level.
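
There are two usual ways around that, sketched below with illustrative names (only UnsafeNativeMethods comes from the actual code): wrap the internal extern methods in public ones, or make the internals visible to a named consumer.

using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Option B: grant a specific consuming assembly access to internals.
[assembly: InternalsVisibleTo("StreamController")]

internal static class UnsafeNativeMethods
{
    // Hypothetical entry point; the DLL and function names are illustrative.
    [DllImport("Direct2DWrapper.dll")]
    internal static extern int CreateCanvas(int width, int height);
}

// Option A: a public wrapper so callers never touch the extern methods.
public static class MessageToImage
{
    public static int CreateCanvas(int width, int height) =>
        UnsafeNativeMethods.CreateCanvas(width, height);
}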

Library Release

MessageToImageLibrary version 0.1.0 has now been released. It follows the same pre-1.0.0 version numbering scheme as the libraries in this repository.

Feature Branch

Converting MessageToImageLibrary from a WinExe to a Library involved moving a big chunk of the existing code out of the repository during refactoring, as it was more suited to an implementation of a class than to a generic class.

That code migration was a suitable point for a feature branch in this repository to be created. As I am only going to be working on the Stream Controller to obs-websocket side of things for the time being, this issue has now been assigned to this repository's OBS GitHub Project.

The initial imported instance code contains a hard-coded sample Tweet. Upon running the executable from the feature branch, an image of the sample Tweet will be generated in the %TEMP% directory, after which File Explorer will be opened with the file selected. The line of code can be commented out if it causes an issue.

watfordjc commented 3 years ago

Message Image Generation and Display

This is going to require a redesign, both in the way Tweets are displayed in OBS and also how the message and display queues work.

Queue System

The current mIRC code includes a rather flaky queuing system based on multiple token variables — issues occur when the token lists get out of sync.

The following asynchronous things need to be considered:

Tweet Date and Time

The time a Tweet was authored is not available in mIRC due to the IRC character limit.

The current display workaround is to display the date (sans time) a Tweet or Retweet message was relayed through IRC.

In the case of a Retweet, only the Retweet time is displayed (because there is no timestamp, the time a Tweet was authored could be years before a relayed Retweet).

Tweet Input

mIRC currently fires off a request for a Twitter profile image when it receives a new Tweet (or a Retweet if there isn't a backlog).

When that request is completed, the Tweet gets added to the display queue.

As a stopgap measure for receiving new Tweets, Stream Controller could have some form of input API. Although I am using Unix Domain Sockets on my Pi for relaying Tweets to different services, I am not sure what options are available for inter-process communication in Windows.

An IPC solution, even if temporary, will need to be decided upon, as the only way to check that Tweet display is working properly is to stop cycling through hard-coded Tweets.
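
I haven't settled on one, but as an illustration of Windows-native IPC, a named pipe server (pipe name and usage hypothetical) would look something like:

using System.IO;
using System.IO.Pipes;

// Hypothetical sketch: receive relayed Tweets, one per line, over a named pipe.
using var server = new NamedPipeServerStream("StreamControllerTweets");
server.WaitForConnection();
using var reader = new StreamReader(server);
string? line;
while ((line = reader.ReadLine()) != null)
{
    // Hand each relayed message to the Tweet queue for processing.
    Console.WriteLine($"Received: {line}");
}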

mIRC Options

Looking at what options mIRC has available for IPC to other processes, there are the following possibilities:

I was going to dismiss the Files option, but then I thought about the log files mIRC is already creating. Instead of outputting data to a new file, using the existing log files is a simple possibility that completely obviates the need to write any mIRC Script.

  1. mIRC UKEM writes to IRC log files.
  2. Stream Controller notices my bot has relayed a new Tweet.
  3. Stream Controller looks for the profile image file.
    1. If the profile image exists, it uses it.
    2. If the profile image doesn't exist, it will get stored in one of two possible filenames ($mircdir/twimg/tmp/PROFILENAME.{jpg,png}).
    3. If the profile image doesn't exist after attempting to retrieve it, the default (egg) profile image gets used. mIRC waits until the command returns… dash calls tweet.sh and then curl without any timeout settings so defaults are used… tweet.sh uses curl… the only relevant default is curl's default connect-timeout setting of 5 minutes. I suppose I can ignore such Tweets as this is only a temporary way of receiving Tweets in the application.
  4. Stream Controller handles the Tweet.
[09:29] <UKEM-Bot> 09,01 Home Office  (@ukhomeoffice) Retweeted 11,01 Hestia  (@Hestia1970):05 Today we launch the #EveryonesBusiness Advice Line, a new resource for employers to give them advice on responding effectively to disclosures of domestic abuse from their employees.  We know that when businesses take action, it saves lives.  Read more: https://t.co/Sm45i9S7TD https://t.co/BxCSXBeONo | 1301400754989785088


One advantage of the mIRC logs is that the IRC messages my bot sends are formatted with a combination of readability and parsability in mind. Dark red text ^K05 is the start of a Tweet's text, for example, with ^O the end.
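
As a sketch of step 2 in the list above (all names assumed; the log path comes from the dependency list later in this thread), Stream Controller could tail the logs with a FileSystemWatcher:

using System.IO;

// Hypothetical sketch: watch mIRC's log directory and read any lines
// appended since the last change event.
var watcher = new FileSystemWatcher(@"G:\Program Files (x86)\mIRC\logs", "*.log")
{
    NotifyFilter = NotifyFilters.LastWrite | NotifyFilters.Size,
    EnableRaisingEvents = true,
};
long lastPosition = 0; // simplification: assumes a single active log file
watcher.Changed += (_, e) =>
{
    using var stream = new FileStream(e.FullPath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
    stream.Seek(lastPosition, SeekOrigin.Begin);
    using var reader = new StreamReader(stream);
    string? line;
    while ((line = reader.ReadLine()) != null)
    {
        // Each new line gets run through the IRC log regex (next section).
    }
    lastPosition = stream.Position;
};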

IRC Log Regex

The Tweets relayed to DALnet by my bot, UKEM-Bot, are of a specific format to make parsing via regex in mIRC Script simpler. Colours and bold are also used so that who Tweeted/Retweeted and the Tweet text are easier to parse visually.

Using an online regex tester, I have managed to boil all of the regex expressions I am using in mIRC down to one single expression for C#:

private static System.Text.RegularExpressions.Regex ircLogTweetRegex = new System.Text.RegularExpressions.Regex(@"^\[(.....)\] \<(.*)\>( \u0002\u000309,01 (.*) \u0003\u0002 \(\u0002(.*)\u0002\) (.*) )?.*\u0002\u000311,01 (.*) \u0003\u0002 \(\u0002(.*)\u0002\).?(.*)?:\u000305 (.*) \u0003\| (.*)$");


I didn't think it was possible to merge the regex for Tweet and Retweet into a single expression, but it turns out group numbers are fixed—if an optional group doesn't match, it still retains its group number.

A note on IRC control codes:

You'll note that my bot's IRC messages toggle control codes off in the reverse order they were toggled on, a bit like closing tag order in XML. A ^K is also followed by either another control code or a space because of the possibility of an expansion of the number of IRC colours beyond 16.

mIRC also supports ANSI colour escape sequences, but I don't think I've ever seen them being used on IRC in the last 24+ years.

Regex Groups

Group 0 is, like with argv[0] in command line argument arrays in some languages, the entire match. The value of all other groups are contained within Group 0.

Group 1 is the first group. It is the ..... in ^\[(.....)\]. In the previous example, the value of Group 1 would be 09:29.

Group 2 is the .* in \<(.*)\>, and in the example the value would be UKEM-Bot.

Group 3 is the optional Retweeter group. It is the entirety of \u0002\u000309,01 (.*) \u0003\u0002 \(\u0002(.*)\u0002\) (.*). As the example is a Retweet, its value would be ^B^K09,01 Home Office ^K^B (^B@ukhomeoffice^B) Retweeted (control codes shown in caret notation).

Group 4 is the first sub-group of Group 3. Its value in the example would be Home Office.

Group 5 is the next sub-group of Group 3. Its value in the example would be @ukhomeoffice.

Group 6 is the next sub-group of Group 3. Its value in the example would be Retweeted.

Group 7 is the next group. As the Tweeter is bold cyan text on a black background (\u0002\u000311,01) regardless of whether it is a Tweet or Retweet, Group 7 will always be the display name of the author of the original Tweet. In the example above, that would be Hestia.

Group 8 is the .* in \(\u0002(.*)\u0002\). Its value in the example would be @Hestia1970.

Group 9 is the second optional group, and is the .* in .?(.*)?. In the example above, it has no value. The optional single character (.?) before the group also has no value.

This is the natural language design decision I made when I created the message format:

In a Retweet, Group 6 has the value of Retweeted and Group 9 has no value. In a Tweet, Group 6 has no value and Group 9 has the value of Tweeted.

Group 10 is the .* in \u000305 (.*) \u0003. Colour 05 (dark red) is the colour I chose for the text of a Tweet (black wasn't as visually parseable). In the example, its value is Today we launch the #EveryonesBusiness Advice Line, a new resource for employers to give them advice on responding effectively to disclosures of domestic abuse from their employees. We know that when businesses take action, it saves lives. Read more: https://t.co/Sm45i9S7TD https://t.co/BxCSXBeONo.

Group 11 is the final group and also the last characters of the IRC message (ignoring line breaks). It is the .* in \| (.*)$ and in the example has the value of 1301400754989785088.

Although I won't be using Group 11 when creating images of Tweets, my Twitch bot does use the Tweet ID to recreate the permalink URL for a Tweet—username doesn't matter as Tweet IDs are globally unique.

https://twitter.com/any_username/status/tweet-ID in the above example gives a permalink of https://twitter.com/Hestia1970/status/1301400754989785088, but you can use any username to get to the Tweet, such as https://twitter.com/twitter/status/1301400754989785088 or https://twitter.com/ukhomeoffice/status/1301400754989785088 – my IRC topic says "Due to character limit, Tweet IDs are at the end of each message - visit URL by adding the ID to the end of https://twitter.com/twitter/status/".
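
Putting the groups to use in C# (a sketch; logLine is assumed to be a single line read from the log):

// Apply the combined Tweet/Retweet expression to one log line.
var match = ircLogTweetRegex.Match(logLine);
if (match.Success)
{
    bool isRetweet = match.Groups[6].Value == "Retweeted";
    string retweeter = match.Groups[4].Value;  // empty for a plain Tweet
    string author = match.Groups[7].Value;     // original Tweet's display name
    string username = match.Groups[8].Value;   // e.g. @Hestia1970
    string text = match.Groups[10].Value;      // the Tweet text
    string tweetId = match.Groups[11].Value;
    string permalink = $"https://twitter.com/twitter/status/{tweetId}";
}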

Generated Images from IRC

I am currently testing the automated conversion of IRC log messages into images of Tweets.

This video contains around 100 such images each being displayed for 5 seconds:

YouTube Video: Recreating my Streaming Tweet Panel - Part 2

There is also the previous video in the playlist which is relevant to this thread. It takes the earlier images output during my Direct2D testing to create something a bit like a timelapse of this issue:

YouTube Video: Recreating my Streaming Tweet Panel

There are a few bugs in the image generation I have yet to work out, but that can come later.

Queue System

For now I have decided to use the existing chrono timers for displaying Tweets combined with a simple Queue.

private Queue<QueuedDisplayMessage> tweetImageQueue = new System.Collections.Generic.Queue<QueuedDisplayMessage>();
private class QueuedDisplayMessage
{
    public string filename { get; set; }      // path of the generated image
    public string speechPrepend { get; set; } // e.g. "Home Office Retweeted Hestia1970: "
    public string speechText { get; set; }    // the Tweet text (Group 10)
}

Image generation returns a string containing the image's filename. That then gets turned into a QueuedDisplayMessage with some data for currently unimplemented text to speech. In the example I've been using, speechText would be the value of the Tweet text (Group 10) and speechPrepend would be (using the current uncommitted code) "Home Office Retweeted Hestia1970: ".

The QueuedDisplayMessage then gets enqueued, and every 10 clock seconds a message gets dequeued and the OBS image source's filename gets changed via obs-websocket. If the queue is empty, then the blank base message panel (including headers) created during instantiation of the message panel gets displayed.
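
The dequeuing side looks roughly like this (a sketch of uncommitted code; method and field names are illustrative):

// Called every 10 seconds: show the next queued Tweet image, or fall back
// to the blank (headers-only) panel when the queue is empty.
private void OnDisplayTimerTick()
{
    string filename = tweetImageQueue.Count > 0
        ? tweetImageQueue.Dequeue().filename
        : blankPanelFilename;
    // Point the OBS image source at the file via obs-websocket.
    SetImageSourceFile("Tweets", filename);
}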

My current uncommitted code recreates the message panel instance if there is an error. As the blank panel image gets deleted on disposal, it is currently noticeable when the panel instance is recreated as the headers temporarily disappear.

One of the issues is if the profile image for a Retweeted message doesn't exist (the profile images of all users in the Twitter list are already downloaded). As I am now at the point of testing the replacement of the mIRC custom window, I can consider moving the profile URL acquisition WSL call and subsequent downloading to within the project.

Genericism

The process of transforming the Direct2D wrapper into something generic has also made other display formats a possibility.

The Twitter Display Requirements mandate how Tweets should be displayed, including things such as a rounded border and defining x as the height of the Twitter logo. It would be completely feasible to create a TwitterTreatmentPanel class that spits out perfectly formatted text-only Tweets following the display requirements—dynamic sizing might require starting with a transparent canvas background until the element text layouts are generated.

A lower third format is also a possibility I am considering.

There is also the possibility of a certain politician's Tweets having an emoji-inspired background colour (💩) and the Tweets being defecated by a bull. That will require animation though, and I haven't done much of that, although creating the animations in Direct2D/Direct3D is something I could look into.

Something I have previously whined about on my blog is that the ideal way of turning Tweets into something first party (no cookies) and undeletable is too arduous: image maps. When I eventually integrate the Twitter API, it shouldn't be too difficult to write the code for turning a Tweet URL/ID into an image map, particularly as I also intend to look into support of Twitter cards, images, and videos.

Next Steps

The next things on the to do list:

watfordjc commented 3 years ago

Downloading Profile Images

For now I have decided to continue doing things the way I have in mIRC.

I have, however, now commented the mIRC script and alias code that was being used for the Tweet Panel and downloading profile images.


Disclaimer

I would not suggest doing anything described in this comment.

Integrating Twitter API access into the application is something I am planning on investigating, so the C# methods and the backend scripts being used at present may get completely replaced by the time the feature branch gets merged into master.


The current feature branch for this issue has the following hard-coded dependencies and file paths in order for the Tweet to image processing to work:

  1. mIRC running and saving logs to the directory G:\Program Files (x86)\mIRC\logs.
  2. mIRC logging set with the following options:
    1. ✓ Timestamp logs: [HH:nn]
    2. ✓ Line colors
    3. ✓ Include network
    4. ✓ Date filenames: By Week
  3. mIRC logs being created for the #UK-Emergency-Advice channel on the DALnet network.
  4. The path G:\Program Files (x86)\mIRC\twimg\tmp existing.
  5. The current Windows user being named John, and WSL2's ubuntu.exe being installed at C:\Users\John\AppData\Local\Microsoft\WindowsApps\ubuntu.exe.
  6. My watfordjc-patch-2 branch of tweet.sh cloned to /opt/tweet.sh in WSL2 Ubuntu.
  7. WSL2 Ubuntu having a user called thejc, with the following dash script saved at /home/thejc/Scripts/tweets/get-profile-image.sh
#!/bin/sh

if [ -z "$1" ]; then
        echo >&2 "Usage: $0 username"
        exit 1
fi

cd "/home/thejc/Scripts/tweets"

## High Quality Images
# Get the profile image URL for the screen name, stripping the _normal
# suffix so the original-quality image is downloaded.
URL=$(/opt/tweet.sh/tweet.sh get-profile-image-from-screen-name "$1" | sed -E 's/_normal\.(.*)$/.\1/')
# Everything after the final dot is the file extension.
EXTENSION=$(echo "$URL" | rev | cut -d'.' -f1 | rev)

curl --request GET \
 --url "$URL" \
 -z --compressed -s \
 --output "/mnt/g/Program Files (x86)/mIRC/twimg/tmp/$1.$EXTENSION"

There is the further requirement that the tweet.sh configuration file, tweet.client.key, be configured for the Twitter API. There are various places such a file can be stored; I have opted for the same location as the script: /home/thejc/Scripts/tweets/.

tweet.client.key will need configuring in the following format:

MY_SCREEN_NAME=
MY_LANGUAGE=
CONSUMER_KEY=
CONSUMER_SECRET=
ACCESS_TOKEN=
ACCESS_TOKEN_SECRET=

In my case, MY_SCREEN_NAME=WatfordJC and MY_LANGUAGE=en-GB.

If you have already created an app in Twitter Developers, click on the Keys and tokens tab and if necessary generate/regenerate the keys.

When creating the access token, you'll want to choose suitable permissions for what you're going to be using tweet.sh for.

If you haven't already created an app using Twitter's old method, you will need to use Twitter's new Developer Platform which I think requires creating an account and choosing a use case for your API usage.

Other than a cursory glance, I haven't experimented with Twitter's new development portal. Twitter are also creating a new v2 API that looks like it will make my current UKEM bot's method of streaming new Tweets in a Twitter list less bandwidth heavy.

It should be noted that all uses of the streaming API require API credentials, and there are usage limits for the basic tier (currently 500,000 Tweets per month per project).

watfordjc commented 3 years ago

Text To Speech (TTS)

I have been using text-to-speech for many years on IRC, going all the way back to Microsoft Agent 2.0 and possibly before (mIRC 5.7, released in February 2000, added MSAgent support for Windows 95 and later).

Microsoft Agent died though, and things have moved on. This is where I need to catch up.

Emulating the Tweet display timing currently used in mIRC is going to involve some maths, as there doesn't seem to be an equivalent to a single-threaded "speech playback has finished".

Timing

When it comes to text to speech, there are different types of timings involved. This section is going to be broken down into two sections: Display Timing and Speech Timing.

Display Timing

After a lot of variable value tweaking, I am currently testing with the following rules:

In mIRC I had a number of timers, one of which (timerWinTweetClear) was used to wipe the Tweet area of the custom window. The variable %clearingWindow was set to 1 after a Tweet's minimum display period was reached so that a new Tweet could be displayed immediately and the window clearing timer would be reset.

Because mIRC aliases and remotes have some scripting limitations, there was also some recursion. For example, if there was a backlog and Retweets were being ignored then repop got called when trying to display a queued Retweet:

/repop {
  speakpop
}

speakpop is what the 10 second timer called to display a Tweet, but I don't think I ever found a way for speakpop to be called from within speakpop. All that repop did was move on to the next queued Tweet without waiting for the timer to next fire.

Back in the day I was an IRC channel operator on both a Windows tech support channel and a mIRC scripting help channel, so there was a time when I could probably have come up with a much more elegant solution for the queue, possibly involving a single tokenised list for the queue with each value a hash table key.

As it is, I never got around to fixing the out of sync token list issue and opted for unsetting the variables when such desyncs occurred—I didn't even bother turning that action into a right-click popup command.

Speech Timing

Synthesised speech is tricky. If I wanted to, I could control speech timings more than I currently do by using the SSML-to-speech function instead of the text-to-speech function. Unlike AIML, however, I have never really dabbled in SSML.

Speech rate is a simple double value. Changing the timings of pauses for commas, full stops and between words would involve much more parsing of the Tweet text, but that amount of parsing would be rather wasteful because it still wouldn't be realistic.

Speech timing is an issue, but one I am not going to tackle… yet.

Speaking Robots

The technology for speech synthesis has come quite some way since the days of Robbie (and Peedy, Merlin, et al). Robbie isn't the only robot around these days; they have other names like Alexa, Siri, Cortana, and go-playing-fake-human (AlphaGo).

Speech synthesis is still mostly locked up in the cloud, whereas speech recognition has started slowly moving back to our devices. For speech recognition to be done on our devices at a somewhat low expectation level of reliability and accuracy without draining our batteries, our devices need some /gplay Merlin DoMagic2.

Buzz word time: ML (machine learning), DL (deep learning), neural networks, TensorFlow, AI, inference platforms, local inference, tensor cores, NNAPI, AI accelerators.

Context Detection

AI accelerators are underutilised (if utilised at all) chips (or bits of larger chips) that are designed to enable hardware-accelerated artificial intelligence stuff. A tensor core is an example of such an accelerator, and NVIDIA RTX Voice is an example of some software utilising the hardware.

I haven't yet looked into artificial intelligence or machine learning as I haven't had a use case where I have thought "AI/ML is the best option", but when it comes to turning text into realistic speech I am considering the options.

Speech synthesis can be done numerous ways, but one of the things it needs to know is how to parse the text. AIML does this based on context (with a huge number of rules in order to infer the context), and SSML has, among other things, <phoneme /> for telling TTS how a word should be pronounced using a phoneme alphabet such as the IPA.

One such example from my mIRC code is replacing "#" with "hash-tag-" and "hash-tag-stayhomesavelives" with "hash-tag-stay-home-save-lithes" because "lives" is pronounced by Windows as in "he lives on the moon" rather than "it saved countless lives".

I could use the SSML method and tweak the pronunciations, but that will likely affect the timing and flow of the speech. I would still need to work out which pronunciation a specific instance of a word should be using based on context, but I have no idea how I'd do that. My C# code for pronunciation swapping currently just iterates through a Dictionary<string, string> replacing case-insensitive keys with values, such as dictionary entry { "govuk", "guv-UK"}.
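
A sketch of that pronunciation-swapping pass (entries from this thread; the helper name and list type are illustrative). A list keeps the replacement order explicit, since the "#" expansion has to run before any "hash-tag-…" keys can match:

using System.Collections.Generic;
using System.Text.RegularExpressions;

private static readonly List<(string Find, string Replace)> pronunciationFixes = new()
{
    ("#", "hash-tag-"),
    ("hash-tag-stayhomesavelives", "hash-tag-stay-home-save-lithes"),
    ("govuk", "guv-UK"),
};

private static string FixPronunciations(string text)
{
    // Replace each case-insensitive key with its TTS-friendly value.
    foreach (var (find, replace) in pronunciationFixes)
    {
        text = Regex.Replace(text, Regex.Escape(find), replace, RegexOptions.IgnoreCase);
    }
    return text;
}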

Detecting context is one of those things ML training is supposed to be good for, but I'm not sure where to start. TheJCBot (an IRC bot I recently found a backup of the source code for) was a test Visual Basic solution/project in which I attempted to create an IRC bot using AIML.

I didn't get very far with the AIML rules for TheJCBot, but we are currently in the time period where there are WhatsApp and Twitter bots as well as text-based "help assistants" providing context-based help and advice. Some methods are still like pre-AIML IRC bots, such as a Twitch bot's !uptime command, but some are attempting input and conversation context matching.

As things currently stand in local uncommitted code, Stream Controller is creating the same TTS output as mIRC, although I still need to work out how to tell it to use a particular audio interface for output (potentially related to issue #19).

Realistic Synthesised Speech

For speech to not sound synthetic, something more than context is needed. This is something on my "to dabble with later" list and I've only had a glimpse at one potential way of dealing with this complex task.

This post, intended for developers with professional level understanding of deep learning, will help you produce a production-ready, AI, text-to-speech model. —Generate Natural Sounding Speech from Text in Real-Time, NVIDIA Developer Blog

watfordjc commented 3 years ago

Look into Creating an OBS Plugin

A couple of issues at the moment are caused by OBS not having a way to tell a slideshow to move to the next slide without using a hotkey, and by the way slideshows operate (loading all images into memory until a memory limit is reached).

There is also a latency issue on updating the clock/weather if a Tweet image is being replaced at the same second. This is potentially made worse by the Tweet processing timers being added before the clock/weather updating timers.

As I have started doing things that involve the local file system rather than the network, closer integration with OBS would make sense. Taking Tweet images as an example, saving images to the file system seems wasteful—I even considered a RAM disk to save my NVMe SSD some writes.

If I'm going to keep the image creation library inside this application, the ideal solution would be using some sort of shared memory between OBS and Stream Controller.

Before doing that though, I need to actually work out how to get OBS to display a Direct2D image, and to do that I'm going to try writing a plugin.

Although there is a way of writing plugins without actually building obs from source, hitting F12 is faster than searching GitHub.

Building OBS from Source

All of the instructions in this section, unless stated otherwise, are being performed below the same relative directory.

Collapsible build instructions.

### obs-studio

Follow the instructions for [building obs-studio](https://github.com/obsproject/obs-studio/wiki/Install-Instructions#windows-build-directions).

- [x] Clone the obs-studio.git repository with submodules into ```obs-studio```.
- [x] Download ```dependencies2017.zip``` and extract to ```obs-deps```.
- [x] Download ```Qt_5.10.1.7z``` and extract using 7-Zip to ```obs-deps-qt```.
- [x] ```cd obs-studio```:
- [x] Create ```obs-studio.sln``` using cmake-gui.
- [x] Build and run obs-studio.

### Chrome Embedded Framework (CEF)

Follow the instructions for [building obs-browser](https://github.com/obsproject/obs-browser/tree/c8ff6ee01365b5d21098b26a882874aca348533a#building-obs-and-obs-browser-1).

- [x] Download and install LLVM ~~8.0.1~~ **10.0.0**.
- [x] Download the latest package of CEF that is of the same branch specified by the obs-browser instructions (at the time of writing, **3770**).
- [x] Open the .bz2 archive in 7-Zip, then extract the .tar, and extract the folder to the directory containing the obs-studio sub-directory.
- [x] ```cd cef_binary_75.1.14+gc81164e+chromium-75.0.3770.100_windows64``` (or whatever version you have).
- [x] Create ```cef.sln``` using cmake-gui.
- [x] Build and run cef (**ALL_BUILD**).

### obs-browser

Follow the instructions for [building obs-browser](https://github.com/obsproject/obs-browser/tree/c8ff6ee01365b5d21098b26a882874aca348533a#building-obs-and-obs-browser-1).

- [x] Recreate ```obs-studio.sln``` using cmake-gui, enabling the OBS Browser build option, pointing the CEF paths at those created when building **CEF**:

> **CEFWRAPPER_LIBRARY** = C:/Users/John/source/repos/cef_binary_75.1.14+gc81164e+chromium-75.0.3770.100_windows64/build/libcef_dll_wrapper/Debug/libcef_dll_wrapper.lib
> **CEFWRAPPER_LIBRARY_DEBUG** = C:/Users/John/source/repos/cef_binary_75.1.14+gc81164e+chromium-75.0.3770.100_windows64/build/libcef_dll_wrapper/Debug/libcef_dll_wrapper.pdb
> **CEF_ROOT_DIR** = C:/Users/John/source/repos/cef_binary_75.1.14+gc81164e+chromium-75.0.3770.100_windows64
> **CEF_INCLUDE_DIR** = C:/Users/John/source/repos/cef_binary_75.1.14+gc81164e+chromium-75.0.3770.100_windows64/include
> **CEF_LIBRARY:FILEPATH** = C:/Users/John/source/repos/cef_binary_75.1.14+gc81164e+chromium-75.0.3770.100_windows64/Debug/libcef.lib
> **VIRTUALCAM_GUID** = UUID-AKA-GUID-SUCH-AS-FROM-VISUAL-STUDIO-TOOLS-MENU-WITHOUT-BRACES

- [x] Try to build and run obs-studio with obs-browser plugin support.

### obs-websocket

Follow the instructions for [building obs-websocket](https://github.com/Palakis/obs-websocket/blob/4.x-current/BUILDING.md#compiling-obs-websocket).

- [x] Clone ```obs-websocket.git``` into ```obs-websocket```.
- [x] cd ```obs-websocket```
- [x] Create ```obs-websocket.sln``` using cmake-gui.
- [x] Build obs-websocket, and then rebuild (and run) obs-studio.

Shared Memory Image Source

There are several documents/files that will be needed for referencing:

Looking at obs-studio's source code, there are two plugins that I think are going to be relevant for my load-from-RAM image source: color-source and image-source.

It is my current understanding that OBS uses shaders (DirectX HLSL) to draw "textures", and shaders come in several types including:

Skeleton Plugin Code

OBS Studio have a GitHub project template for OBS plugins. A click on the Use this template button, a text box entry, and one more click, and watfordjc/obs-shm-image-source was created.

I then cloned that to the shared relative directory used above, created a build sub-directory, created a .sln file with cmake-gui, and then opened and built the solution.

I then amended the initial commit to change the author (and sign the commit with my GPG key), and then created my first commit (watfordjc/obs-shm-image-source@b0d33c2) which customised CMakeLists.txt, README.md, and .gitignore for the repository.

Collapsible section: Generating Solution File with cmake-gui

#### Generating Solution File with cmake-gui

cmake-gui might not like not being able to find OBS, and the current version of ```FindLibObs.cmake``` might not offer a way to enter the variables after clicking Configure in cmake-gui. If that's the case, the following are the minimal variables needed after deleting the cmake cache (File -> Delete Cache) for it to start complaining about not being able to find things and not ignoring you:

* ```LIBOBS_INCLUDE_DIR``` - a ```PATH``` to the ```libobs``` directory inside the ```obs-studio``` directory.
* ```LIBOBS_LIB``` - a ```FILEPATH``` to the ```obs.lib``` file in the ```obs-studio/build{64}/libobs/Debug``` directory.

With those two variables given to cmake-gui that has no cache in the build directory, it should then complain about ```LibObs_DIR``` not being set. Unlike the previous two variables, it actually notices you changing this one and it needs to be set to ```obs-studio/build{64}/libobs```.

Keep clicking Configure and setting variables until it says ```Configuring done```. I needed to set the following variables:

* ```Qt5_DIR``` - a ```PATH``` to ```obs-deps-qt/5.10.1/msvc2017_64/lib/cmake/Qt5```.
* ```W32_PTHREADS_LIB``` - a ```FILEPATH``` to ```obs-studio/build64/deps/w32-pthreads/Debug/w32-pthreads.lib```.
* ```OBS_FRONTEND_LIB``` - a ```FILEPATH``` to ```obs-studio/build64/UI/obs-frontend-api/Debug/obs-frontend-api.lib```.

With ```Configuring done```, click Generate and it should say ```Generating done```. There should now be a ```.sln``` file inside the build folder that can be opened with Visual Studio 2019.

Licensing

Collapsible licensing notes

Something further of note: obs-studio is licensed **GPL v2**, so all plugin code and binaries will also be required to be **GPL v2**. For the avoidance of doubt, any of my resultant code that is not based on OBS's code will be dual-licensed: GPL v2 and **MIT**. For example, the code for accessing shared memory will likely be used by both the plugin and Stream Controller, but will be my own C code and probably derived from the Win32 documentation for [Creating Named Shared Memory](https://docs.microsoft.com/en-us/windows/win32/memory/creating-named-shared-memory).

Drawing Another Rectangle

In Direct2D, one of my first steps was drawing a rectangle because Microsoft's documentation literally pointed at Drawing a Simple Rectangle, which turned out to be a tad more involved than the documentation suggested.

In Direct2D we're working with "2-D geometry, bitmaps, and text". Not only is a rectangle a 2D shape, there are constructors for creating rectangles with either float or UINT widths and heights, as there are for ellipses.

HLSL uses shaders, and I don't yet know what abstractions are available. Do rectangles exist, or do I have to build one out of triangles? Do triangles exist, or do I have to build them out of points in space? And how many axes do I need to think in? I have a bit of trouble when z is involved.

In commit watfordjc/obs-shm-image-source@abfbc14 I created a skeleton video source, and in commit watfordjc/obs-shm-image-source@fb43433 I changed the source listing name (the localisation variable wasn't set to a value) and icon (to an image). After rebuilding, I ran the obs-studio solution and could add and remove a Shared Memory Image source. It can't do anything yet, so let's have it draw a 512x512 square…

After looking at a lot of OBS Studio source code, I think I have found the most suitable option:

gs_texture_t *gs_texture_open_shared(uint32_t handle)

It looks like handle is the shared handle of an IDXGIResource, which is not an OBS type; it is a Win32 type.

An IDXGIResource interface allows resource sharing and identifies the memory that a resource resides in. —IDXGIResource interface, Windows Dev Center

The alternatives (other than loading from file) don't sound that pretty, with one option being to load every pixel into a one-dimensional array. If I can work out how to use this option, I should be able to avoid saving to an image file and potentially avoid OBS Studio having to upload the image data to the GPU itself (assuming OBS doesn't need to convert a DXGI image).
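
If gs_texture_open_shared() works the way I hope, the plugin side might end up looking something like the following sketch. shm_source and shared_handle are placeholder names for illustration, not code from obs-shm-image-source:

```cpp
#include <obs-module.h>

// Hypothetical sketch of an OBS source video_render callback that lazily
// opens a shared texture. shm_source and shared_handle are placeholder
// names, not actual code from obs-shm-image-source.
struct shm_source {
    uint32_t shared_handle; // handle published by the rendering process
    gs_texture_t *texture;
};

static void shm_source_video_render(void *data, gs_effect_t *effect)
{
    struct shm_source *ctx = (struct shm_source *)data;
    (void)effect;

    // video_render runs inside the graphics context, so opening the
    // shared texture here is safe.
    if (!ctx->texture && ctx->shared_handle)
        ctx->texture = gs_texture_open_shared(ctx->shared_handle);

    if (ctx->texture)
        obs_source_draw(ctx->texture, 0, 0, 0, 0, false);
}
```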

I've got a lot more Direct2D/DirectX reading to do, starting with modifying csharp-message-to-image-library and Stream Controller so that, instead of rendering on no specific device, I render on the same GPU that OBS is using. From the look of things, I need to follow How to render by using a Direct2D device context, and I think the first step will be changing CreateD2D1Factory() so that it creates an ID2D1Factory1* instead of an ID2D1Factory*.
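
Based on that article, I expect the new setup to look roughly like this sketch (the documented steps, not my actual Direct2DWrapper code; error handling and Release() calls omitted):

```cpp
#include <d3d11.h>
#include <d2d1_1.h>

// Sketch of the setup from "How to render by using a Direct2D device
// context": an ID2D1Factory1, a BGRA-capable Direct3D 11 device, then a
// Direct2D device and device context.
HRESULT CreateD2DDeviceContext(ID2D1DeviceContext **ppD2DContext)
{
    ID2D1Factory1 *pD2DFactory = nullptr;
    HRESULT hr = D2D1CreateFactory(D2D1_FACTORY_TYPE_SINGLE_THREADED, &pD2DFactory);

    // Direct2D interop requires BGRA support on the Direct3D device. To
    // render on the same adapter as OBS, pass that adapter as the first
    // argument (with D3D_DRIVER_TYPE_UNKNOWN) instead of nullptr.
    ID3D11Device *pD3DDevice = nullptr;
    ID3D11DeviceContext *pD3DContext = nullptr;
    hr = D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr,
                           D3D11_CREATE_DEVICE_BGRA_SUPPORT, nullptr, 0,
                           D3D11_SDK_VERSION, &pD3DDevice, nullptr, &pD3DContext);

    // The DXGI device is the bridge between Direct3D and Direct2D.
    IDXGIDevice *pDxgiDevice = nullptr;
    hr = pD3DDevice->QueryInterface(&pDxgiDevice);

    ID2D1Device *pD2DDevice = nullptr;
    hr = pD2DFactory->CreateDevice(pDxgiDevice, &pD2DDevice);
    return pD2DDevice->CreateDeviceContext(D2D1_DEVICE_CONTEXT_OPTIONS_NONE,
                                           ppD2DContext);
}
```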

Progress

The following commits in other repositories are related to this issue:

Rewriting Tweet Display Code and Queue

By switching to using a shared texture in uncommitted code for Stream Controller, the display queue is now broken: Tweets are displayed as soon as they are processed (potentially at 60 Tweets per second). I therefore need to redesign how Tweets are processed and queued for display.

To start with, I'll just work on getting the display queue working again. That will likely involve adding more properties to the QueuedDisplayMessage class and modifying ParseMessageFromIrcLog() so that it doesn't call the image generation method, which will need moving to the ShowNextTweet() method. I will also need to modify the ClearTweet() method as I no longer generate a blank (headings-only) PNG.

After that, I can start looking at DXGI buffers again and both the IDXGISwapChain1::Present1() and ID3D11DeviceContext::CopySubresourceRegion() functions to see if I can make some optimisations.

DXGI Swap Chain

The swap chain basically allows swapping between buffers. buffer[0] can be both written to and read from, whereas buffer[1..n] can only be read from.

Creating the swap chain using DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL and 2 buffers means the two buffers can be swapped between, and a buffer's contents are retained when the buffers are swapped.

Calling Present1() rather than Present() adds the const DXGI_PRESENT_PARAMETERS *pPresentParameters parameter. DXGI_PRESENT_PARAMETERS has options for dirty rectangles and a scrolling rectangle. I don't think I'll be utilising the scrolling rectangle parameter for now as the content of the canvas/surface isn't really scrolled.
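
Pulling those pieces together, a sketch of the swap chain description and a Present1() call with a dirty rectangle (the sizes and rectangle values are placeholders):

```cpp
#include <dxgi1_2.h>

// Sketch: describe a two-buffer flip-sequential swap chain, then present
// with a single dirty rectangle. Sizes and rectangle values are placeholders.
DXGI_SWAP_CHAIN_DESC1 MakeSwapChainDesc()
{
    DXGI_SWAP_CHAIN_DESC1 desc = {};
    desc.Width = 1080;                    // placeholder canvas size
    desc.Height = 1920;
    desc.Format = DXGI_FORMAT_B8G8R8A8_UNORM;
    desc.SampleDesc.Count = 1;
    desc.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;
    desc.BufferCount = 2;                 // buffer[0] writable, buffer[1] read-only
    desc.SwapEffect = DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL; // contents retained on swap
    return desc;
}

// Present only the area declared dirty; everything else is assumed unchanged.
HRESULT PresentDirty(IDXGISwapChain1 *pSwapChain, RECT *pDirtyRect)
{
    DXGI_PRESENT_PARAMETERS params = {};
    params.DirtyRectsCount = 1;
    params.pDirtyRects = pDirtyRect;
    params.pScrollRect = nullptr;         // the canvas isn't scrolled
    return pSwapChain->Present1(1, 0, &params);
}
```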

Dirty Rectangles

Dirty rectangles are a bit like ID2D1RenderTarget::Clear() calls to wipe the Tweet area to the background colour before starting drawing the next Tweet. They are the areas of the back buffer that have changed since Present1() was last called.

Let's say I pass in the VerticalMessagePanel.messageBlankingArea rectangle as a parameter to Present1(). Anything outside the VerticalMessagePanel.messageBlankingArea rectangle could be assumed to have not changed. I'm not sure what optimisations the operating system does as a consequence, and there is this note on the pDirtyRects parameter:

An application must not update any pixel outside of the dirty rectangles.

It isn't that much of an optimisation because of how large the blanking area is compared to the canvas, and I'm only using that variable because MessagePanel.MessageRectangle doesn't contain all the pixels of some displayed Tweets (some letters, like 'j', extend outside the rectangle).

If I can work out how to get the rectangles to contain all of the pixels from drawn text (it is probably a DirectWrite formatting option), then a further tweak to my queue could be combined with blanking rectangles and sub-resource region copying as a fairly big optimisation.
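
If there is such an option, I suspect IDWriteTextLayout::GetOverhangMetrics() is the relevant call; a sketch of how the "ink" rectangle could be computed from it (the layout position parameters are placeholders):

```cpp
#include <algorithm>
#include <d2d1.h>
#include <dwrite.h>

// Sketch: expand a text layout's rectangle by DirectWrite's overhang
// metrics so that descenders (the tail of a 'j', for example) are included.
// layoutLeft/layoutTop are placeholder coordinates of the layout box.
D2D1_RECT_F InkBounds(IDWriteTextLayout *pLayout, float layoutLeft, float layoutTop)
{
    DWRITE_OVERHANG_METRICS overhang = {};
    pLayout->GetOverhangMetrics(&overhang);

    // Positive overhang values mean visible pixels extend outside the
    // layout box by that many DIPs; negative values mean they are inside.
    D2D1_RECT_F rect;
    rect.left   = layoutLeft - std::max(0.0f, overhang.left);
    rect.top    = layoutTop  - std::max(0.0f, overhang.top);
    rect.right  = layoutLeft + pLayout->GetMaxWidth()  + std::max(0.0f, overhang.right);
    rect.bottom = layoutTop  + pLayout->GetMaxHeight() + std::max(0.0f, overhang.bottom);
    return rect;
}
```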

Retweet Tweak

At the moment, unlike the previous mIRC queue system, the Tweet queue does not ignore Retweets if there is a backlog. The TTS also reads out the text from the Tweet that got Retweeted, potentially causing the same Tweet to be read out a dozen times in a row.

If I were to go by Tweet ID so that a Tweet only appears in the queue once, and turned the Retweeter properties into a class, a message could contain an array of Retweeters. If there is a queue backlog, by the time an original Tweet gets processed it might already have Retweets.

If the IsTweet property is true, then the Tweet time can continue to be displayed and the only thing that needs changing is the Retweeter display name and username. The Tweet doesn't need sending through TTS again, and each Retweeter only needs to be displayed for n (2? 5?) seconds rather than 10+ seconds.

If the IsTweet property is false and it is around midnight, the Retweet time may also need updating.

If the TimeRectangle, SharerDisplayNameRectangle, and SharerUsernameRectangle are correctly sized, they could be dirty rectangles and sub-resource regions. As the Retweeter is displayed at the bottom of the Tweet panel, the Twitter/Retweet images won't need drawing again.

Speaking of the Twitter and Retweet images, it would probably make sense to load them into the GPU once rather than loading them from disk for every Tweet.

ID3D11DeviceContext::CopySubresourceRegion()

The ID3D11DeviceContext::CopySubresourceRegion() function would replace the CopyResource() call. Rather than copying all the pixels from the buffer to the shared texture in UpdateImage(), only pixels inside of the dirty rectangles need to be copied.

To start with, I'll go with VerticalMessagePanel.messageBlankingArea. I'll also need to remember to handle the original drawing of the headings.
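
A sketch of what that copy could look like (the box values are placeholders standing in for the blanking area):

```cpp
#include <d3d11.h>

// Sketch: copy only a dirty rectangle from the render buffer to the shared
// texture instead of CopyResource()'s full-surface copy. The box values
// are placeholders standing in for VerticalMessagePanel.messageBlankingArea.
void CopyDirtyRegion(ID3D11DeviceContext *pContext,
                     ID3D11Texture2D *pSharedTexture,
                     ID3D11Texture2D *pRenderBuffer)
{
    D3D11_BOX box = {};
    box.left = 0;   box.right  = 1080;    // placeholder blanking area
    box.top = 300;  box.bottom = 1600;
    box.front = 0;  box.back   = 1;       // 2D texture: a single depth slice

    // Copy the box from the buffer into the same position in the shared texture.
    pContext->CopySubresourceRegion(pSharedTexture, 0, /*DstX*/ box.left,
                                    /*DstY*/ box.top, /*DstZ*/ 0,
                                    pRenderBuffer, 0, &box);
}
```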

watfordjc commented 3 years ago

Release Critical Bugs

There are currently some bugs in my code that I consider significant enough that they are blocking any merging of changes into the master branch.

GPU Memory Leaks

There is currently a memory leak in watfordjc/csharp-message-to-image-library: when Stream Controller is closed, not all DirectX/Direct3D/Direct2D resources are released.

This is only a small issue given the resources use less than 10 mebibytes of GPU memory, my GPU has 8 gibibytes of RAM, and my GPU is in an eGPU enclosure that I can just unplug and replug to wipe the memory, but a memory leak is still a memory leak and shouldn't exist if I am programming correctly.
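
The usual cause of DirectX leaks is a COM interface that never gets a Release() call. A sketch of two standard diagnostics I could use, assuming the device is created with the debug layer enabled:

```cpp
#include <d3d11.h>
#include <d3d11sdklayers.h>

// Standard COM release helper: Release() the interface if it exists and
// null the pointer so it cannot be released twice.
template <class T>
void SafeRelease(T **ppT)
{
    if (*ppT) {
        (*ppT)->Release();
        *ppT = nullptr;
    }
}

// With a device created using the D3D11_CREATE_DEVICE_DEBUG flag, this
// dumps every still-live device object to the debug output, which should
// point at whatever isn't being released.
void ReportLiveObjects(ID3D11Device *pDevice)
{
    ID3D11Debug *pDebug = nullptr;
    if (SUCCEEDED(pDevice->QueryInterface(&pDebug))) {
        pDebug->ReportLiveDeviceObjects(D3D11_RLDO_DETAIL);
        SafeRelease(&pDebug);
    }
}
```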

I think I am going to have to take a break from this project/repository and delve deeper into C++, COM, and DirectX in order to fix the memory leak and whatever else is wrong with my code. The next milestone for this repository has been pushed back to an indeterminate date.

OBS Studio and obs-websocket Updates

The next major version of OBS Studio, 26.0, is nearing release. 26.0 RC 3 was released 4 days ago.

The next version of obs-websocket, version 4.9, is also being developed. obs-websocket 5.0 will contain breaking changes: some things are going to be renamed, and some things being deprecated in 4.9 will be removed.

As well as adding support for the new sources and events, I need to go through my code to make sure unknown source/event types don't cause crashes.

Text To Speech (TTS) Echo Bug

I still haven't solved a TTS bug that results in occasional duplicate audio with a noticeable (multi-second) delay. There is something wrong with the queue system that I haven't pinpointed yet.

Based on the volume difference during the last occurrence, I think the problem may be some mIRC timers that are still running. I have unloaded the remote and alias files from mIRC to see whether the previous fix worked and the bug no longer occurs.

Slideshow Next Slide Hotkey

Automated switching of slides sometimes stops working until both OBS Studio and Stream Controller are restarted. I am waiting to see if obs-websocket 4.9 and OBS Studio 26.0 add "media key" support for slideshows so that an obs-websocket request can replace the use of a hotkey.

At the time of writing, obs-websocket documentation only lists VLC sources as supporting the NextMedia request type. The obs-project pull request for adding media control support to slideshows has a comment from the obs-websocket contributor who added media control support, saying that obs-websocket will need "a small code change".

I also need to decide whether or not I am going to continue using slideshows or if I'm going to switch to a shared memory image source.

Portable Version

Issue #28 still needs looking at as settings are still not portable. This issue is currently the only one marked for the next Stream Controller milestone.

A cursory Googling suggests I will need to implement my own settings provider in order to make settings portable.

Shared Texture Handle

There is currently no way, other than by looking at debug output, to determine what a shared texture handle is. Copying and pasting it from debug output into the OBS source configuration is currently required.

This should be done using IPC, probably named shared memory. I'll probably name the object using a global name containing a GUID.
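
A sketch of the publishing side, based on the Creating Named Shared Memory documentation (the mapping name and its GUID are placeholders):

```cpp
#include <windows.h>
#include <cstdint>

// Sketch: publish the shared texture handle in named shared memory so the
// OBS plugin can find it instead of me pasting it from debug output.
// The mapping name (including the GUID) is a placeholder; "Global\" names
// can need extra privileges, so a session-local name might be enough.
static const wchar_t *kMapName = L"Global\\00000000-0000-0000-0000-000000000000";

bool PublishSharedHandle(uint64_t textureHandle)
{
    HANDLE hMap = CreateFileMappingW(INVALID_HANDLE_VALUE, nullptr,
                                     PAGE_READWRITE, 0,
                                     (DWORD)sizeof(textureHandle), kMapName);
    if (hMap == nullptr)
        return false;

    void *pView = MapViewOfFile(hMap, FILE_MAP_ALL_ACCESS, 0, 0,
                                sizeof(textureHandle));
    if (pView == nullptr)
        return false;

    *static_cast<uint64_t *>(pView) = textureHandle;
    UnmapViewOfFile(pView);
    // hMap stays open on purpose: the mapping only exists while at least
    // one handle to it remains open.
    return true;
}
```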

Configuration Options

Absolutely nothing related to this issue has UI configuration options. All Tweet formatting is currently hard-coded.


Progress

The headings above have been turned into a task list:

These tasks need completing before further work on this issue can progress.


GPU Memory Leaks

Fixing this issue is probably going to require a rewrite of my Direct2DWrapper project and MessageToImageLibrary.

Although things currently (mostly) work as intended, my use of DirectX and other APIs is likely to increase as I consider doing more things on the GPU.

I know what a memory leak is; the difficult part is finding one when you aren't familiar with how something allocates memory. A month ago I don't think I had ever written any code in C++, and due to that unfamiliarity my current C++ code is closer to C code: lots of pointers, zero classes.

As with most things in this repository, the only reason the code has been written is because I wanted to do something. There are usually several ways of doing something, and most of my bugs are likely due to me doing something the "wrong" way. Fortunately, the first thing I want to learn is something people have been doing since Windows 95 OSR2: writing a simple video game using DirectX in C++.

I am currently experiencing symptoms of carpal tunnel syndrome so progress is going to be even slower.

Learn How to Write C++ DirectX Code Correctly

There is only so far I can go with Microsoft documentation so, to start with, I am going to gradually work my way through a free C++ for Programmers course.

This course focuses on 'how' as opposed to 'what'. For example, in the lesson on functions, we do not teach what a function is, but rather how to create a function in C++.

C++ is object-oriented whereas C is procedural. As a result, I expect the course will draw more upon my Java knowledge than my C knowledge. std::string varName looks more like String varName than char* varName, for example. As well as working with objects and different libraries (e.g. iostream instead of stdio.h), other differences from C are probably going to be in the areas of pointers, type safety, passing by reference, and possibly callback functions.
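
A trivial example of the sort of difference I mean (a sketch, not project code):

```cpp
#include <iostream>
#include <string>

// Sketch: pass-by-reference in C++, where C would need an explicit pointer.
void AppendSuffix(std::string &name) // a C equivalent: (char *name, size_t size)
{
    name += "_suffix";
}

int main()
{
    std::string varName = "stream"; // closer to Java's String varName than char*
    AppendSuffix(varName);
    std::cout << varName << std::endl; // prints "stream_suffix"
    return 0;
}
```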

Once I have a grasp of how C++ differs from C, I can start looking at DirectX. As someone who tries to write ANSI C code in certain projects, it does look like "pure" C++ code isn't usually used for DirectX. Things I have so far tried to avoid are ATL (Active Template Library), C++/CX, WRL, and C++/WinRT, though avoiding them is a potentially unnecessary hurdle.

Using C++/WinRT (a header-file-based C++ projection that superseded C++/CX and WRL) without ATL might be possible, creating something resembling "pure" C++. As with the rest of this application, I am not currently considering supporting anything other than Windows 10, so there will be no need to support OpenGL as well as DirectX (unless I want to do something that is easier using OpenGL).


Operating System and Hardware Requirements

When it comes to DirectX, I am going to have to consider "minimum" and "recommended" requirements much like video game developers.

Having run dxdiag, I can state what those are going to be at the maximum end of the scale, based on my laptop's external GPU.

Recommended Hardware and Software