Monitor .srt files for new subtitles during live playback.

enternoescape commented 8 years ago

None of the free video decoders being used in SageTV appear to support closed captioning for H.264 content. The ability to append to the subtitles live would help make this less of an issue. I noticed that during live recordings, if a .srt file was created along side the recording, SageTV would not pick up any new subtitles written to the file. I tried skipping back and forward, but the new subtitles never appeared. They only appeared after I restarted playback.

Also related, but may be a separate issue: If the recording is split into multiple files I noticed that on playback at least in the placeshifter, the subtitles for the first file will be used for all of the files in the recording even if every file in the recording has it's own .srt file.

Narflex commented 8 years ago

The subtitle handles currently load everything when the player loads and then never look back to see if there's more information available. What's probably not that hard to do is in something like sage.media.sub.SRTSubtitleHandler.loadSubtitlesFromFiles(MediaFile) you detect if that MediaFile is currently recording or not. If it is, then don't close the file after you finish reading what's in it, and spin up a thread to regularly look at that file for more data and then add the additional subtitle entries (and then terminate the thread and close the file in the cleanup() method). However, this gets to be more problematic in the context of SageTVClient because then you're not dealing with a local file....but in that case, you'd then need to keep checking with the SageTV Server if there's more content in it and then downloading and parsing it as that happens (although I'd just write it to a temp file like it does already via getLoadableSubtitleFile and then just keep appending that so all the other logic is universal).

The multiple segment recordings is a separate issue. It's currently designed so that the subtitle information is part of the media format; and inside of that it contains the path to the subtitle file. Formats apply to the recordings as a whole and are not for each individual segment; so that's where the disconnect lies. In MediaFile.checkForSubtitles() that's where you can see it uses the first file as the base filename to check for the external subtitle files. A less than ideal workaround to this problem is to pass a segment number into that function (and change the various things that call into it) and that would work fine; except for the fact that it would then update the file's format to be that way...the only real negative impact I can think of to that is then if multiple clients try to play this file at nearly the exact same time, they would then be possibly reading the other subtitle location (but this is probably better than what currently happens in this case). But this is all a moot point if the first thing above is done. :)

CraziFuzzy commented 8 years ago

Is there a real reason to keep subtitle info IN the media format object? Seems to make more sense to load it during playback, in essentially the same way the comskip playback plugin reads in .EDL files (including the way it treats each segment separately).

Narflex commented 8 years ago

It does actually redetect them before playback every time. There's also a special metadata attribute called VARIED_FORMAT which also causes the MiniPlayer to redetect the format for each individual segment in a multi file recording as well (to fix an issue I had with PIDs changing mid program on FIOS). These can likely be used to make this easier.

Jeff Kardatzke Sent from my Android On Jun 23, 2016 7:44 AM, "Christopher Piper" notifications@github.com wrote:

Is there a real reason to keep subtitle info IN the media format object? Seems to make more sense to load it during playback, in essentially the same way the comskip playback plugin reads in .EDL files (including the way it treats each segment separately).

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/google/sagetv/issues/87#issuecomment-228072621, or mute the thread https://github.com/notifications/unsubscribe/ANEIDED8NjtYej4at3iwlV85OED4CSXlks5qOpu5gaJpZM4HwFxZ .

enternoescape commented 8 years ago

The re-detection is was I observed. I was just starting to take a stab at this tonight. I feel like I'm missing something obvious, but what's the best way to determine the MediaFile is currently a live recording?

Narflex commented 8 years ago

I believe there is a isRecording() call in it. :-)

Jeff Kardatzke Sent from my Android On Jun 24, 2016 7:15 PM, "Joseph Shuttlesworth" notifications@github.com wrote:

The re-detection is was I observed. I was just starting to take a stab at this tonight. I feel like I'm missing something obvious, but what's the best way to determine the MediaFile is currently a live recording?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/google/sagetv/issues/87#issuecomment-228505869, or mute the thread https://github.com/notifications/unsubscribe/ANEIDG84lKkqBEyzOuHY2WJfN89oqVsLks5qPJ1CgaJpZM4HwFxZ .

enternoescape commented 8 years ago

Yeah, that was too obvious. :) I interpreted that as just saying it's a recording, not that it was actively recording.

Yeah, that looks like it will be true for anything that matches MEDIA_MASK_TV.

enternoescape commented 8 years ago

I don't think that will have the desired effect. I don't want to poll a file for changes if the corresponding recording it's not actively being written. I think isAnyLiveStream() is closer to what I'm looking for.

CraziFuzzy commented 8 years ago

IsFileCurrentlyRecording(sage.MediaFile MediaFile) is probably more appropriate, but I think it's far better to check the actual .SRT file(s) on an interval instead (like the way .EDL files are monitored), whether the file is actively being recorded or not.

enternoescape commented 8 years ago

I guess the SRT file could be created after the recording has completed while you're watching it. At least in the case of CCExtractor, it's so fast compared to Comskip this would be a rare occurrence. SRT files are timing sensitive and that could result in somewhat aggressive polling of the file. This can create a surprising amount of stress on the file system as I learned with OpenDCT. My thoughts right now are instead of polling the file at intervals, the file is only polled when we have reached the last known subtitle to see if we can load any more and if we can't then it resorts to timed polling.

It's important to know that if you start playback and the SRT file doesn't already exist, this monitoring will not even happen. That's partly why I don't think it's necessary to do this the same way Comskip does it.

enternoescape commented 8 years ago

I looked up IsFileCurrentlyRecording because that sounded familiar. It is a part of the API which I don't really need to abide by when writing core features. It basically just find the MediaFile in question and checks that the file is in the database and isRecording() which is basically what was suggested earlier. Knowing that, and knowing that just matching the mask MEDIA_MASK_TV is enough to make that return true, now I find a little confusing that that does in fact return the correct value when a show is not recording. I would have assumed that the mask doesn't change just because the recording stopped, but I guess that just means I don't understand the meaning of the mask.

stuckless commented 8 years ago

@enternoescape - not 100% sure what you are after... but... isFileCurrentlyRecording will only return true if the file is being recorded, now. It won't return true, tomorrow, for example.

But if you are looking to figure out "when" a new recording is happening... you can use the event API system, and RecordingSegmentAdded, for example. https://github.com/google/sagetv/blob/364cd4b9998dc49ea974dc94562c1be844603d6d/java/sage/SageTVEventListener.java#L29

There are a number of events in there that cover things like recording started, completed, etc, or even when a mediafile is being watched, paused, stopped, etc.

CraziFuzzy commented 8 years ago

I think he was not looking to see if the file was a recording, but whether it was actively recording, because he's working off the assumption that .SRT files are only going to change or appear while the file is still recording.

enternoescape commented 8 years ago

@stuckless Thanks for the pointers.

I'll think I'll just write it so that if the subtitle file exists when you start and the file is any kind of recording, active or not, it will monitor for changes. The reason I was so focused on live only is because the experience is kind of bad if you actually need the subtitles to be able to watch the show, that means the SRT file needs to be written at the same time as the recording.

While this should be good enough for many people, the real solution will be integrating a closed captions extractor that will make the captions accessible just ahead of the actual video playback.

CraziFuzzy commented 8 years ago

I mean, an ultimate solution would be to go with something more robust that .SRT anyway. SRT doesn't even include all the info that CC contains. CC has positioning information that is lost in the SRT format. Ultimately, we need STV logic that will read in a much more robust format (like .ASS) and render that out to the screen as STV widgets. That's a whole lot bigger project than just changing out already supported formats are read in though.

enternoescape commented 8 years ago

That's close to what I'm talking about, but if the core is doing all of the decoding, we don't really even need that information to be stored in a file. It could be decoded from the A/V container as it's needed. We'd just create a new SubtitleHandler.

enternoescape commented 8 years ago

@Narflex I often see the use of the Vector object in the source code. I can understand why some of them are probably there, but at least in the case of SubtitleHandler, we are already using synchronize and then we use a Vector which is also internally using synchronize. I think in this case, an ArrayList is a better choice. I would usually just leave something like this alone, but since the client will possibly need to keep downloading the file and parsing it, depending on the number of subtitles, it could actually become a performance issue.

Also, if it's all the same to you, when it's relevant to what I'm working on I would like to actually provide the values for the type variables in Map objects. It just looks cleaner in the IDE and will reduce human errors.

enternoescape commented 8 years ago

I think I found a much nicer solution to the Windows Client not having direct access to the file. I wrote a new class called MediaServerInputStream that basically enables you to open a remote file as an InputStream using the MediaServer with a little local buffering to keep small data requests from being a problem. The advantage is now I don't need to copy the file locally, then read it. I'm sure this might help improve overall efficiency in other similar situations too.

CraziFuzzy commented 8 years ago

How big are the .SRT files in general? Are they too big to be brought in with GetFileAsString (which is server based)? (Not that I'm against an InputStream method being implemented, but I like to keep UI Related things based through the API, and done IN the UI).

CraziFuzzy commented 8 years ago

Though, thinking about it, the absolute BEST thing might be to roll up a Subtitles class, and have that read in .SRT, .SSA, .ASS, etc on the server, and serve up an actual object and methods for the list of caption events (keyed by timestamp). Caching and new file checking could be done on the server, so on first request for the events, if it's not parsed in and loaded, the server does that, and checks file timestamp on each access to see if new information is there to be brought in. This way, clients are dealing with files at all, just function calls.

enternoescape commented 8 years ago

GetFileAsString still pulls the entire file over the network. Unlike Comskip files, these files could add measurable overhead if we are constantly transferring the entire file on a live stream. Unlike Comskip file, SRT files don't change anything that's already been written. Also the entire file needs to be parsed every time if you do it that way or at the very least you need to get back to where you left off; either way it's very inefficient in a way I'm sure someone will notice during playback. When it is read as an InputStream, I can read the file until it says there's nothing else to read, then come back later to see if there's more to read. Also this can be reused for many things that would go a little faster if we parsed the data as it comes in instead of doing a complete download, then read.

Subtitles to me are one of those special cases since they are useless if they aren't on time. I'm sure that Jeff can fill in the blanks for you there on why it's done the way it currently is done. I'm not against doing a complete re-write if it will take weeks, but I'm not really interesting a month+ project that has no visible benefit. If there's a reason other than convenience as to why the subtitles are processed by the Client instead of the server, I'm probably going to leave that alone. I suspect it has something to do with timing and the local client is definitely going to be faster than the server trying to get things out on time.

It's actually very atypical for a subtitle file to be growing as you're using it. I don't know of any programs that currently support this. It's probably not worth going much further than making this work well with what is available. I feel like closed captioning being extracted by the server might merit completely renovating this system, but given my limited time and the other things I want to do that will have a much more noteworthy impact on the SageTV experience, I'm likely going to get this to the point that it works very well and leave the kind of changes you're suggesting up for a later date.

enternoescape commented 8 years ago

@Narflex From what I can tell, if we go past the last subtitle, the value returned basically tells VideoFrame to come back in a week. I was trying to follow the logic there, but I can't with complete certainty tell what it does with this value. Will it actually not check again for that amount of time, if so, I tell it the next subtitle is in 2 seconds when a new subtitle has yet to be available or would that cause a problem?

enternoescape commented 8 years ago

It turns out 1 second polling is better than on demand. I underestimated how much time it can take to parse new entries. The good news is I'm right at live and all of the subtitles are loading just in time every time, so that's great.

enternoescape commented 8 years ago

The new MediaServerInputStream class I created is working terrifically too. I was a little concerned that it would have all kinds of little quirks, but it's working as well as direct access. I modified IOUtils.openReaderDetectCharset() to accept the boolean parameter, local, that tells it if it should return a BufferedReader backed by a FileInputStream or a MediaServerInputStream. This allows things to be effectively seamless between a local and remote file.

Some of the other performance issues I found was that the original code was creating a new StringBuffer for every single subtitle. I replaced that with a local StringBuilder object that we clear for each new subtitle entry. I also added some handling for when we can't be completely sure we got the entire subtitle, so it will go back and try one more time on the next poll.

I think figured out the VideoFrame coming back in a week thing. From observing the logs, it looks like it actually will not come back for a week. When I'm doing this live, I now have it intercepting that value when it's a week and changing it to 1 second. Situations that could lead to running out of subtitles usual are commercial breaks since not all commercials have captions.

@Narflex Now I'm getting down to the harder bits. I can see what what you were talking about just changing the format every time the segment changes, but that really does feel a little hacky. If I understand, the race here is if two miniclients load different segments at the same time?

enternoescape commented 8 years ago

@stuckless I did find something interesting when using the Android miniclient with live subtitles. Occasionally they clobber each other and often they aren't available in time. Reloading the video doesn't fix the clobbering. I can't recall if I've seen this before, but as you might guess it only happens when there's a lot being said. The Windows Client, Placeshifter, HD200 and HD300 don't have this issue. The Android miniclient appears to run a little closer to the edge of the stream, so that might be part of the problem.

I know there has been talk about having the server process the closed captions, but if the Android miniclient is running this far ahead, I think we'll need to find a way to hold it back a second or two. OpenDCT actually extracts the closed captions just before the video stream hits the disk, so it's falling behind even with this advantage. I'm working on further optimizing the loading performance, but I think I've reached a point where things are still improving, but the gains are getting smaller.

stuckless commented 8 years ago

@enternoescape I'm not sure I fully understand the issue, but I'm open to suggestions on how to fix it :) Does this only happen for live tv streaming, or does it happen anytime there is a lot of subtitles, even for pre-recorded content? I know the HD300 processes it's own subtitles and sends them to sagetv for rendering... in the case of live tv, are you sure it's using the srt file vs processing its own subtitles?

enternoescape commented 8 years ago

@Narflex I feel a little silly now. I was looking at using the InputStream on all of the subtitle loading methods and I noticed that VobSubSubtitleHandler uses a RandomAccessFile which led me to notice the NetworkRandomFile class that does a lot of what I just did to create the InputStream.

I noticed that NetworkRandomFile doesn't allow writing. Is the reason just that it's unsafe or security purposes? From what I can tell there's nothing preventing you from opening a file and using WRITE instead of READ. In my InputStream, I added a method to create an OutputStream that determines the correct remote offset taking into account the offset in the local cache before writing and then effectively clears the local cache when you perform any writing to ensure that a future read will result in the data that's actually on the disk, not what's in the local cache. I also added a truncate command to the media server so we have a full complement of what FastRandomFile is allowed to do.

I noticed NetworkRandomFile as I was writing an interface to abstract the methods that FastRandomFile actually uses in RandomAccessFile, so that I could transparently swap out RandomAccessFile for a class that will open files using the media server. Since I pretty much finished creating my own implementations before I noticed your work, this is what happens when you go on vacation... :) I'm going to do some benchmarks just to see if there's any advantage between what I wrote vs what you wrote.

enternoescape commented 8 years ago

@stuckless In all cases, I'm talking about SRT files. After more thorough analysis, the disruption does appear to be caused the the thread locking needed to add new subtitles. I'm working on making the locking as brief as possible, but it does appear to sometimes cause timing issues in the miniclient. I'm guessing the other clients cache the subtitles a little. The same subtitles don't always clobber each other. I also saw the timebar flicker at the same time as one of the clobber "events," so I guess the locking is gumming up the works so to speak. It didn't matter if it was live or not.

I don't know if you have already seen flickering before involving the timebar or other elements, but I think whatever is causing that is probably the biggest culprit. Either way I guess this wasn't really the right place to bring this up. I'll dig a little deeper and probably end up creating an issue on the Android miniclient project.

Actually I just tried this with the latest jar on Bintray just for reference and the flickering and clobbering still happens there, so I guess I'm just noticing an issue that already existed. Turning off subtitles stops the screen from timebar from sometime flickering in time with the subtitles. I think this can probably be lumped in with the flickering and animation issues some forum members have reported. It's hard not to appreciate how good the miniclient is; especially given how quickly it became usable to the general public.

enternoescape commented 8 years ago

I have things optimized to the point that I can load 89 subtitles in under 37ms on a average computer. On a slower computer, I was able to parse 371 in 973ms. That's fast enough to update the subtitles on demand like I wanted to do before. I was observing the average caption is on the screen between 1.5 and 3 seconds. So I'll set the polling to happen every 5 seconds or if we have run out of subtitles. That appears be fast enough to have no measurable impact on playback.

Narflex commented 8 years ago

@enternoescape Regarding Vector, feel free to change those to ArrayList in areas where they don't need synchronization. This was from habitual use of Vector in my earlier days. :) Also feel free to use typed collections; a lot of this code was written before those even existed, which is why it wasn't in there.

For reading files from the server, there's already NetworkChannelFile and NetworkRandomFile (the former supports use of NIO Buffer objects for reading). So you may just want to use those (they extends the FasterRandomFile class which implements DataInput and DataOutput; so they should be usable for just about any case....which after reading more comments, I now see you have found. :) The only reason I never added write support was because I never had a need for it before from those classes.

For changing the format in the MediaFile objects each time you load a different segment; your assumption is correct, you don't want to do that because it'll break other clients that might be playing a different segment at the same time.

Let me know if I've missed any outstanding questions.

enternoescape commented 8 years ago

I had a feeling these were just carried over from the older days of Java.

I don't know how much the JVM will optimize things for this situation, but the InputStream implementation I wrote is a lot lighter than DataInput partly because it doesn't implement all of those other methods that will never be used when reading lines of characters. I'd need to write a wrapper class over top of NetworkChannelFile or NetworkRandomFile, to make getting a BufferedReader from IOUtils.openReaderDetectCharset() a seamless experience.

Is there a technical reason why the subtitle files for the client in the current code are always copied locally, then read? Is there a performance penalty? For example, I see for VobSub, that you could have used NetworkRandomFile to read the sub files, but instead you copy the entire file locally and parse it there.

Narflex commented 8 years ago

I didnt think there would be a case for live subtitle files at the time I wrote it, so it just seemed easier to go the temp file route via a copy. It's likely more performant to do it the other way, unless you are going to be skipping backwards many times and reloading the same data. But either way, the difference would be negligible compared to anything else going on in the system.

Jeff Kardatzke Sent from my Android On Jul 2, 2016 5:34 PM, "Joseph Shuttlesworth" notifications@github.com wrote:

I had a feeling these were just carried over from the older days of Java.

I don't know how much the JVM will optimize things for this situation, but the InputStream implementation I wrote is a lot lighter than DataInput partly because it doesn't implement all of those other methods that will never be used when reading lines of characters. I'd need to write a wrapper class over top of NetworkChannelFile or NetworkRandomFile, to make getting a BufferedReader from IOUtils.openReaderDetectCharset() a seamless experience.

Is there a technical reason why the subtitle files for the client in the current code are always copied locally, then read? Is there a performance penalty? For example, I see for VobSub, that you could have used NetworkRandomFile to read the sub files, but instead you copy the entire file locally and parse it there.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/google/sagetv/issues/87#issuecomment-230128781, or mute the thread https://github.com/notifications/unsubscribe/ANEIDBmvFt-M8HszIyPfym67xhVrN0-4ks5qRwN2gaJpZM4HwFxZ .

enternoescape commented 8 years ago

I was just make sure I wasn't repeating history. You'll know better than me if something was tried and the results were worse than expected. I agree that it should be faster to parse the stream as it comes in instead of copying it over, then parsing. I also agree that in the grand scheme, it's probably not the biggest overall impact performance-wise.

I would like to add the NetworkInputStream class I wrote into the "Network" classes since it's very purpose-built as a lighter version off the bigger classes that extend FasterRandomFile. The class I wrote does buffering similar to FasterRandomFile to keep really small reads from making things slow. The problem is that InputStream is abstract, so there's no clean way to pull these all together without being a little clunky, so I'd rather make it it's own class.

enternoescape commented 8 years ago

I was looking more at what I can get out of NetworkRandomFile which should suit what InputStream needs best. The DataInput interface is implemented in FasterRandomFile (not NetworkRandomFile), so the methods I actually need for an InputStream are not exposed at the right level and we don't actually appear to have any buffering benefits from FasterRandomFile for the same reasons. For example, I can get the method read(byte b[], int off, int len), but it won't be reading the network file, it will be reading the RandomAccessFile that doesn't get opened because the implementation of the method is in FasterRandomFile.

I could write the missing methods into NetworkRandomFile, but I feel like this is the wrong way to get where we're going and will only lead to further duplication of efforts between local and remote files down the road. I know this is a rather core part of how SageTV does client communications, but I think this merits a cleanup. What I want to do is try to standardize these things a little better so we are not duplicating so much at the higher levels and will nearly eliminate any mistakes you can easily make when you extend a class in this manner.

The only difference when opening a file should be if a file is accessed locally or via the media server. The RandomAccessFile in FastRandomFile should be changed to an interface that can be swapped out between the local and remote file communications protocols. I have deduced what's actually used in RandomAccessFile and it's actually a very small number of methods. It's not really a huge ordeal to move these into an interface; I tried it and it works well. I know there are special transcoding methods in NetworkRandomFile, but we don't need to eliminate this class in my proposal (though it's less necessary); we would just eliminate the methods involved in data transfers and have a way to communicate special "directions" like transcoding to the media server through the RandomAccessFile replacement.

I realize this looks like I'm fixing something that's not broken, but I really think this is one of those changes that will work out for the best in the long run.

Narflex commented 8 years ago

Go for it. :-)

Jeff Kardatzke Sent from my Android On Jul 3, 2016 6:26 PM, "Joseph Shuttlesworth" notifications@github.com wrote:

I was looking more at what I can get out of NetworkRandomFile which should suit what InputStream needs best. The DataInput interface is implemented in FasterRandomFile (not NetworkRandomFile), so the methods I actually need for an InputStream are not exposed at the right level and we don't actually appear to have any buffering benefits from FasterRandomFile for the same reasons. For example, I can get the method read(byte b[], int off, int len), but it won't be reading the network file, it will be reading the RandomAccessFile that doesn't get opened because the implementation of the method is in FasterRandomFile.

I could write the missing methods into NetworkRandomFile, but I feel like this is the wrong way to get where we're going and will only lead to further duplication of efforts between local and remote files down the road. I know this is a rather core part of how SageTV does client communications, but I think this merits a cleanup. What I want to do is try to standardize these things a little better so we are not duplicating so much at the higher levels and will nearly eliminate any mistakes you can easily make when you extend a class in this manner.

The only difference when opening a file should be if a file is accessed locally or via the media server. The RandomAccessFile in FastRandomFile should be changed to an interface that can be swapped out between the local and remote file communications protocols. I have deduced what's actually used in RandomAccessFile and it's actually a very small number of methods. It's not really a huge ordeal to move these into an interface; I tried it and it works well. I know there are special transcoding methods in NetworkRandomFile, but we don't need to eliminate this class in my proposal (though it's less necessary); we would just eliminate the methods involved in data transfers and have a way to communicate special "directions" like transcoding to the media server through the RandomAccessFile replacement.

I realize this looks like I'm fixing something that's not broken, but I really think this is one of those changes that will work out for the best in the long run.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/google/sagetv/issues/87#issuecomment-230188160, or mute the thread https://github.com/notifications/unsubscribe/ANEIDFAxtidI5F8KE6s8_ErqFzrqfxeHks5qSGFfgaJpZM4HwFxZ .

enternoescape commented 8 years ago

I guess I didn't realize how unrelated to this issue my latest suggestion would be. :)

The more I look at this the more I understand how complex the task actually is to unify the way local and remote files are read in SageTV. Part of the problem is Java. They added ways to use off heap memory and it causes a lot of people to duplicate their efforts because in cases of high throughput, you don't always know which one is going to be faster; especially if you need to do many small reads from a byte buffer (and even more so if it's a direct byte buffer). I saw that you wrote a read and write buffer for NIO which feels weird, but I think I understand why that was done.

So far, I wrote classes to handle opening local and remote files using IO and a smaller separate set designed similarly for NIO that implement FileChannel and the interface SageFileChannel. I also wrote "wrapper" classes to add whatever is needed above the actual "file." For example, you can add a buffering layer to the local and remote files similar to how InputStream can be wrapped into a BufferedInputStream. In addition I moved the encryption into it's own class that can be used in a similar manner. Every layer is optimized to do what it does well and it makes it trivial to add new functionality if needed. I also wrote classes that you can wrap these inside of to get more standard Java abstract classes/interfaces like InputStream, OutputStream, DataInput, DataOutput, FileChannel. I was careful with DataInput and DataOutput to keep the character/string related methods exactly how you had them in FastRandomFile. The idea is that if we don't always need anything special, it's wasteful to be loading an all encompassing monolithic class.

I have replaced all uses of FastRandomFile, FasterRandomFile, BlurayRandomFile, BlurayNetworkFile, NetworkRandomFile and NetworkChannelFile with the minimum that each one actually requires for a given situation and I really think it has driven down loading times a bit. I'm still testing, but this kind of thing is really just a matter of verifying that the math is correct and the outcome is what you expected. So far everything is working great.

The Wizard is working correctly as it reads the wiz.bin and re-writes it. Because the encrypted filter class is its own layer, when the file is determined to not be encrypted, it can cleanly remove the encryption layer without any need to reopen the file (I know had this been a great cause for concern, it could have been done with FastRandomFile too, but not as easily). Separating encryption (which from what I determined isn't even used in open source) also means we have one less branch in every read and write method. I put some temporary logging in so I could see the random write cache misses and by how much they missed it. I optimized the write caching for the Wizard so that it will dynamically do partial writes up to half of the write buffer size when it determines that the next random write would end up missing the cache. The performance is very good.

stuckless commented 8 years ago

Sounds like this is a good candidate to unit test :) Unit testing SageTV code can be a PITA because so many things end up calling into Sage.java and that in turns will attempt to load the native parts, etc. But these classes seem like something that might be able to be unit tested fairly easily.

On Tue, 12 Jul 2016 at 22:57 Joseph Shuttlesworth notifications@github.com wrote:

I guess I didn't realize how unrelated to this issue my latest suggestion would be. :)

The more I look at this the more I understand how complex the task actually is to unify the way local and remote files are read in SageTV. Part of the problem is Java. They added ways to use off heap memory and it causes a lot of people to duplicate their efforts because in cases of high throughput, you don't always know which one is going to be faster; especially if you need to do many small reads from a byte buffer (and even more so if it's a direct byte buffer). I saw that you wrote a read and write buffer for NIO which feels weird, but I think I understand why that was done.

So far, I wrote classes to handle opening local and remote files using IO and a smaller separate set designed similarly for NIO that implement FileChannel and the interface SageFileChannel. I also wrote "wrapper" classes to add whatever is needed above the actual "file." For example, you can add a buffering layer to the local and remote files similar to how InputStream can be wrapped into a BufferedInputStream. In addition I moved the encryption into it's own class that can be used in a similar manner. Every layer is optimized to do what it does well and it makes it trivial to add new functionality if needed. I also wrote classes that you can wrap these inside of to get more standard Java abstract classes/interfaces like InputStream, OutputStream, DataInput, DataOutput, FileChannel. I was careful with DataInput and DataOutput to keep the character/string related methods exactly how you had them in FastRandomFile. The idea is that if we don't always need anything special, it's wasteful to be loading an all encompassing monolithic class.

I have replaced all uses of FastRandomFile, FasterRandomFile, BlurayRandomFile, BlurayNetworkFile, NetworkRandomFile and NetworkChannelFile with the minimum that each one actually requires for a given situation and I really think it has driven down loading times a bit. I'm still testing, but this kind of thing is really just a matter of verifying that the math is correct and the outcome is what you expected. So far everything is working great.

The Wizard is working correctly as it reads the wiz.bin and re-writes it. Because the encrypted filter class is its own layer, when the file is determined to not be encrypted, it can cleanly remove the encryption layer without any need to reopen the file (I know had this been a great cause for concern, it could have been done with FastRandomFile too, but not as easily). Separating encryption (which from what I determined isn't even used in open source) also means we have one less branch in every read and write method. I put some temporary logging in so I could see the random write cache misses and by how much they missed it. I optimized the write caching for the Wizard so that it will dynamically do partial writes up to half of the write buffer size when it determines that the next random write would end up missing the cache. The performance is very good.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/google/sagetv/issues/87#issuecomment-232240891, or mute the thread https://github.com/notifications/unsubscribe/ABAfKSur4IpWMcI30sUYjZ9TvHR-scmrks5qVFQEgaJpZM4HwFxZ .

Sent from Android

enternoescape commented 8 years ago

I completely agree. I was actually thinking along those lines. I can't test the media server implementation with a unit test since that will cause all kinds of native code to load. Due to the design, I can however create a custom source to test that things are working correctly at all of the levels above. The encryption layer also needs native because the keys it loads are populated by native code, but I could change it so that's not a requirement.

enternoescape commented 8 years ago

I will mention that these classes are very inclusive, so it might take twice as long to write the tests as it took me to write the classes. I actually found a bug in FastRandomFile when writing a single byte that apparently doesn't happen often or it would have been fixed. I made the same mistake when I was writing my own write buffer, and that's why I know the logic is in the wrong order. The error would be an out of bounds exception.

stuckless commented 8 years ago

The test classes is just a suggestion on my part... I tend to like them, especially when I refactor code. (I spend last weekend completely rewriting the Phoenix filename scrapers and because of the test cases... I was able to quickly find and fix my issues).

I think if you can quickly write some test cases around it... then do it... but if it becomes something that is going to take much more time to do... then your call. I've been meaning to restructure the Sage core code to be less dependent on the native parts (which would help in unit testing many other areas)... but I didn't get far with that side project :(

On Wed, 13 Jul 2016 at 09:28 Joseph Shuttlesworth notifications@github.com wrote:

I will mention that these classes are very inclusive, so it might take twice as long to write the tests as it took me to write the classes. I actually found a bug in FastRandomFile when writing a single byte that apparently doesn't happen often or it would have been fixed. I made the same mistake when I was writing my own write buffer, and that's why I know the logic is in the wrong order. The error would be an out of bounds exception.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/google/sagetv/issues/87#issuecomment-232354492, or mute the thread https://github.com/notifications/unsubscribe/ABAfKYj8H-FPjpHGDjaYP92llRRZ6omBks5qVOfmgaJpZM4HwFxZ .

Sent from Android

enternoescape commented 8 years ago

I'm going use TestNG since it's more flexible than JUnit and it's what I'm using for the more important parts of OpenDCT. I'll add the appropriate dependencies to Gradle and make sure the tests are a part of building Sage.jar. Thoughts? Concerns?

I agree that SageTV should be more sectioned up so that there aren't so many strange dependencies in some of the classes and we can avoid loading native when it's not really needed in all cases.

stuckless commented 8 years ago

TestNG is fine by me... I've never used it... but I'm not doing anything in SageTV core at the moment, so use whatever you want... Eventually I'll probably pull in mockito as well, since I use that fair bit to unit test stuff that is harder to decouple or abstract. My only suggestion is to put the test code in a separate test src folder (normally mimicing the package space of the original source file(s))... and yeah, having gradle run the tests during a build is good thing.

On Wed, 13 Jul 2016 at 09:40 Joseph Shuttlesworth notifications@github.com wrote:

I'm going use TestNG since it's more flexible than JUnit and it's what I'm using for the more important parts of OpenDCT. I'll add the appropriate dependencies to Gradle and make sure the tests are a part of building Sage.jar. Thoughts? Concerns?

I agree that SageTV should be more sectioned up so that there aren't so many strange dependencies in some of the classes and we can avoid loading native when it's not really needed in all cases.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/google/sagetv/issues/87#issuecomment-232357977, or mute the thread https://github.com/notifications/unsubscribe/ABAfKXIAJo5C4imfD7HY87mNErO6bPlBks5qVOrXgaJpZM4HwFxZ .

Sent from Android

Narflex commented 8 years ago

@enternoescape One thing I wanted to correct you on is that the crypto key for the Wiz.bin file is not loaded from native code in the open source version. It's in the Sage.java class on line 968.

And the main reason there's all that NIO code in there is for performance (mainly for when running in standalone mode on the HD300; but it benefits PC class execution as well). If you use a FileChannel for reading and then a SocketChannel for your network connection; you can send between the two directly and the bytes may never even touch system memory and just get DMA'd across. Also, if you use direct byte buffers with that same type of setup; then you can avoid ever having to copy stuff up into Java heap memory and keep it all at the native layer (which is faster for when the native code is reading/writing into those buffers since it doesn't need to pin the java objects in memory and can just access the native arrays directly).

And everything else sounds good...and I'm very curious about the bug you found in FastRandomFile...I'm surprised a bug survived in that class since it's been there since the beginning.

enternoescape commented 8 years ago

In OpenDCT I use direct NIO buffers for a lot of the native code which as you said keeps it from ever coming into the JVM. That alone cut down significantly on garbage collection and it's faster. I was mostly just commenting on the NIO buffering which I believe what they expect you to do use a ByteBuffer. What you have is closer to the way RandomAccessFile works which is certainly less awkward for processing large amounts of data. I wouldn't mind revamping the MediaServerRemuxer to accept and output to a direct ByteBuffer, but it might be questionable over if it's worth it.

Never mind on that bug I spoke of, it was a bug for me because I did something different. The issue was if you did a write from an array that went exactly to the end of the write buffer byte array. Then if you do a single byte write, you would get an index out of bounds exception. After looking a second time, I don't think that would be a problem for your class. False alarm! :)

enternoescape commented 8 years ago

Also the crypt key location was odd. I assumed it was native because when I looked for how it got there, I couldn't find it. :) Thanks for the correction.

enternoescape commented 8 years ago

I put FasterRandomFile and FastRandomFile through the random read test I wrote and found what I think is a bug in FasterRandomFile that you might already be aware of. ensureBuffer() calls read() instead of readFully() which means that it can return as little as 0 bytes even if we are not at the end of the file. Guess what, it consistently threw an EOF exception even though it was positively not at the end of the file. I made a small change to use readFully() and the EOF exception went away. I know that could be optimized better, but the classes I'm writing will probably be replacing the need for them to still exist.

Narflex commented 8 years ago

That's very interesting. Historically, when reading from file sources, even though read says that it may not return all the requested bytes, it always does. readFully is only ever really needed when reading from network sources. However, for correctness based on the API, readFully should be used of course since it says that read may not return the requested number of bytes.

I'm very curious how exactly you were running a test that enabled this to occur, since in practice I've never seen a read call on a file that has the data available to read return less than the number of requested bytes.

I looked at the API again, and it's interesting. In RAF.read(byte[],int,int) it says this:

"Although RandomAccessFile is not a subclass of InputStream, this method behaves in exactly the same way as the InputStream.read(byte[], int, int) method of InputStream."

And then if you look at the method in InputStream, it says this:

" The default implementation of this method blocks until the requested amount of input data len has been read, end of file is detected, or an exception is thrown. Subclasses are encouraged to provide a more efficient implementation of this method."

Which is saying it will behave in the same way as readFully. At the Java layer though, RAF.readFully clearly will keep calling read() until it reads everything and does not assume that the read method gets it all (likely because an overriding class could change that behavior).

So now I'm very curious...I'd really like to see the test case you created that caused read() to behave differently than readFully() in RandomAccessFile. :)

On Wed, Jul 13, 2016 at 8:17 PM, Joseph Shuttlesworth < notifications@github.com> wrote:

I put FasterRandomFile and FastRandomFile through the random read test I wrote and found what I think is a bug in FasterRandomFile that you might already be aware of. ensureBuffer() calls read() instead of readFully() which means that it can return as little as 0 bytes even if we are not at the end of the file. Guess what, it consistently threw an EOF exception even though it was positively not at the end of the file. I made a small change to use readFully() and the EOF exception went away. I know that could be optimized better, but the classes I'm writing will probably be replacing the need for them to still exist.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/google/sagetv/issues/87#issuecomment-232546529, or mute the thread https://github.com/notifications/unsubscribe/ANEIDMAMYQc0GOiqX_90N9V3NHaqy8r0ks5qVaphgaJpZM4HwFxZ .

Jeffrey Kardatzke jkardatzke@google.com Google, Inc.

enternoescape commented 8 years ago

I spotted the real problem. You're throwing an EOFException and it's not actually in the contract for the method (read()) calling it. I usually take at least 4 hours before I post that I don't get something or that I found something interesting, but I feel like I'm doing a great job of reporting nothing. :)

enternoescape commented 8 years ago

The test does sequential reads while optionally skipping back or forward at random until it reaches the end of the file. The very last read is actually read() which should return -1, but instead threw an EOF exception. Due to the way things were written, I ended up determining incorrectly that the value returned was not -1 when in fact it should have been, but threw an exception instead. I know it sounds odd that you could confuse these two things, but the symptom was an index out of bounds exception which should only be happening if somehow the last read didn't return that it was the end of the file. EOF was not always thrown which led to the confusion.

enternoescape commented 8 years ago

I deliberately leave incorrect behavior unresolved in the test cases because if you account for them, the underlying issue might not make the test fail. Also while I've noticed that unit tests are not the most accurate benchmark, I was able to prove that the new classes load faster and read/write as fast or in some cases faster than the old ones. I was a little concerned that overall performance might not be the same.

enternoescape commented 8 years ago

@Narflex I was seeing messages starting with "Waiting to send message reply to client until it reaches quanta" for long periods of time when testing the client. I see that message comes from SageTVConnection.java. Is that related in some way to the file system? Any ideas on what I should be focusing on to address it or does it basically imply my server is slow?

google / sagetv

Monitor .srt files for new subtitles during live playback. #87