samuelmaddock / gm-mediaplayer

:tv: Garry's Mod media player which features synchronized streaming media services
MIT License

fix(youtube): replace API with web scraping #34

Closed veitikka closed 4 years ago

veitikka commented 4 years ago

Hello, I updated the GetMetadata function to read the metadata from the fetched HTML page instead of the API. The new function, "ParseYTMetaDataFromHTML", takes care of this. Video requests can be a bit slow, since each one fetches a web page of almost 1 MB; maybe there is a better way to do this. I also have no idea how stable this fix will be: if YouTube changes the names of these variables in the future, it will no longer work.

---
-- Function to parse video metadata straight from the html instead of using the API
--
function SERVICE:ParseYTMetaDataFromHTML( html, videoId )
    --Lua search patterns to find Title and Duration from the html mess
    local titlepat = "\\\"title\\\":\\\".-\\\""
    local durationpat = "\\\"lengthSeconds\\\":\\\".-\\\""

    --MetaData table to return when we're done
    local metadata = {}

    --Find the first Title & Duration match in the html, strip the key and the
    --escaped quotes around the value, then insert the result into the table
    local parseTitle = string.match(html, titlepat)
    if parseTitle then
        metadata.title = string.sub(parseTitle, 13, -3)
    end

    local parseDuration = string.match(html, durationpat)
    if parseDuration then
        metadata.duration = tonumber(string.sub(parseDuration, 21, -3))
    end

    --Thumbnail can simply be retrieved with a URL, no need for parsing
    --(declared local so it does not leak into the global table)
    local parsedThumb = "https://i.ytimg.com/vi/"..videoId.."/maxresdefault.jpg"
    metadata.thumbnail = parsedThumb

    return metadata
end

fixes #33

veitikka commented 4 years ago

Done. I also noticed a problem: certain special characters in titles, such as quotation marks, are displayed as HTML entities. I added another local function to convert some of the more common ones. Maybe those functions should be moved to their own file, but if you're fine with them being there, I'll leave it like that.
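
For illustration, here is a minimal sketch of what such an entity-conversion helper could look like. The helper name `decodeHtmlEntities` and the entity list are assumptions made for this example, not the code from this PR, and which entities actually show up in titles would need to be checked against real pages.

```lua
-- Hypothetical lookup table: a handful of common HTML entities only.
local ENTITIES = {
    ["&quot;"] = '"',
    ["&#39;"]  = "'",
    ["&lt;"]   = "<",
    ["&gt;"]   = ">",
    ["&amp;"]  = "&",
}

-- Hypothetical helper: replaces known entities in a single pass.
-- Unknown entities are left untouched, because gsub keeps the original
-- match when the replacement function returns nil.
local function decodeHtmlEntities( text )
    local decoded = string.gsub(text, "&#?%w+;", function( entity )
        return ENTITIES[entity]
    end)
    return decoded
end
```

Doing the replacement in one pass means each entity is decoded exactly once, so a title containing the literal text `&amp;quot;` comes out as `&quot;` rather than being decoded twice.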

veitikka commented 4 years ago

Alright, thanks again for the improvement ideas. The function still "succeeds" in most cases even if the title or duration is invalid, so I added handling for that too. From what I've tested, it should be quite error-resistant now.
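
As a rough sketch of the kind of handling described here (not the code from this PR), the parser can report a failure instead of returning a half-filled table. The function name `parseMetadataChecked` and the error strings are hypothetical; the patterns match the same escaped `title` and `lengthSeconds` keys as the original function, but use captures instead of `string.sub` offsets.

```lua
-- Hypothetical variant: returns the metadata table on success,
-- or nil plus an error message when a field is missing or malformed.
local function parseMetadataChecked( html )
    -- Captures grab only the value between the escaped quotes
    local title = string.match(html, '\\"title\\":\\"(.-)\\"')
    if not title or title == "" then
        return nil, "title not found"
    end

    -- %d+ only matches digits, so tonumber cannot fail on a match
    local rawDuration = string.match(html, '\\"lengthSeconds\\":\\"(%d+)\\"')
    if not rawDuration then
        return nil, "duration not found"
    end

    return { title = title, duration = tonumber(rawDuration) }
end
```

A caller can then treat a `nil` first return value as a failed request and pass the second value on as the error, rather than creating a media entry with missing fields.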

samuelmaddock commented 4 years ago

Where have you found the parsing to fail? Are there certain types of youtube videos which don't seem to work?

veitikka commented 4 years ago

> Where have you found the parsing to fail? Are there certain types of youtube videos which don't seem to work?

Bad wording on my part; I didn't find any cases where it would fail.