yt-react-db / issue-tracker

yt-react-db issue tracker
https://yt-react-db.com

Making sure we can get the publication date every time #12

Open ComputerBread opened 1 year ago

ComputerBread commented 1 year ago

Known issues

The meta tag containing the convenient publication date isn't always present on YouTube

When the first page you load isn't a video, the meta tag found with document.querySelector('meta[itemprop="datePublished"]') isn't present on any page you visit afterward. If you start on a video, it's present on every page!

This tag is useful because it has a nice "yyyy-mm-dd" format (like "2022-07-31").

We can also find the publication date using document.querySelector("#info-strings > yt-formatted-string").innerText, but it returns a locale-formatted date string like the output of toLocaleDateString (e.g. Jul 31, 2022, or 7 avr. 2023 in French):

// 20 December 2012, 03:00 UTC
const event = new Date(Date.UTC(2012, 11, 20, 3, 0, 0));
// timeZone: 'UTC' keeps the printed day the same whatever the machine's time zone
const options = { year: 'numeric', month: 'short', day: 'numeric', timeZone: 'UTC' };
console.log(event.toLocaleDateString('de-DE', options)); // "20. Dez. 2012"
console.log(event.toLocaleDateString('ar-EG', options)); // "٢٠ ديسمبر ٢٠١٢"
console.log(event.toLocaleDateString('fr-FR', options)); // "20 déc. 2012"
console.log(event.toLocaleDateString('en', options)); // "Dec 20, 2012"

But these strings are annoying to parse, so I will need something like date-fns to help me:

import { parse } from "date-fns";
import { fr } from "date-fns/locale";

// Parse a "day, abbreviated month, year" string using the given date-fns locale.
function parseDate(dateString, locale) {
  return parse(dateString, 'd LLL y', new Date(), { locale });
}

const frenchDateStr = '7 avr. 2023';
const parsedDate = parseDate(frenchDateStr, fr);
console.log(parsedDate); // a Date for 7 April 2023, at midnight local time
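
One thing to watch when turning the parsed Date into a "yyyy-mm-dd" string: date-fns' parse builds the date in the local time zone, so toISOString() can land on the previous day in time zones ahead of UTC. A minimal sketch that reads the local fields instead (toIsoDateString is a hypothetical helper name, not part of date-fns):

```javascript
// Hypothetical helper: format a Date as "yyyy-mm-dd" from its local-time fields,
// which are the fields date-fns' parse() fills in.
function toIsoDateString(date) {
  const year = String(date.getFullYear()).padStart(4, "0");
  const month = String(date.getMonth() + 1).padStart(2, "0"); // getMonth() is 0-based
  const day = String(date.getDate()).padStart(2, "0");
  return `${year}-${month}-${day}`;
}

// Local midnight on 7 April 2023, like the value parse() gives for '7 avr. 2023'.
console.log(toIsoDateString(new Date(2023, 3, 7))); // "2023-04-07"
```

With date-fns already loaded, format(parsedDate, 'yyyy-MM-dd') should give the same string, since it also works in local time.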

I found the list of language codes used by Google using the API:

"af", "am", "ar", "as", "az", "be", "bg", "bn", "bs", "ca", "cs", "da", "de", "el", "en-GB", "en-IN", "en", "es", "es-419", "es-US", "et", "eu", "fa", "fi", "fil", "fr-CA", "fr", "gl", "gu", "hi", "hr", "hu", "hy", "id", "is", "it", "iw", "ja", "ka", "kk", "km", "kn", "ko", "ky", "lo", "lt", "lv", "mk", "ml", "mn", "mr", "ms", "my", "no", "ne", "nl", "or", "pa", "pl", "pt", "pt-PT", "ro", "ru", "si", "sk", "sl", "sq", "sr-Latn", "sr", "sv", "sw", "ta", "te", "th", "tr", "uk", "ur", "uz", "vi", "zh-CN"

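For the languages that write the month as a word, the browser's built-in Intl API might be enough to parse these strings without a library. This is only a sketch of that alternative, not the date-fns approach: shortMonthTable and parseLocaleDate are hypothetical names, and it assumes YouTube's strings use the same short month names as Intl, which I have not verified. Anything it can't handle (non-ASCII digits, months written as numbers) comes back as null.

```javascript
// Same style as the "Jul 31, 2022" / "7 avr. 2023" strings.
const DATE_OPTIONS = { year: "numeric", month: "short", day: "numeric", timeZone: "UTC" };

// Map each short month name of a locale (lowercased, dots removed) to its number 1-12.
function shortMonthTable(locale) {
  const formatter = new Intl.DateTimeFormat(locale, DATE_OPTIONS);
  const table = new Map();
  for (let month = 0; month < 12; month++) {
    const parts = formatter.formatToParts(new Date(Date.UTC(2012, month, 15)));
    const name = parts.find((part) => part.type === "month").value;
    table.set(name.toLowerCase().replace(/\./g, ""), month + 1);
  }
  return table;
}

// Returns "yyyy-mm-dd", or null when the text is not a simple
// "day, month word, 4-digit year" string (in any order).
function parseLocaleDate(text, locale) {
  const months = shortMonthTable(locale);
  let day = null;
  let month = null;
  let year = null;
  // Split on whitespace, commas and dots, so "Dez." becomes "dez".
  for (const token of text.toLowerCase().split(/[\s,.]+/)) {
    if (/^\d{4}$/.test(token)) year = token;
    else if (/^\d{1,2}$/.test(token)) day = token.padStart(2, "0");
    else if (months.has(token)) month = String(months.get(token)).padStart(2, "0");
  }
  return day && month && year ? `${year}-${month}-${day}` : null;
}

console.log(parseLocaleDate("Dec 20, 2012", "en")); // "2012-12-20"
console.log(parseLocaleDate("7 avr. 2023", "fr"));
```

The month table is rebuilt on every call to keep the sketch short; caching it per locale would be an easy improvement.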
For now, I just use "0000-00-00" when the meta tag can't be found. I don't want to spend too much time on this or deal with a build step right now; I want to finish the overall logic first. Once that's done, here is what I can do:

prerequisites:

algo:

  1. if the meta tag exists: return its content attribute
  2. otherwise, get the user’s locale using const userLocale = navigator.language;
  3. compare it to the ones supported by date-fns
    1. if there’s a match “return locale”
    2. if there’s no match return “0000-00-00”
  4. extract date from document.querySelector("#info-strings > yt-formatted-string").innerText
  5. parse it into the wanted format “yyyy-MM-dd” and return it
  6. (don’t forget to deal with the timeout and shit)
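
The steps above could be sketched roughly like this. It is not the final implementation: getPublicationDate, resolveLocale, supportedLocales and parseText are all hypothetical names, the parser is passed in so the sketch doesn't depend on how date-fns ends up being bundled, and step 6 (the timeout) is left out.

```javascript
const FALLBACK_DATE = "0000-00-00";

// Step 3: exact match first ("fr-CA"), then the base language ("fr"), else null.
function resolveLocale(language, supportedLocales) {
  if (supportedLocales.includes(language)) return language;
  const base = language.split("-")[0];
  return supportedLocales.includes(base) ? base : null;
}

// doc: anything with querySelector (the real `document` in the browser).
// language: normally navigator.language.
// supportedLocales: hypothetical list of locale codes the parser understands.
// parseText(text, locale): hypothetical parser returning "yyyy-MM-dd" or null.
function getPublicationDate(doc, language, supportedLocales, parseText) {
  // Step 1: prefer the meta tag when it is there.
  const meta = doc.querySelector('meta[itemprop="datePublished"]');
  const content = meta && meta.getAttribute("content");
  if (content) return content;

  // Steps 2-3: find a locale we know how to parse.
  const locale = resolveLocale(language, supportedLocales);
  if (locale === null) return FALLBACK_DATE;

  // Step 4: read the localized date shown on the page.
  const info = doc.querySelector("#info-strings > yt-formatted-string");
  if (!info) return FALLBACK_DATE;

  // Step 5: parse it, falling back when the parser gives up.
  return parseText(info.innerText, locale) || FALLBACK_DATE;
}

// In the browser this would be called as:
// getPublicationDate(document, navigator.language, ["en", "fr", "de"], myParser);
```

Returning the same "0000-00-00" for every failure keeps the caller simple, at the cost of not knowing which step failed.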

build:

Originally posted by @ComputerBread in https://github.com/yt-react-db/issue-tracker/issues/3#issuecomment-1728090549