michael-maltsev / technion-scans

A Google Drive based web application to upload Technion exam scans
https://tscans.cf

API? #1

Open doronbehar opened 4 years ago

doronbehar commented 4 years ago

Hi,

As a maintainer of my faculty's Google Drive, I'd like to keep track of which exams we have scans for and which we don't. We maintain a Google spreadsheet in which we record, for every course ID, whether we have at least one scan. I tried to add a custom function (via Google Apps Script) along the lines of this:

function hasScanAvailable(course) {
  const url = "https://tscans.cf/?course=" + course;
  // Thanks to https://github.com/nzt/cheeriogs
  const content = UrlFetchApp.fetch(url).getContentText();
  const title = Cheerio.load(content)('#main-title').text();
  Logger.log(title)
  // Match the requested course ID rather than a hardcoded one
  const course_found = title.includes(String(course));
  if (course_found) {
    return "found course"
  }
  return "קורס זה לא נמצא ב https://tscans.cf"
}

But this doesn't work as expected because (I think) whatever is in #main-title is updated via JS after the content is loaded, rather than being present in the initial response.

You can reproduce this with curl and pup and this command:

curl -s 'https://tscans.cf/?course=104013' | pup '#main-title'

If you open that URL in your browser, the title will be:

104013 - חשבון דיפרנציאלי ואינטגרלי 2ת' - מאגר סריקות

But the command will print:

<h2 id="main-title">
 הטכניון - מאגר סריקות
</h2>

And not:

<h2 id="main-title">
 104013 - חשבון דיפרנציאלי ואינטגרלי 2ת' - מאגר סריקות
</h2>

Anyway, I was wondering whether it'll be possible to work around this issue somehow, or whether there's perhaps a more reliable way to access this data.

Regards.

michael-maltsev commented 4 years ago

Hi,

because (I think) whatever is in #main-title is updated via JS

That's right. It's a static HTML page hosted here on GitHub, so you'll always get this same file. The data is retrieved dynamically from a Firebase Firestore database. The Firebase configuration can be found here.

I was wondering whether it'll be possible to work around this issue somehow, or whether there's perhaps a more reliable way to access this data.

You can try querying the Firestore database, but I'm not sure how easy it is to do from a Google Apps script. Googling for it led me to this project, FirestoreGoogleAppsScript; perhaps it can help.
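
As a rough illustration of the Firestore idea, here is a minimal sketch that skips the library and uses Firestore's REST endpoint through UrlFetchApp. It rests on several assumptions that this thread does not confirm: that the database allows unauthenticated reads, that each course has its own document keyed by the course ID, and that a document existing means scans exist. PROJECT_ID and COLLECTION are hypothetical placeholders; the real values would have to come from the site's Firebase configuration.

```javascript
// Hypothetical placeholders: replace with the values from the site's
// Firebase configuration. They are NOT known from this thread.
const PROJECT_ID = 'your-firebase-project-id';
const COLLECTION = 'courses';

// Build the Firestore REST URL for a single document.
function firestoreDocUrl(projectId, collection, docId) {
  return 'https://firestore.googleapis.com/v1/projects/' +
    encodeURIComponent(projectId) +
    '/databases/(default)/documents/' +
    encodeURIComponent(collection) + '/' +
    encodeURIComponent(docId);
}

// fetchStatus is any function that maps a URL to an HTTP status code.
// Assumption: one document per course, so 200 means "course is known".
function hasScanAvailable(course, fetchStatus) {
  const url = firestoreDocUrl(PROJECT_ID, COLLECTION, String(course));
  return fetchStatus(url) === 200;
}

// Apps Script implementation of fetchStatus. muteHttpExceptions makes
// fetch() return the response on 4xx/5xx instead of throwing.
function appsScriptFetchStatus(url) {
  return UrlFetchApp.fetch(url, { muteHttpExceptions: true }).getResponseCode();
}
```

In a spreadsheet this could be wrapped as `hasScanAvailable(course, appsScriptFetchStatus)`. Passing the fetcher in as a parameter keeps the logic testable outside Apps Script; if the database rules reject anonymous reads, the request would return 403 and this sketch would simply report false.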

Another option is to query the Drive directly. I'm not sure what API is available for it, but perhaps Google provides a simple API for querying folders and files.
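
For the Drive route, Apps Script's built-in DriveApp service can list folders and files. The sketch below assumes a layout that this thread does not confirm: a root folder containing one subfolder per course whose name starts with the course ID. ROOT_FOLDER_ID and the function names are hypothetical.

```javascript
// Returns true if rootFolder has a subfolder whose name starts with the
// course ID and that subfolder contains at least one file.
// Assumption: one subfolder per course, named "<course ID> ...".
function courseHasScans(rootFolder, course) {
  const prefix = String(course);
  const folders = rootFolder.getFolders();
  while (folders.hasNext()) {
    const folder = folders.next();
    if (folder.getName().indexOf(prefix) === 0) {
      return folder.getFiles().hasNext();
    }
  }
  return false;
}

// Hypothetical placeholder: the ID of the Drive folder holding the scans.
const ROOT_FOLDER_ID = 'your-drive-folder-id';

// Entry point to run from the script editor or a trigger.
function checkCourse(course) {
  return courseHasScans(DriveApp.getFolderById(ROOT_FOLDER_ID), course);
}
```

One caveat: as far as I know, spreadsheet custom functions cannot call services that need authorization, and DriveApp is one of them, so something like `checkCourse` would probably have to run from a menu item or a trigger that writes its results into the sheet.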