vsDizzy / SaveAsMHT

Chrome extension saving page as .mht
https://chrome.google.com/webstore/detail/save-as-mht/hfmodljjaibbdndlikgagimhhodmobkc

Improve the default filename #25

Open AlttiRi opened 3 years ago

AlttiRi commented 3 years ago

The extension uses the site's title as the filename. I want to suggest a much better filename template that will improve file organisation.

TL;DR

The template should be as follows: [{hostname-without-www}] {YYYY}.{MM}.{DD}—{title}

Example results with this name pattern:
"[developer.mozilla.org] 2021.01.21—Cross-Origin-Opener-Policy - HTTP - MDN.mht"
"[en.wikipedia.org] 2021.01.21—High Efficiency Image File Format - Wikipedia.mht"
"[javascript.info] 2021.01.21—Generators.mht"
"[reddit.com] 2021.09.28—WD Blue (New 2018+) Line Models Explained - SMR - Greens (v1) - DataHoarder.mht"


I have already written about this for another similar extension here and here, so I'll just copy and paste the text:


For better file navigation (to easily find the desired file), files should be organized. This can be achieved if, when sorting by name (alphabetically), the files are both grouped and sorted. To do this, you need the correct (special) file name.

If the file name contains only the title, the files end up in effectively random order, mixed in with other (non-MHTML) files.

Files will be organized if they are grouped by hostname and sorted by date. This can be achieved if the file name consists of the following parts: first the hostname, then the date, and at the end – the title.
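The grouping effect described above can be checked with a small sketch. The page list is hypothetical sample data (two of the titles are made up for illustration), and the plain `sort()` call compares UTF-16 code units, which only approximates how a file manager such as File Explorer orders names.

```javascript
// Hypothetical sample data: [hostname, date, title] triples, deliberately shuffled.
const pages = [
  ["javascript.info", "2021.01.21", "Generators"],
  ["en.wikipedia.org", "2021.03.02", "MHTML - Wikipedia"],
  ["javascript.info", "2020.11.05", "Promises"],
  ["en.wikipedia.org", "2021.01.21", "High Efficiency Image File Format - Wikipedia"],
];

// Build names following the proposed template; \u2014 is the separator
// character used between the date and the title.
const names = pages.map(
  ([host, date, title]) => `[${host}] ${date}\u2014${title}.mht`
);

// Default sort: plain code-unit comparison, no locale rules.
const sorted = [...names].sort();

console.log(sorted.join("\n"));
```

Under these assumptions the two en.wikipedia.org files come first, followed by the two javascript.info files, each group in date order.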

To keep MHTML files from being mixed with other files, use a prefix: the same first character for every file. It should preferably be neither a letter nor a digit, so that these files are not scattered among the others and get a higher priority when sorting.


You could simply put # first, for example, but I find it better to wrap the site name in []: [hostname]

Next is the date. The only correct format is yyyy first, then mm, then dd. (Other formats can look misleading: "10.12: is it mm.dd or dd.mm?", or they are not suited to alphabetical sorting.) There are several options.
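The sorting argument can be illustrated with a short sketch. The three dates are hypothetical sample values, and `ymd` / `dmy` are helper names introduced here only for the comparison.

```javascript
// Hypothetical sample dates (the month argument is zero-based): 
// 2021-09-28, 2020-12-30, 2021-01-21.
const dates = [
  new Date(2021, 8, 28),
  new Date(2020, 11, 30),
  new Date(2021, 0, 21),
];

const pad = (n) => String(n).padStart(2, "0");

// Year-first format, as proposed in the issue.
const ymd = (d) => `${d.getFullYear()}.${pad(d.getMonth() + 1)}.${pad(d.getDate())}`;
// Day-first format, shown only for contrast.
const dmy = (d) => `${pad(d.getDate())}.${pad(d.getMonth() + 1)}.${d.getFullYear()}`;

// Plain string sort, which is what sorting files by name amounts to.
const sortedYmd = dates.map(ymd).sort();
const sortedDmy = dates.map(dmy).sort();

console.log(sortedYmd); // [ '2020.12.30', '2021.01.21', '2021.09.28' ]
console.log(sortedDmy); // [ '21.01.2021', '28.09.2021', '30.12.2020' ]
```

With the year first, alphabetical order matches chronological order. With the day first, the 2020 date ends up last.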

And then the title. You can separate it with just a space, or you can use "—" (Alt + 0151).

AlttiRi commented 3 years ago

How it looks in File Explorer

With [{hostname}] {YYYY}.{MM}.{DD}—{title}: (screenshot)

With just {title}: (screenshot)


Note: these are old screenshots, so the hostname here still includes "www".

AlttiRi commented 3 years ago

I also use this approach to name files in my userscripts that download files.

In particular, in my public userscript for Twitter. Here is a detailed description of the advantages of such file naming: https://github.com/AlttiRi/twitter-click-and-save#filename-format

vsDizzy commented 3 years ago

Well. Thanks for the idea and the detailed description. However, my extension is really small, and I don't think I want to add complex logic here.

AlttiRi commented 3 years ago

But it's only a few lines of code in background.js with simple logic:

  const {hostnameTrimmed, siteTitle, YYYY, MM, DD} = getFilenameParts(tab)
  const filename = `[${hostnameTrimmed}] ${YYYY}.${MM}.${DD}—${siteTitle}.mht`
  let blob = await toPromise(chrome.pageCapture.saveAsMHTML, { tabId: tab.id })
  download(filename, await patchSubject(blob))

  function getFilenameParts(tab) {
    const hostname = new URL(tab.url).hostname
    const hostnameTrimmed = hostname.startsWith("www.") ? hostname.slice(4) : hostname
    const date = new Date()
    const YYYY = date.getFullYear()
    const MM = (date.getMonth() + 1).toString().padStart(2, "0")
    const DD = date.getDate().toString().padStart(2, "0")
    const siteTitle = sanitize(tab.title)
    return {hostname, hostnameTrimmed, siteTitle, YYYY, MM, DD}
  }

Diff (13 insertions, 1 deletion):

 async function save(tab) {
+  const {hostnameTrimmed, siteTitle, YYYY, MM, DD} = getFilenameParts(tab)
+  const filename = `[${hostnameTrimmed}] ${YYYY}.${MM}.${DD}—${siteTitle}.mht`
-  const filename = `${sanitize(tab.title)}.mht`
   let blob = await toPromise(chrome.pageCapture.saveAsMHTML, { tabId: tab.id })
   download(filename, await patchSubject(blob))
+
+  function getFilenameParts(tab) {
+    const hostname = new URL(tab.url).hostname
+    const hostnameTrimmed = hostname.startsWith("www.") ? hostname.slice(4) : hostname
+    const date = new Date()
+    const YYYY = date.getFullYear()
+    const MM = (date.getMonth() + 1).toString().padStart(2, "0")
+    const DD = date.getDate().toString().padStart(2, "0")
+    const siteTitle = sanitize(tab.title)
+    return {hostname, hostnameTrimmed, siteTitle, YYYY, MM, DD}
+  }

   function sanitize(filename) {
     return filename.replace(/[<>:"/\\|?*\x00-\x1F~]/g, '-')
   }

DarrenSem commented 2 years ago

IMO leave it as-is, and if someone wishes to add flexibility they can fork into a new extension that has their preferred filename format. As an exercise somebody could add options like a checkbox to give the user the CHOICE to use the new format.
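A minimal sketch of how such a choice could look, assuming a single boolean option. `buildFilename` and `useDetailedName` are hypothetical names, not part of the extension; `sanitize` repeats the character class from the diff above, and the date is passed in so the result is reproducible. How the option would be stored and read (for example via the extension's settings) is left out.

```javascript
// Same replacement rule as the sanitize() shown in the diff above.
function sanitize(filename) {
  return filename.replace(/[<>:"/\\|?*\x00-\x1F~]/g, "-");
}

// Hypothetical helper: `useDetailedName` stands in for a user-facing checkbox.
// `date` defaults to "now" but can be injected for testing.
function buildFilename(tab, useDetailedName, date = new Date()) {
  const title = sanitize(tab.title);
  if (!useDetailedName) {
    return `${title}.mht`; // current behaviour: title only
  }
  const hostname = new URL(tab.url).hostname;
  const host = hostname.startsWith("www.") ? hostname.slice(4) : hostname;
  const yyyy = date.getFullYear();
  const mm = String(date.getMonth() + 1).padStart(2, "0");
  const dd = String(date.getDate()).padStart(2, "0");
  // \u2014 is the separator character from the proposed template.
  return `[${host}] ${yyyy}.${mm}.${dd}\u2014${title}.mht`;
}

// Usage with a made-up tab object (only `url` and `title` are read).
const tab = { url: "https://www.reddit.com/r/DataHoarder/", title: "WD Blue: Models Explained" };
const plainName = buildFilename(tab, false);
const detailedName = buildFilename(tab, true, new Date(2021, 8, 28));

console.log(plainName);
console.log(detailedName);
```

With the option off, the name stays title-only as it is today. With it on, the hostname and date are prepended, so the existing behaviour remains the default.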