nytimes / library

A collaborative documentation site, powered by Google Docs.
https://nyt-library-demo.herokuapp.com/
Apache License 2.0
1.14k stars 144 forks source link

Documents are sorted by a number that appears anywhere in their name #296

Open tommi-h opened 3 years ago

tommi-h commented 3 years ago

Expected Behavior

Documents are sorted alphabetically in the "Pages in X" section and the "View All Docs" page.

Actual Behavior

If a document has a number (digit) anywhere in its name, that number is used as the sort key, instead of the name.

To Reproduce

Add two documents to a folder in Drive. Name the documents as, for example, "Background Data" and "Status Report 06/2020". Now the latter document is sorted before the former, which is counter-intuitive.

Additional Information

I was unable to tell whether this is a bug or intended behaviour. I did not find any tests for this.

Possible Solution

This behaviour seems to be implemented in function determineSort in server/list.js. The regular expression catches a digit anywhere in the name. However, I do not see the need to use any regex here. Documents could be simply sorted by their name. If a user wants to control the sorting of the documents as described in Library demo site (by prepending the document names with numbers), that works "out of the box" in string sorting, without any special handling.

afischer commented 3 years ago

Thank you for the report!

This is definitely not our intended behavior — The purpose of the non-standard sorting is to allow users to manually order folders and files by adding a #. or #) to the beginning of the file. We then trim those leading numbers (as well as trailing tags) in the cleanName function:

https://github.com/nytimes/library/blob/63e8a1bb9eeb9410b49806b5cfed35de1d69e3e1/server/docs.js#L16-L21

We should definitely tighten up this sorting logic, and maybe document the leading number trimming logic to show which delimiters are valid for adjusting sorting.

alipman88 commented 2 years ago

However, I do not see the need to use any regex here. Documents could be simply sorted by their name. If a user wants to control the sorting of the documents as described in Library demo site (by prepending the document names with numbers), that works "out of the box" in string sorting, without any special handling.

I believe @tommi-h is correct here: there's no need for determineSort function. A string beginning with 01 - will always come before one beginning with 02 - when sorted alphanumerically. No need for a function/regex to extract the leading numeric characters.

Unless the eventual goal is to parse and sort integers (so that 2 will come before 11, even without zero-padding), just drop determineSort:

diff --git a/server/list.js b/server/list.js
index d4a30af..d14fb37 100644
--- a/server/list.js
+++ b/server/list.js
@@ -184,7 +184,7 @@ function produceTree(files, firstParent) {
       prettyName,
       tags,
       resourceType: cleanResourceType(mimeType),
-      sort: determineSort(name),
+      sort: name,
       slug,
       isTrashCan: slug === 'trash' && parents.includes(driveId)
     })
@@ -257,7 +257,7 @@ function buildTreeFromData(rootParent, previousData, breadcrumb) {
     homePrettyName,
     id: rootParent,
     breadcrumb,
-    sort: parentInfo ? determineSort(parentInfo.name) : Infinity // some number here that could be used to sort later
+    sort: parentInfo ? parentInfo.name : Infinity // some number here that could be used to sort later
   }

   // detect redirects or purge cache for items not contained in trash
@@ -373,13 +373,6 @@ function handleUpdates(id, {info: lastInfo, tree: lastTree}) {
   })
 }

-function determineSort(name = '') {
-  const sort = name.match(/(\d+)[^\d]/)
-  // to be consistent with drive API, we will do string sort
-  // also means we can sort off a single field when number is absent
-  return sort ? sort[1] : name // items without sort go alphabetically
-}
-
 function cleanResourceType(mimeType) {
   const match = mimeType.match(/application\/vnd.google-apps.(.+)$/)
   if (!match) return mimeType