omnivore-app / omnivore

Omnivore is a complete, open source read-it-later solution for people who like reading.
https://omnivore.app
GNU Affero General Public License v3.0

Imported articles without 'published_at' Unix time show '54 years ago' but Omnivore knows the correct publish date #2794

Closed DavidMetcalfe closed 1 year ago

DavidMetcalfe commented 1 year ago

I exported all 15,279 articles from my Pocket account a few nights ago. The exported HTML only has the URL, tags, and saved Unix time, but not the Published Unix time. I converted the HTML into a CSV, following the format detailed in the Importing docs. Import was successful and I wasn't surprised to see the articles in the Inbox showing 54 years since there was no Unix time provided in the CSV (note the top article in the screenshot was saved after the import, directly with the browser extension):

[Screenshot: Inbox view with the imported articles shown as "54 years ago"]

However, in each of these articles, I can see there is a Published At date/time stamp. So, it appears the system is storing and showing a different value in the Inbox view than in the Article view.

[Screenshot: Article view showing the Published At date/time stamp]

Hopefully I'm just misunderstanding something basic about the system, but I couldn't find any prior issues raised on this, so here it is.

jacksonh commented 1 year ago

Thanks. Do you mind pasting just one line from your CSV as well, so I can verify the formatting?

DavidMetcalfe commented 1 year ago

Certainly. Here's one line with SUCCEEDED and another with ARCHIVED state.

url,state,labels,save_at,published_at
https://blog.google/outreach-initiatives/arts-culture/the-latest-from-nasas-search-for-life-beyond-earth/,SUCCEEDED,"[computer science,google,the keyword]",1686748885,
http://www.chron.com/news/houston-texas/article/Online-hackers-threaten-to-expose-cartel-secrets-2242068.php,ARCHIVED,,1570646666,

sywhb commented 1 year ago

Thanks @DavidMetcalfe. We use Unix timestamps in milliseconds, as stated in the docs here: https://docs.omnivore.app/using/importing.html#importing-csv-files

saved_at: The unix timestamp in milliseconds the item was saved. If the item has no saved_at date, this column can be empty.
published_at: The unix timestamp in milliseconds the item was published. If the item has no published_at date, this column can be empty.

url,state,labels,saved_at,published_at
https://jacksonh.org,SUCCEEDED,"[Handsome Developers, Profile Page]",1614556800000,1614556800000
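
For anyone with a CSV that already holds second-based timestamps, a minimal conversion sketch follows. It is not part of Omnivore; the function names `seconds_to_millis` and `convert_csv` are made up for illustration. It assumes every non-empty timestamp cell is a whole number of seconds. It also writes the header as `saved_at`, since the CSV pasted above uses `save_at` while the docs name the column `saved_at`; whether the importer treats the two differently is not stated in this thread.

```python
import csv
import io

FIELDS = ["url", "state", "labels", "saved_at", "published_at"]


def seconds_to_millis(cell):
    """Convert a Unix timestamp in seconds to milliseconds; empty cells stay empty."""
    cell = (cell or "").strip()
    return str(int(cell) * 1000) if cell else ""


def convert_csv(text):
    """Return a copy of the CSV text with both timestamp columns in milliseconds."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Accept either header spelling for the saved timestamp (assumption).
        saved = row.get("saved_at") or row.get("save_at") or ""
        writer.writerow({
            "url": row.get("url") or "",
            "state": row.get("state") or "",
            "labels": row.get("labels") or "",
            "saved_at": seconds_to_millis(saved),
            "published_at": seconds_to_millis(row.get("published_at")),
        })
    return out.getvalue()
```

Calling `convert_csv(open("pocket.csv").read())` (the file name is a placeholder) returns the converted text, which can then be written to a new file and imported.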

DavidMetcalfe commented 1 year ago

An oversight on my part, granted, but what is the behaviour when the Unix timestamp is in Seconds format instead of Millisecond format? Does the system just assume January 01, 1970? The import showed no errors or warnings, so it seems like this could be improved either way.
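
The arithmetic behind the "54 years ago" label can be checked with a short sketch. It assumes, without confirmation from the Omnivore code, that the importer reads the number as milliseconds without any range check. Under that assumption a second-based value does not become exactly January 01, 1970, but it does land within the first weeks of 1970.

```python
from datetime import datetime, timezone

saved_at = 1686748885  # value from the CSV above, in seconds

# Read as seconds (what the exporting tool meant).
as_seconds = datetime.fromtimestamp(saved_at, tz=timezone.utc)

# Read as milliseconds (assumed importer behaviour): divide by 1000 first.
as_millis = datetime.fromtimestamp(saved_at / 1000, tz=timezone.utc)

print(as_seconds.date())  # 2023-06-14
print(as_millis.date())   # 1970-01-20
```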

sywhb commented 1 year ago

Thank you. You are right about the missing warnings and error messages. I will update the validation and add more detailed messages.

sywhb commented 1 year ago

Also, we should probably ignore timestamps in an invalid format instead of assuming January 01, 1970.
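
One way such validation could look is sketched below. This is an illustration only, not the change that was made in Omnivore: the name `parse_timestamp` and the `MIN_PLAUSIBLE_MS` cut-off are hypothetical. The cut-off relies on the fact that 10^11 ms is in March 1973, while present-day timestamps in seconds are around 1.7 * 10^9, so a value below the cut-off is far more likely to be seconds than a genuine millisecond timestamp. A real implementation would need a different rule if items published before 1973 must be supported.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple

# Hypothetical cut-off: 10**11 ms after the epoch is 1973-03-03 (UTC).
MIN_PLAUSIBLE_MS = 10**11


def parse_timestamp(cell: str) -> Tuple[Optional[datetime], Optional[str]]:
    """Return (datetime, None) for a valid cell, or (None, warning) to ignore it.

    An empty cell is allowed and yields (None, None).
    """
    cell = cell.strip()
    if not cell:
        return None, None
    try:
        millis = int(cell)
    except ValueError:
        return None, f"ignored non-numeric timestamp: {cell!r}"
    if millis < MIN_PLAUSIBLE_MS:
        return None, f"ignored timestamp {millis}: looks like seconds, not milliseconds"
    try:
        return datetime.fromtimestamp(millis / 1000, tz=timezone.utc), None
    except (OverflowError, OSError, ValueError):
        return None, f"ignored out-of-range timestamp: {millis}"
```

With this shape, the importer can leave the date unset when the first element is `None` and surface the second element to the user as a warning.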