Open ederag opened 5 years ago
Related issue #346.
The best way to handle time is to store in UTC and convert to local time on presentation. For a local database such as Hamster, issues with storing local time don't occur so often as they would with a web application. But anyone working in the early hours of the last Sunday in March or October (in EU timezones) will presumably not get accurate results. I've not tested this as I am usually asleep at these times.
Another related issue, with other use cases: #155.
Agree with @jamesfryer, the best way to handle times is to store UTC and show local time. I have an application that used local time (because I didn't pay attention) and it had all kinds of issues with DST changes until I fixed it to store UTC.
Thanks, there seems to be a consensus to get rid of the local time limitation. Good. Unfortunately, simply switching to utc would break backward compatibility. An intermediate way would be adding a column with the timezone information. It should be possible to handle both old and new formats for a while, making the change optional. Another advantage: one could choose a display in 1) "previous local" (in the storage timezone) or 2) "current local" (in the current timezone) or 3) "UTC" (or any fixed timezone).
Depends on what level you break backward compatibility?
Are you expecting others to use the database? If so, yeah, tough.
If the D-Bus API, that can support still local timezones and just convert when getting the data in, and have *UTC methods added that take UTC timestamps.
If you add a "features" table in the database then you can convert instead of adding a column?
Dunno. I don't think it'd be so bad to add a column but then I'd consider adding it as "offset to utc" i.e. a value in minutes, rather than any sort of timezone info. You still need the timezone database to add this information, or something. It's a mess one way or the other ...
@ederag Agree that storing in UTC would be ideal, but definitely please include a field for the timezone at the time of the activity, because looking at things in any other timezone would be very confusing!
Fwiw, I regularly switch timezones because I travel a lot, and I've found hamster's handling of it to be intuitive enough that it's never been a problem. That said, am usually not working during such switches, or I find some other appropriate time to switch up my clocks. I'd say displaying local times, based on the locale of the activity, is important for the UI. But storing in UTC would avoid some of the problems with DST and timezone switches, so I agree with your ideas in https://github.com/projecthamster/hamster/issues/434#issuecomment-564289649! Wrt @jmberg's comments, storing a timezone offset would be appropriate: all these timestamps are in the past, so we don't need to know their rules, we just need to be able to convert to/from UTC.
@fantasai
definitely please include a field for the timezone at the time of the activity, because looking at things in any other timezone would be very confusing!
Thanks for sharing, fully agreed. There's an ongoing work to prepare a smooth evolution towards this direction.
TL;DR: Smooth transition seems possible. Need more time.
@jmberg Thanks for the discussion, very interesting. Took me a while to read documentations.
Depends on what level you break backward compatibility? Are you expecting others to use the database? If so, yeah, tough.
One should be able to try a new version for a while, and roll back to the previous one smoothly, with the same database.
If the D-Bus API, that can support still local timezones and just convert when getting the data in, and have *UTC methods added that take UTC timestamps.
Agreed, new DBus functions will probably be needed to preserve backward compatibility.
If you add a "features" table in the database then you can convert instead of adding a column?
Do you mean a correspondence table (fact_id -> timezone information) ? That would work too, but if possible, it would be nicer to have the timezone information close to the time, for readability.
Perhaps cleaner than my additional column suggestion: sqlite works in UTC, and datetimes are stored in a "timestamp" type which is a text actually. The cell content is converted by sqlite to an UTC time, internally (and efficiently). The conversion format is given in https://www.sqlite.org/lang_datefunc.html
[Format]
- YYYY-MM-DD HH:MM ... may be optionally followed by a timezone indicator of the form "[+-]HH:MM"
So the YYYY-MM-DD HH:MM[+-]HH:MM format would be much cleaner than handling the timezone separately and do our own comparisons. And performance-wise, the parsing/subtraction of the timezone would add only little overhead.
But internally it is converted to UTC,
and the existing YYYY-MM-DD HH:MM
would be interpreted as UTC,
although they were in fact local times,
so the whole column would have to be converted.
In other words, working on the timestamp
converter would only solve the reading process,
but not the in-sql datetime comparisons.
Current plan (that might change) is to provide a smooth transition across a few versions for casual users. They would just not notice the change For people eager to use this feature, they would have to accept to irreversibly jump to the new format, and use python-3.8+ (there are bugs in previous versions).
The first step is well under way. [Edit: this is #510] Don't hold your breath though, this will require careful thinking, after v3.0.
Are you expecting others to use the database? If so, yeah, tough.
I do. Not really tough. And I rely on history and useable timestamping (i.e. possibility to reproduce what had happened on a given day/week/month).
My first idea, which seems to be reflected in the results of the above discussion, would be to store time and (new) timezone, and allow the user to display it in the way he wants. Thus older entries would just get a assumed TZ=0 (or local?). It is important that all necessary information to use the database entries is contained in the database and not in an additional place.
@DirkHoffmann
store time and (new) timezone, and allow the user to display it in the way he wants.
Exactly.
It is important that all necessary information to use the database entries is contained in the database and not in an additional place
Agreed.
older entries would just get a assumed TZ=0 (or local?)
Local. No visible change for the user.
It will require some time to allow for a smooth transition path.
The database should only be accessed through hamster/dbus,
no guarantee outside that.
Any reason for not using hamster export tsv
?
But in my current plans the database change is as small as possible. More on that after v3.0.
Could be interesting to know your use case though, to keep that in mind.
The database should only be accessed through hamster/dbus, no guarantee outside that.
Fine. But please consider that SQL is one of the most used and known programming languages, and a standard which federates many different database users/programmers. I thought dbus access needs the hamster service to run, doesn't it?
Any reason for not using
hamster export tsv
?
For the same reason you do not store the data in a flat file: make full use of the database facilities. Many calculations can be done inside the database, when a more elaborate report than raw data dump is needed.
But in my current plans the database change is as small as possible. More on that after v3.0.
Backward compatibility is always a good paradigm.
Could be interesting to know your use case though, to keep that in mind.
Here is a summary generator. I would like to improve it a bit before submitting it to the hamster project later. But I think it demonstrates the idea. It took me less than an hour to put this together. Do you think with dbus I would get results on a comparable timescale? I have the impression I would first need at least two times the time to read documentation (starting with: "This tutorial is not complete;" which does not fill me with confidence), and by the time I will have a second application to program with dbus, it will have bee replaced by another clever thingy. For comparison, I learned SQL 25 years ago, and I reuse sometimes old code for all kind of problems that have already been solved, almost unchanged.
I would like to gather the database file(s) automatically, without need and even possibility to start or rely on a hamster/dbus service, sometimes by scp-ing the file to another computer for backup and summary extraction. Please don't break this feature.
PS: I write into the database sometimes, too. (For example to rename Hamster
into hamster
when I realise that I used two different conventions over time, made a typo (hamtser
or for some reason several entries into the wrong category without me noticing early enough.) I even just noticed a dangling space (Activity␣@Category
). The existing interface is not convenient / not complete enough (it ignores the space in search and mixes Activity
and Activity␣
for that.
@DirkHoffmann thanks for sharing, very interesting.
Everything should be OK, at first sight. SQL is good for that.
At one point you just might have to adopt the adapter/converter shipped at the bottom of db.py
.
For others, please do not get the wrong idea. The dbus interface is easy to use.
There is no need to understand dbus at all. Just use the hamster client.py
.
Discussions about this kind of "external" companions will be pursued in #494.
Let me mention hamster-tool that also connects to hamster databases. This one got me a little bit worried as well, because it does not use exactly the main hamster machinery, [@GeraldJansen still makes a major contribution to hamster, and has an extensive knowledge of the internals, but there are always chances to drift]. Although having several paths is good for the project robustness (very important point), it is harder to test extensively, and to reason with, hence the only slight worry.
About writing into the database, you obviously know that this works only provided nothing else is writing at the same time. The only safeguard against that is precisely the single server, accessed only through dbus, that ensure sequential access. What you are doing certainly works for you because you know what you are doing; but that looks quite risky for the average user.
About the database itself, it should stay available, with the slightest change possible, with the server being able to detect when it changed, as always.
Final note: priority is to fix the user interface first, obviously. Then ease the writing/integration of "external" stuff. So at least one or two years are expected before the first database change. Plenty of time to think this through.
Currently the local time (as printed by python) is used: https://github.com/projecthamster/hamster/blob/d15555ea36425836b615f9f66b3e8500310e32d7/src/hamster/storage/db.py#L589-L595
execute
is merging the string representation ofstart_time
into theupdate
sql command.Drawback: upon a time zone change, distinct activities might seemingly overlap for instance. Advantage: when looking at the past, an activity from say 9:00-10:00 is still displayed as 9:00-10:00 even on a time zone change.
Let's keep the current system for now, since it works for most users. Can it be improved ? What are others doing ?