Closed: FabienD74 closed this issue 5 months ago
Just to try and properly understand the request... you would like to store the gtfs-extracted data (from the gtfs zip) in another, more flexible database than sqlite. Additional info: at this very moment I am using a library (pygtfs) that extracts the zip into a sqlite db, and there is no option to change the db in pygtfs. I am not in any way planning to take over and maintain that library just for this... as you may understand :). So any additional work would be to migrate the sqlite from pygtfs to something else.
Additional comments on your points... do let me know where I miss a point so I can understand the need better.
=> I can restore HAOS without restoring the DB.
I understand, but... restore should be a rare occasion; this is my biggest challenge to the idea. The compacted size of the sqlite is 'nothing'. The gtfs data changes over time anyway; I need to re-download every month for 2 of mine, so a backup is imo not even needed. Re-unpacking a new source costs at most (largest ones to date) 2.5 hrs.
=> I can have different schedules for HAOS and DB
do not understand, do you mean a schedule for backups?
=> I can keep HAOS "small" (32Gb) so it is fast to backup/restore.
see above, I do not back up any of my gtfs db's at present, I only have four gtfs-db right now
=> I guess it should outperform SQLite ( dedicated memory, native multi-core operations, backup, online DB vacuum...)
I have no data to confirm/reject this... I have had no issues myself, but that is personal, and I have not yet received issues on this topic.... static data is usually not performance limited
Hi, thanks for your reply. It was more a question than a feature request, to see feasibility/complexity....
yes, I would like to get rid of SQLite altogether if possible... ;-) ... I don't trust SQLite....
Thanks anyway.
Have a good evening. Fab
I do get the idea... don't get me wrong, but I do not see a use-case here, as the data can be recovered from the external source at any point in time. As for the possible performance improvement, I am not sure it applies here... to be demonstrated by others, e.g. you?
Will keep it open for a while but need incentives :)
I have now analysed this a bit further. As gtfs2 leans on pygtfs, which only writes to sqlite, the only solution would be to rewrite pygtfs and take on its maintenance. That would make the 'new pygtfs' proprietary for this purpose... the to-be-invested effort does not weigh up against any presently known benefit. So, closing this issue until I receive a proper business/use-case.
Adding further findings.... when changing the DB, the query strings also need to change, as they are written for sqlite...
yes, indeed. I'm currently on it. It requires a dashboard to maintain.
I'm currently on the dashboard.
I have working connections to sqlite and postgres.... a lot of work in the "pipe" :-)
databases must be created by a "db administrator", manually .... ;-) ;-) I use the GUI of postgres, done in 2 minutes.
examples: for sqlite, "sqlite:///custom_components/gtfs2_pg/master.db"; for postgres: "postgresql://gtfs:gtfs@vm-hadb-tst.home/gtfs"
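Both of these are standard SQLAlchemy connection URLs, so they can be smoke-tested outside the integration; a minimal sketch (the host and credentials are just the example values above, and the postgres URL assumes a driver such as psycopg2 is installed):

```python
from sqlalchemy import create_engine, text

# Example URLs from above; placeholders, not real credentials.
URLS = [
    "sqlite:///custom_components/gtfs2_pg/master.db",
    "postgresql://gtfs:gtfs@vm-hadb-tst.home/gtfs",
]

for url in URLS:
    engine = create_engine(url)
    with engine.connect() as conn:
        # SELECT 1 is valid on both dialects and proves the URL/credentials work.
        conn.execute(text("SELECT 1"))
    print(f"connection OK: {url}")
```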
So... how are you planning to migrate the sqlite to another solution then? And still: why??? This is not urgent/data-risky/performance-related... it seems just more work to build and maintain. But by all means, if you want to build something then fine of course... but if it also needs to end up in this repo, then it has to sit 100% 'next to' the current solution, as I am not willing to spend time on testing/reviewing... so you'd need to copy all functions / py-files.
Or maybe...this could help too https://pygtfs.readthedocs.io/en/latest/gtfs2db.html
I hope to be able to perform all existing SQL statements... (not yet tested). And I still don't know how to migrate / import data.. :-))) maybe via your link. But many solutions transform the tables/db during import (always for good reasons, ...) .... and the only reference I have is via pygtfs.... at least it works for me :-)
Some solutions have not been maintained for years, and the GTFS specification changed a few months ago... PS: I have seen some description on how to cope with routes doing a "loop" or "lasso" ...... ;-) ;-) ;-) ;-)
Maybe it's already supported by pygtfs, as we can provide a "sqlalchemy" connection .... this is what I'm using ;-) ;-) so maybe I just need to tweak the statement pygtfs.Schedule(...); see the sketch after the quoted docs below.
...
sched = pygtfs.Schedule(":memory:")
This will create an in memory sqlite database. Instead you can supply a filename to be used for sqlite (such as 'gtfs.sqlite'), or a sqlalchemy database connection.
...
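If that is indeed how it behaves (later comments in this thread seem to confirm it), pointing pygtfs at the postgres URL would be a one-line change; a sketch using the example connection string from earlier:

```python
import pygtfs

# Instead of a sqlite filename or ":memory:", pass a full SQLAlchemy URL.
# Host and credentials below are the example values from this thread.
sched = pygtfs.Schedule("postgresql://gtfs:gtfs@vm-hadb-tst.home/gtfs")
```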
the memory option is not really feasible I believe; you have to think of rpi machines too, with low resources .... and my largest db is already 1.8Gb in sqlite form... Finland is 9Gb
And I will be repeating this a lot: what is the gain? I still do not see it :)
I would be more interested, via the link, to see if that would allow combining sources into one db (schedule)... first time I saw that one as part of pygtfs
For me: yesterday I saw a bus departure that was not listed on the official website. I cannot fix it!
On gtfs2db... it seems to work fine... I just merged a zipfile into an existing sqlite (lol), but this was a simple test to see if it is technically fine. I will try to spend a bit more time on that soon, as this would really help me (!... so I have an incentive). If you have an option to create multiple schedules/db, then the only thing that requires copying are the queries and their parametrised input, or a function per db-type, which would make it easier to tune in case of issues with one db-type. Since HA people often use mysql, maybe an initial implementation could be that?
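For reference, a minimal sketch of that merge through the pygtfs API (which, if I read the docs right, is roughly what gtfs2db does under the hood); the paths are placeholders:

```python
import pygtfs

# Open (or create) the existing sqlite db; a full SQLAlchemy URL would also work here.
sched = pygtfs.Schedule("custom_components/gtfs2_pg/master.db")

# Merge one more GTFS zip into the same schedule; each zip becomes its own feed.
pygtfs.append_feed(sched, "path/to/extra_source.zip")

# Inspect which feeds are already loaded.
for feed in sched.feeds:
    print(feed)
```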
Already found the first challenges, with myself as the use-case.
The benefits are however good: I would only have one setup for 'local stops' instead of 4, which is particularly nice when I travel.
So, my suggestion (do comment where you want) is to start with some functionality, aside from all the existing code, that will allow to ...
The downside is still that this is quite a challenge to automate end-to-end, with added complexity for running it on a periodical basis. So I think to start this via a service-call setup and then let HA do the auto-stuff.
Reading the above, I am not sure how the hell I would explain the config to a user, LOL. But the vast majority will only need 1 source, I assume.
AMAZING, IT WORKS OUT OF THE BOX !!!!!!!!!!!!!!!
it's currently uploading .... structure is there
Let's wait and see the result ... I see HA cpu burning at 24% (dunno why it's not using all 4 cpus). The VM running my DB went from 0.5% CPU to 4%.
I can't wait.... ;-)
Looks good, but (sorry to dampen the spirit) look at the queries I have: they contain sqlite-specific stuff, and I know mysql also has its oddities vs. postgres. I am not sure if it exists, but it would be nice to have a db-independent layer on top which would take care of them.
correction.. just a few are specific
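Not the integration's real queries, but one way such a layer could look: keep one shared statement and only override the few dialect-specific ones (the statement names and example SQL below are made up for illustration):

```python
from sqlalchemy import create_engine, text

# Per-statement variants; only dialect-specific statements get an override,
# everything else falls back to "default".
QUERIES = {
    "current_time": {
        "default": "SELECT NOW()",           # postgres / mysql
        "sqlite": "SELECT datetime('now')",  # sqlite has no NOW()
    },
    "stop_count": {
        "default": "SELECT COUNT(*) FROM stops",  # assumes the stops table written by pygtfs
    },
}

def get_query(name: str, dialect: str) -> str:
    variants = QUERIES[name]
    return variants.get(dialect, variants["default"])

engine = create_engine("postgresql://gtfs:gtfs@vm-hadb-tst.home/gtfs")
with engine.connect() as conn:
    # engine.dialect.name is "postgresql", "sqlite", "mysql", ...
    sql = get_query("current_time", engine.dialect.name)
    print(conn.execute(text(sql)).scalar())
```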
Please also give the process a bit of thought. Should people be choosing a db-type with each source, or is it one db-type and then add sources? Would each source end up in a separate db? As per my other topic, there may be a good reason not to put all in one. Deleting things: for sqlite it is just deleting a file, for mysql it is not that simple, etc.
You maybe get the gist of why I am careful: I spent more time on the data administration than on the content.
If you can make the admin stuff work, I am OK to help with the data part, queries and all.
Well, currently I'm waiting. We can have multiple feeds (+/- one per "zip") inside the same DB... the feed field is part of the primary key on all tables, and there is a table named feed as well.
I'm waiting for the DB import ..... it looks very slow, if it does anything at all.... It is inserting "stops" ..... but the table remains empty.... (big sql commit at the end????) wait and see
Would be a hoot if sqlite proves to be 'fast' :)
+/- the same speed to upload .... I have all tables uploaded. All tables contain the feed .... which is both a good and a bad thing. That's great but unexpected ;-) ;-)
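That feed column also makes it easy to keep sources apart in queries; a rough sketch, assuming the stops table and feed_id column that pygtfs writes:

```python
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://gtfs:gtfs@vm-hadb-tst.home/gtfs")
with engine.connect() as conn:
    # Restrict to one source by filtering on the feed id of that zip.
    rows = conn.execute(
        text("SELECT stop_id, stop_name FROM stops WHERE feed_id = :feed"),
        {"feed": 1},
    ).fetchall()
print(len(rows), "stops in feed 1")
```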
... to have chatty discussions... again: try to ping me on Discord, this thread is getting very long now (not a real problem though). And yes, one can in fact copy the whole library locally into gtfs2 (I already have bits from others), but that also means more maintenance, and you do not want to know how much time I spend on people for whom the current version works 97% and I again need to tweak queries due to their slightly off-normal (?) datasource. The result is/was indeed that I am now adding indexes and removing files myself. The lib owner is not negative towards my requests and finds them OK, but not much is happening. So, if (!) you want to change it, then maybe you should make a fork out of it, and then I am OK to embed that in gtfs2.
In my own night of revelations, I am less OK now with the combined datasources, as I cannot find an easy solution to also combine the real-time bits. I did migrate everything to json so I could combine the sources into one json, but it is also about the timing of things: a process needs to run and extract all (say, in my case) 4 sources, then combine them, and at the same time not obstruct the collection by the entities.
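The merge step itself would be small; the hard part is the timing described above. A rough sketch of the merge only, where the per-source file names and the "departures" key are purely assumptions:

```python
import json
from pathlib import Path

# Hypothetical per-source realtime dumps; names are placeholders.
SOURCES = ["rt_source1.json", "rt_source2.json", "rt_source3.json", "rt_source4.json"]

combined = []
for src in SOURCES:
    path = Path(src)
    if not path.exists():
        continue  # a source may still be mid-collection, skip it for this cycle
    with path.open() as f:
        combined.extend(json.load(f).get("departures", []))

# One merged, time-ordered list for the entities to read.
combined.sort(key=lambda d: d.get("departure_time", ""))
Path("rt_combined.json").write_text(json.dumps({"departures": combined}))
```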
Hi, I'm using HAOS on a Virtual Machine ( proxmox).
I already moved the Home Assistant DB to another VM running Postgresql.
=> I can restore HAOS without restoring the DB.
=> I can have different schedules for HAOS and DB.
=> I can keep HAOS "small" (32Gb) so it is fast to backup/restore.
=> I guess it should outperform SQLite (dedicated memory, native multi-core operations, backup, online DB vacuum...)
I would like to externalize the DB of GTFS, like I did for HA, using a "DB connection string"... for the same reasons :-) This would point to another PostgreSQL DB... to keep things separated/isolated.
Not sure if it is "easy", ... but it would be great for flexibility/management,...
Regards Fabien