hypervtechnics closed this issue 5 years ago
Thanks for the suggestion - are you suggesting to just load the data from the csv flat files without the database?
Yeah. If the data access implementation were decoupled behind an interface, there could be one implementation using Azure SQL and another using plain lists, or a more advanced solution like NMemory.
I'd help. But I don't know whether there is some basic logic contained in e.g. stored procedures/indices that would make it difficult to decouple the parts. And the Feed.cs file has 9900+ lines 🙈 - not that easy to scan through.
I will look at it - it shouldn't be that hard to do actually - I'm just finishing up an example Azure Functions project to automate downloading of feeds and updating the database - I will open-source that project too.
I have split Feed.cs so that methods are now in separate files to make code easier to maintain.
Looks good. I will look into it and try to do some refactoring. E.g. the creation of the Departure objects like here: https://github.com/philvessey/NextDepartures/blob/master/NextDepartures.Standard/GetServicesByStop.cs#L1776 can be moved to a helper. This alone could save ~900 lines of code. Will make a PR when ready.
For the decoupling: this would only require abstracting the data reader into an interface and a separate class at the start of the methods. Or is data dynamically fetched in the middle/end of the methods? E.g. here https://github.com/philvessey/NextDepartures/blob/master/NextDepartures.Standard/GetServicesByStop.cs#L30 up to line 128.
Also what do you think about making use of the entity classes in https://github.com/itinero/GTFS ?
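To make the decoupling idea concrete, here's a minimal sketch of what such an abstraction could look like. The interface name IDataStorage comes up later in this thread, but the method name, the in-memory class and the simplified Departure record here are just placeholders, not the project's actual API:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Usage: the logic in Feed.cs would only ever see the interface, so the
// SQL-backed and in-memory implementations become interchangeable.
IDataStorage storage = new InMemoryStorage(new List<Departure>
{
    new("stop-1", "trip-1"),
    new("stop-2", "trip-2"),
});
var departures = await storage.GetDeparturesForStopAsync("stop-1");

// Hypothetical abstraction -- names are illustrative, not the final API.
public interface IDataStorage
{
    Task<List<Departure>> GetDeparturesForStopAsync(string stopId);
}

// In-memory implementation, e.g. for tests or data loaded from flat files.
public class InMemoryStorage : IDataStorage
{
    private readonly List<Departure> _departures;

    public InMemoryStorage(List<Departure> departures) => _departures = departures;

    public Task<List<Departure>> GetDeparturesForStopAsync(string stopId) =>
        Task.FromResult(_departures.FindAll(d => d.StopId == stopId));
}

// Simplified stand-in for the project's Departure model.
public record Departure(string StopId, string TripId);
```

An Azure SQL implementation would then just be another class implementing the same interface.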
Good call on the helper methods - I can get started on those today. All of the data is fetched at the start of methods in the using blocks for the SqlConnection.
I will have a look at the entity classes.
Visual Studio might assist you a lot with the refactoring. Great. So we can replace that part with a call to an interface. I'll look into that today or tomorrow.
Some entities like Departure do not exist as they are part of your domain. But for the GTFS model everything should be there. And if not: I am currently updating it to match the current GTFS reference.
Also I saw that you are using DateTime.UtcNow. Maybe we can make this configurable. But that's for later :)
If you clone and switch to the dev branch I have created a helper method in Feed.cs to create Departure objects which has cut down on a fair bit of code.
Looks good. For me the build fails because of a supposedly missing library, TimeZoneConverter, which isn't actually missing. Does VS take a while to fully load the project?
Another topic to cut down some code: https://stackoverflow.com/questions/14870478/write-a-well-designed-async-non-async-api/14876375#14876375 Maybe just do the db calls async and make a separate method containing the logic for the data from the db. Or simply only provide the Async API.
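The pattern from that Stack Overflow answer, applied here, would roughly look like this: the public method is async-only and awaits the I/O, while the actual logic lives in a plain synchronous method. Everything below (the fake fetch, the record, the filter) is illustrative, assuming a simplified Departure shape:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Fake "database" call standing in for the real SqlDataReader fetch.
Task<List<Departure>> FetchAsync(string stopId) =>
    Task.FromResult(new List<Departure>
    {
        new(stopId, new DateTime(2020, 1, 1, 8, 0, 0)),
        new(stopId, new DateTime(2020, 1, 1, 9, 30, 0)),
    });

// Async-only public API: await the I/O, then hand off to pure logic.
async Task<List<Departure>> GetDeparturesAsync(string stopId, DateTime now)
{
    var rows = await FetchAsync(stopId);
    return FilterUpcoming(rows, now);
}

// Plain synchronous method containing the logic -- easy to unit test.
static List<Departure> FilterUpcoming(List<Departure> rows, DateTime now) =>
    rows.Where(d => d.DepartureTime >= now).ToList();

var upcoming = await GetDeparturesAsync("stop-1", new DateTime(2020, 1, 1, 9, 0, 0));

public record Departure(string StopId, DateTime DepartureTime);
```

This avoids the sync-over-async and async-over-sync anti-patterns: no `.Result`/`.Wait()` wrappers, and the logic never needs duplicating.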
Could you add me as a collaborator? Don't want to create fork after fork once I start making changes. Would be working only on my own branch ofc. 😄
Project loads pretty instantly here - I have added you as a collaborator. Is it best practice to only have async methods these days? I notice a lot of libraries have both.
Maybe it is an extension which can't handle this.
I will start working on this tomorrow.
Thank you :)
We should be able to both work from the dev branch - should keep things pretty simple.
So I scanned through the code and have some questions:
LIMIT isn't in the query as there are some checks I go through to see if the service is actually running on the given date and time, so it's easier to just pull all the services linked to a stop and then run the checks to see if each is actually running. In the code, tempDepartures is a collection of all services at that stop and workingDepartures is a collection of the services actually calling at that date and time.
I think I will remove the synchronous methods, as at some point I want to add real-time GTFS checking for trip updates too, and that requires async calls, so it kind of makes sense to remove the old ones. I will update the dev branch today, removing them.
Command and DataReader don't need to be private at the class level either - I will update that today too.
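The tempDepartures → workingDepartures check described above could be sketched like this, using the GTFS calendar model (weekday flags in calendar.txt, plus add/remove exceptions in calendar_dates.txt). The types and helper names here are illustrative, not the actual code:

```csharp
using System;
using System.Collections.Generic;

// All services linked to the stop ("tempDepartures")...
var services = new List<Service>
{
    new("weekday", new HashSet<DayOfWeek> { DayOfWeek.Monday, DayOfWeek.Friday }),
    new("weekend", new HashSet<DayOfWeek> { DayOfWeek.Saturday, DayOfWeek.Sunday }),
};

// calendar_dates-style exceptions: true = service added, false = removed.
var exceptions = new Dictionary<(string, DateTime), bool>
{
    [("weekday", new DateTime(2024, 1, 1))] = false, // e.g. a public holiday
};

// ...filtered down to those actually running on the date ("workingDepartures").
var date = new DateTime(2024, 1, 1); // a Monday
var running = services.FindAll(s => IsRunningOn(s, date, exceptions));

static bool IsRunningOn(Service s, DateTime date,
    Dictionary<(string, DateTime), bool> exceptions)
{
    // An exception for this exact date overrides the weekday pattern.
    if (exceptions.TryGetValue((s.ServiceId, date.Date), out var added))
        return added;
    return s.Days.Contains(date.DayOfWeek);
}

public record Service(string ServiceId, HashSet<DayOfWeek> Days);
```

Pulling everything for the stop and filtering in memory, as described, keeps the SQL simple at the cost of transferring more rows.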
The dev branch now has just the async methods and the variables that were not needed at class level are now in the methods.
Hm... some extension is preventing me from building the library. It's always stuck executing a pending task... and complaining about the TimeZoneConverter dependency.
Strange - builds almost instantly here.
I removed the TimeZoneConverter NuGet package and tried to re-add it.
EDIT: I installed 3.1.0 instead of 3.2.0 and it suddenly works now... EDIT 2: Now I updated back to 3.2.0 and it still works. Even git does not detect any changes, so the solution/project is in the exact same state. I had also deleted the NuGet package from the local cache.
I'm using Command as a class variable in the database project too - just going through correcting that now - no idea why I did that initially.
Was just asking, as sometimes there might be a reason to do so.
I am currently creating the abstraction for the data access.
I also moved the model classes out of Feed.cs
One class per file is good practice, as you don't have to search through one huge file ^^
Ok - will we get conflicts working on the same branch at once? I'm guessing it's fine as long as we both push and sync?
Yeah this should be fine. I will pull the latest changes before pushing so that there should be no conflicts.
Ok - I'm doing the same in the database project with the Model namespace, as you have done in the standard project.
Okay. I abstracted the storage access. The details still have to be worked out. Also, there might be some parallelization possible. Could you look over it and tell me if you understand the way I changed your code?
Next steps would be to reduce redundancy throughout the whole code base and get the files below 500 lines with optimized code 😄 Also making the data model independent from the Database project and the way the SQL queries work. Maybe Entity Framework could help here, but it seems a bit overkill for now ^^
Also, you have to pay attention to the new SqlCommand usage, as it currently allows SQL injection.
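For reference, the standard fix is to bind user input as a parameter instead of concatenating it into the SQL string. The table and column names below are illustrative, not the project's actual schema:

```csharp
using System.Data;
using System.Data.SqlClient;

// User-supplied value -- imagine this came from the caller of the library.
var stopId = "8501008'; DROP TABLE StopTime; --";

// Unsafe: the value becomes part of the SQL text itself.
//   var cmd = new SqlCommand(
//       "SELECT * FROM StopTime WHERE StopId = '" + stopId + "'", connection);

// Safe: bind the value as a parameter; it can never alter the query structure.
var cmd = new SqlCommand("SELECT * FROM StopTime WHERE StopId = @StopId");
cmd.Parameters.Add("@StopId", SqlDbType.NVarChar, 255).Value = stopId;
```

Parameters also let SQL Server cache the query plan, since the command text stays identical across calls.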
Looks like quite a lot has changed - I will have a look through it tomorrow - do you have Skype in case I have any questions, as it's probably quicker to ask questions in real time?
You can reach me at my email (< githubusername > @ gmail.com). I want to avoid spam ^^. I read emails very quickly, or you can reach me on Matrix (use Riot) at @hypervtechnics:matrix.org for real time.
Ok - don't wait for me to catch up if you want to do some more - I'm a quick study so I can look over it all tomorrow.
I'm just looking over the code now to see if I can follow the flow of it - quick question: should the Model folder be renamed to Models, or is what we are using considered best practice?
You are right. Models makes more sense ^^
I will change that this evening, as going through it all will help me understand it better. I get the general idea: the data retrieval is now independent, so in effect the data can be retrieved in different ways and then processed for results.
Yeah, this enables you to easily switch the database or just load from the local file system. To enhance/optimize the logic itself I first need to understand it. Could you add some comments to the GetDepartures* methods, or explain it? Is it still working with the newest standard?
I will test it this evening / weekend once I have gone over it all as I have some spare time. Will make sure the methods are documented so you can follow it. I will reach out once I have pushed to the dev branch with some comments in.
I'm looking at doing some tests but I'm confused about what I should be passing into the Feed class. Before it was new Feed(connectionString); I'm guessing now it should be new Feed(new SqlStorage(connectionString)), but NextDepartures.Storage.SqlServer isn't referenced from the Standard project - should it be?
The intent behind this is that only the necessary dependencies are pulled in by the user of the library. This means if the user wants the feed loaded from the local filesystem, he has no use for the SqlClient. The project consuming the library can reference the SqlStorage package directly; the Standard project would then be added implicitly because SqlStorage depends on Standard. Or you could reference both. Hope it helped :)
For your question: the Standard project contains the logic. The logic is consumed by another library/application (the referencing above applies). So the Standard library should not have a dependency on SqlStorage. The second initialization is correct and would need to be done in a third project referencing both the Standard and SqlStorage projects.
Ok, I think I understand - so the Storage library for SqlServer, and any others we create, need to go up to NuGet too when this is completed?
Exactly :)
Ok great :) I will finish up documenting methods tomorrow and do some testing. I will let you know once I have done that so you can have a look through.
Testing looks ok - I can't see any issues with the data being returned so far. I am about 50% through documenting all the methods, so I will push to the repo tomorrow once I am finished with them.
If you pull the dev branch now you will get the latest changes. I have added some comments to explain how the processing works and why. Do you have any experience working with GTFS and local flat files / Azure Storage flat files? It's easy enough to extract data from CSV, as I do this in the database project, and I'm guessing you can use LINQ somehow to format the results the same way the SQL queries do?
I worked a lot with local GTFS files, which are basically just CSV files. Nothing related to Azure, unfortunately. Now you can just implement your own IDataStorage using the local files as a data source without messing with the logic itself.
I will look into the logic over the next few days.
Do you have any examples of joining data from lists using LINQ - similar to how SQL lets you do LEFT JOIN etc.? Say I have the data from the CSV files in their own lists, e.g. agencies, calendars etc.
This may be interesting for you: https://docs.microsoft.com/de-de/dotnet/csharp/linq/perform-left-outer-joins
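The pattern from that page uses GroupJoin plus DefaultIfEmpty. A small self-contained example with made-up agency/route lists (the record shapes are simplified, not the project's models):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var agencies = new List<Agency> { new("A1", "Metro"), new("A2", "Bus Co") };
var routes = new List<Route> { new("R1", "A1"), new("R2", "A1"), new("R3", "A9") };

// Equivalent of: routes LEFT JOIN agencies ON routes.AgencyId = agencies.AgencyId
var joined = routes
    .GroupJoin(agencies,
               r => r.AgencyId,
               a => a.AgencyId,
               (route, matches) => new { route, matches })
    .SelectMany(x => x.matches.DefaultIfEmpty(), // keeps routes with no agency
                (x, agency) => new
                {
                    x.route.RouteId,
                    AgencyName = agency?.Name ?? "(no agency)",
                })
    .ToList();

foreach (var row in joined)
    Console.WriteLine($"{row.RouteId} -> {row.AgencyName}");

public record Agency(string AgencyId, string Name);
public record Route(string RouteId, string AgencyId);
```

GroupJoin collects the matching agencies per route, and DefaultIfEmpty supplies a null when there is no match, which is exactly the LEFT JOIN semantics.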
Yeah, that looks like the right sort of thing - I'm keen to give people using the library on a server the option to pull from Azure Storage instead of a SQL database too, as storage is cheaper and it just means dropping a GTFS zip into a storage blob.
Yeah. By abstracting it, everyone can plug in their own data storage. But it would be cool to have some out-of-the-box ones, I guess. Anyway: I am wondering why steps 6 and 9 are the same and therefore evaluated twice. Also, the logic seems to be duplicated in a lot of places. Maybe we can start optimizing there.
My suggestions:
They are not evaluated twice: step 6 runs if we know what day the service runs on; step 9 runs if we have no idea what day it runs on and we have to check exceptions to see if it is running.
I agree there is probably room for improvement that will reduce lines of code.
I just pushed some refactoring for the database project. I will take a look at the logic over the next few days.
I will take a look :)
So that you can also use local storage, e.g. loading a GTFS feed into memory.