philvessey / NextDepartures

NextDepartures is a .NET Library that queries GTFS (General Transit Feed Specification) data sets stored locally or in an Azure SQL Database. The library will work with any well formed GTFS data set.
MIT License
1 stars 1 forks source link

Decouple storage from logic #5

Closed hypervtechnics closed 5 years ago

hypervtechnics commented 5 years ago

So that you can also use a local storage. E.g. loading in memory from gtfs feed.

philvessey commented 5 years ago

Thanks for the suggestion - are you suggesting to just load the data from the csv flat files without the database?

hypervtechnics commented 5 years ago

Yeah. If the implementation of data access would be an interface/decoupled. There could be one using the azure sql and one using maybe some plain lists or an advanced solution like nmemory.

I'd help. But I am not aware if there is some basic logic contained in e.g. stored procedures/indices which would make it difficult to decouple the parts. And the Feed.cs file has 9900+ lines 🙈 - not that easy to scan through.

philvessey commented 5 years ago

I will look at it - it shouldn't be that hard to do actually - i'm just finishing up an example Azure Functions project to automate downloading of feeds and updating the database - I will open source that project too.

philvessey commented 5 years ago

I have split Feed.cs so that methods are now in separate files to make code easier to maintain.

hypervtechnics commented 5 years ago

Looks good. I will look into it and try to do some refactoring. E.g. the creation of the Departureobjects like here: https://github.com/philvessey/NextDepartures/blob/master/NextDepartures.Standard/GetServicesByStop.cs#L1776 can be moved to a helper. This alone can save ~900 lines of code. Will make a PR when ready.

For the decoupling: This would only require abstracting the data reader into an interface and separate class at the start of the methods. Or is data dynamically fetched in the middle/end of the methods? E.g. here https://github.com/philvessey/NextDepartures/blob/master/NextDepartures.Standard/GetServicesByStop.cs#L30 up to line 128.

Also what do you think about making use of the entity classes in https://github.com/itinero/GTFS ?

philvessey commented 5 years ago

Good call on the helper methods - I can get started on those today. All of the data is fetched at the start of methods in the using blocks for the SqlConnection.

I will have a look at the entity classes.

hypervtechnics commented 5 years ago

Visual Studio might assist you a lot with the refactoring. Great. So we can replace that part with a call to an interface. I'll look into that today or tomorrow.

Some entities like Departure do not exist as they are part of your domain. But for the GTFS model everything should be there. And if not: I am currently updating it to match the current GTFS reference.

Also I saw that you are using DateTime.UtcNow. Maybe we can make this configurable. But thats for later :)

philvessey commented 5 years ago

If you clone and switch to the dev branch I have created a helper method in Feed.cs to create Departure objects which has cut down on a fair bit of code.

hypervtechnics commented 5 years ago

Looks good. For me the build fails because of a missing library TimeZoneConverter which is not missing. Does VS take a while to fully load the project?

Another topic to cut down some code: https://stackoverflow.com/questions/14870478/write-a-well-designed-async-non-async-api/14876375#14876375 Maybe just do the db calls async and make a separate method containing the logic for the data from the db. Or simply only provide the Async API.

Could you add me as a collaborator? Don't want to create fork after fork when I am starting to do some changes. Would be working only on my branch ofc. 😄

philvessey commented 5 years ago

Project loads pretty instantly here - I have added you as a collaborator. Is it best practice to only have async methods these days? I notice a lot of libraries have both.

hypervtechnics commented 5 years ago

Maybe it is an extension which cant handle this.

I will start working on this tomorrow.

Thank you :)

philvessey commented 5 years ago

We should be able to both work from the dev branch - should keep things pretty simple.

hypervtechnics commented 5 years ago

So I scanned through the code and have some questions:

philvessey commented 5 years ago

LIMIT isn't in the query as as there are some checks I go through to see if the service is actually running on the said date and time so its easier to just pull all the services linked to a stop and then run the checks to see if actually running. In the code tempDepartures is a collection of all services at that stop and workingDepartures is a collection of the services actually calling at that date and time.

I think I will remove the synchronous methods as at some point I want to add realtime GTFS checking for trip updates too and that requires async calls so it kind of makes sense to remove the old ones. I will update the dev branch today removing them.

philvessey commented 5 years ago

Command and DataReader don't need to be private on the class level either - I will update that today too.

philvessey commented 5 years ago

The dev branch now has just the async methods and the variables that were not needed at class level are now in the methods.

hypervtechnics commented 5 years ago

Hm... Some extension is preventing me from building the library. It's always executing a pending task... And complaining about the TimeZoneConverter dependency.

philvessey commented 5 years ago

Strange - builds almost instantly here.

hypervtechnics commented 5 years ago

I removed theTimeZoneConverter Nuget package and tried to readd it.

2019-10-24 18_58_01-NextDepartures - Microsoft Visual Studio

EDIT: I installed 3.1.0 instead of 3.2.0 and it suddenly works now... EDIT 2: Now I updated to 3.2.0 and now it works. Even git does not detect changes, so the solution/project is in the exact same state. I also deleted the nuget package from local cache.

philvessey commented 5 years ago

I'm using Command as a class variable in the database project too - just going through correcting that now - no idea why I did that initially.

hypervtechnics commented 5 years ago

Was just asking as sometimes there might be reason to do so.

I am currently creating the abstraction for the data access.

hypervtechnics commented 5 years ago

I also moved the model classes out of Feed.cs

One class per file is a good practice as you don't have to search through ^^

philvessey commented 5 years ago

Ok - will we get conflicts working on the same branch at once - i'm guessing its fine as long as we both push and sync?

hypervtechnics commented 5 years ago

Yeah this should be fine. I will pull the latest changes before pushing so that there should be no conflicts.

philvessey commented 5 years ago

Ok - i'm doing the same in the database project with the Model namespace as you have done in the standard project.

hypervtechnics commented 5 years ago

Okay. I abstracted the storage access. The details have to be made out. Also there might be some parallelization possible. Could you look over it and tell me if you understand the way I changed your code?

Next steps would be to reduce redundancies through the whole code and reduce the files to less than 500 lines with optimized code 😄 Also making the data model independent from the Database project and the way the sql queries work. Maybe Entity Framework might help here. But seems a bit overkill for now ^^

Also you have to pay attention to the new SqlCommand class as this enables sql injection.

philvessey commented 5 years ago

Looks like quite a lot has changed - I will have a look through it tomorrow - do you have Skype in case I have any questions as probably quicker for questions in real time?

hypervtechnics commented 5 years ago

You can reach me at my email (< githubusername > @ gmail.com). I want to avoid spam ^^. I am reading them very quick or you can reach me at Matrix (use Riot) @hypervtechnics:matrix.org for realtime

philvessey commented 5 years ago

Ok - don't wait for me to catch up if you want to do some more - i'm a quick study so can look over it all tomorrow.

philvessey commented 5 years ago

I'm just looking over the code now to see if I can follow the flow of it - quick question - should the Model folder be renamed to Models or is what we are using considered best practice?

hypervtechnics commented 5 years ago

You are right. Models makes more sense ^^

philvessey commented 5 years ago

I will change that this evening as going through it all will help me understand it better. I get the general idea that the data retrieval is now independent so in effect the data can be retrieved different ways and then processed for results.

hypervtechnics commented 5 years ago

Yeah this enables you to both easily switch databae or just load from the local file system. For the logic itself to enhance/optimize it I first need to understand it. Could you add some comments to the GetDepartures* methods to understand it or explain it? Is it still working with the newest standard?

philvessey commented 5 years ago

I will test it this evening / weekend once I have gone over it all as I have some spare time. Will make sure the methods are documented so you can follow it. I will reach out once I have pushed to the dev branch with some comments in.

philvessey commented 5 years ago

Im looking at doing some tests but im confused what I should be passing into the feed class. Before it was new Feed(connectionString), im guessing now it should be new Feed(new SqlStorage(connectionString)) but NextDepartures.Storage.SqlServer isn't referenced from the Standard project - should it be?

hypervtechnics commented 5 years ago

The intent behind this it that only the necessary dependencies are used by the user of the library. This means if the user wants to have the feed loaded from the local filesystem he has no use for the SqlClient. The project consuming the library can reference the SqlStorage package directly. The Standard project would then be added implictly because SqlStorage depends on Standard. Or you could reference both. Hope it helped :)

hypervtechnics commented 5 years ago

For your question: the Standard contains the logic. The logic is consumed by another library/application (referencing above applies). So the Standard library should not have a dependency to SqlStorage. The second initialization is correct and would need to be done in a third project referencing both the standard and sqlstorage project

philvessey commented 5 years ago

Ok I think I understand - so the Storage library for SqlServer and any others we create need to go up to NuGet too when this is completed?

hypervtechnics commented 5 years ago

Exactly :)

philvessey commented 5 years ago

Ok great :) I will finish up documenting methods tomorrow and do some testing. I will let you know once I have done that so you can have a look through.

philvessey commented 5 years ago

Testing looks ok - I cant see any issues with the data being returned so far. I am about 50% through documenting all the methods so will push the repo tomorrow once I am finished with them.

philvessey commented 5 years ago

If you pull the dev branch now you will get the latest changes. I have added some comments to explain how the processing works and why. Do you have any experience of working with GTFS and local flat files / Azure Storage flat files - its easy enough to extract data from CSV as I do this in the database project and i'm guessing you can use LINQ somehow to format the results in the same way the SQL queries do?

hypervtechnics commented 5 years ago

I worked a lot with local GTFS files which are basically just CSV files. Not anything related to Azure unfortunately. Now you can just implement your own IDataStorage using the local files as a data source without messing with the logic itself.

I will look into the logic the next days.

philvessey commented 5 years ago

Do you have any examples of joining data from lists using LINQ - similar to how SQL lets you do LEFT JOIN etc. If I have the data from CSV files in their own lists e.g. agencies, calendars etc

hypervtechnics commented 5 years ago

This may be interesting for you: https://docs.microsoft.com/de-de/dotnet/csharp/linq/perform-left-outer-joins

philvessey commented 5 years ago

Yea that looks like the right sort of thing - I'm keen to give people using the library on a server the option to pull from azure storage also instead of a SQL database as storage is cheaper and it just means dropping a GTFS zip into a storage blob

hypervtechnics commented 5 years ago

Yeah. By abstracting it everyone can include their own data storage. But would be cool to have some out-of-the-box ones I guess. Anyway: I am wondering why step 6 and 9 are the same, therefore evaluated twice. Also the logic seems to duplicate in a lot of places. Maybe we can start optimizing there.

My suggestions:

  1. Remove duplicates in logical part and reuse more code
  2. Remove duplicate processing steps in if statements to reduce redundant calculations
philvessey commented 5 years ago

They are not evaluated twice, step 6 runs if we know what day the service runs on - step 9 runs if we have no idea what day it runs on and having to check exceptions to see if it is running.

I agree there is probably room for improvement that will reduce lines of code.

hypervtechnics commented 5 years ago

I just pushed some refactoring for the database project. I will take a look at the logic during the next days.

philvessey commented 5 years ago

I will take a look :)