paillave / Etl.Net

Mass processing data with a complete ETL for .net developers
https://paillave.github.io/Etl.Net/
MIT License

Question: connecting to old SQL server #423

Open Radamonas opened 1 year ago

Radamonas commented 1 year ago

Is there a way to use a database connector other than EF Core or SqlClient to access databases? We cannot use either of the implemented ones, because the database we are targeting is SQL Server 2000.

paillave commented 1 year ago

SQL2000!!!??? Wow! Looks like somebody has to load data from a veeeery old application! Well, to speak frankly, I didn't plan anything other than EF Core and SqlClient to access a SQL Server database (I actually thought that it was possible with SqlClient). If you have any .NET library that permits access to this database: 3 options:

You can look at the extensions for native SqlServer if you want some inspiration. FYI, making custom extensions is very accessible. https://github.com/paillave/Etl.Net/tree/master/src/Paillave.Etl.SqlServer

Radamonas commented 1 year ago

So I came up with this approach as a POC:

using System.Data;
using System.Data.Odbc;
using Paillave.Etl.Core;
using Paillave.Etl.FileSystem;
using Paillave.Etl.TextFile;
using PocEtlDotNet.Entities;

namespace PocEtlDotNet;
class Program
{
    static async Task Main(string[] args)
    {
        var connectionString1 = @"Driver={SQL Server};Server=MyServer;Database=Db1;Trusted_Connection=yes;";
        var connectionString2 = @"Driver={SQL Server};Server=MyServer;Database=Db2;Trusted_Connection=yes;";

        var processRunner = StreamProcessRunner.Create<string>(DefineProcess);
        var executionOptions = new ExecutionOptions<string>
        {
            Resolver = new SimpleDependencyResolver()
                            .Register<IDbConnection>(new OdbcConnection(connectionString1), "source1")
                            .Register<IDbConnection>(new OdbcConnection(connectionString2), "source2"),
            UseDetailedTraces = true,
            NoExceptionOnError= false,
        };

        var res = await processRunner.ExecuteAsync("Start output to files", executionOptions);

        Console.Write(res.Failed ? "Failed" : "Succeeded");
    }

    private static void DefineProcess(ISingleStream<string> contextStream)
    {
        contextStream
            .CrossApply<object, MyTableEntity>("strange stuff", (fileValue, dependencyResolver, cancellationToken, push) =>
            {
                using var connection = dependencyResolver.Resolve<IDbConnection>("source1");
                connection.Open();
                using var command = new OdbcCommand("select * from dbo.my_table", (OdbcConnection)connection);
                using var reader = command.ExecuteReader();
                while (reader.Read())
                {
                    var values = new MyTableEntity() { code = reader.GetString(0), name = reader.GetString(1) };
                    push(values);
                }
            })
            .Do("print", o => Console.WriteLine(o.name))
            .Select("create row to save", i => new { i.name, i.code })
            .ToTextFileValue("to file", @"C:\temp\out_source1.csv", FlatFileDefinition.Create(f => new { name = f.ToColumn("Name"), code = f.ToColumn("Code") }).IsColumnSeparated('|'))
            .WriteToFile("save to file", i => i.Name);

        contextStream
            .CrossApply<object, MyTableEntity>("strange stuff", (fileValue, dependencyResolver, cancellationToken, push) =>
            {
                using var connection = dependencyResolver.Resolve<IDbConnection>("source2");
                connection.Open();
                using var command = new OdbcCommand("select * from dbo.my_table", (OdbcConnection)connection);
                using var reader = command.ExecuteReader();
                while (reader.Read())
                {
                    var values = new MyTableEntity() { code = reader.GetString(0), name = reader.GetString(1) };
                    push(values);
                }
            })
            .Do("print", o => Console.WriteLine(o.name))
            .Select("create row to save", i => new { i.name, i.code })
            .ToTextFileValue("to file", @"C:\temp\out_source2.csv", FlatFileDefinition.Create(f => new { name = f.ToColumn("Name"), code = f.ToColumn("Code") }).IsColumnSeparated('|'))
            .WriteToFile("save to file", i => i.Name);
    }
}

I assume that if I want to call stored procedures that return nothing, I should use .Do? Is there any guidance on calling stored procedures? It is quite common in SSIS routines, but I couldn't find any documentation on it.

paillave commented 1 year ago

What you did is correct.

I remember now that, indeed, at the time of SQL Server 2000, ODBC drivers were the recommended way; I'll make an amendment to permit the SQL extensions to work with them as an option.

With the current SQL Server extension, the way to execute a stored procedure is to use ToSqlCommand, as described here: https://paillave.github.io/Etl.Net/docs/recipes/sqlServer#execute-a-sql-process-for-every-row
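
For the ODBC proof of concept above, which bypasses the SQL Server extension, a stored procedure that returns nothing could also be called with plain ADO.NET. The following is only a minimal sketch and not part of Etl.Net: the procedure name `dbo.my_proc`, its single parameter, and the helper name are hypothetical. With System.Data.Odbc, the command text uses the ODBC call syntax with positional `?` markers.

```csharp
using System.Data;
using System.Data.Odbc;

static class OdbcProcedureCaller
{
    // Hypothetical helper: runs a stored procedure that returns no result set.
    // "dbo.my_proc" and its single parameter are placeholders for illustration.
    public static int ExecuteMyProc(OdbcConnection connection, string code)
    {
        // The ODBC provider expects the call escape syntax, with one "?" per parameter.
        using var command = new OdbcCommand("{ call dbo.my_proc(?) }", connection);
        command.CommandType = CommandType.StoredProcedure;

        // ODBC binds parameters by position; the name is informational only.
        command.Parameters.Add("@code", OdbcType.VarChar, 50).Value = code;

        // The connection is expected to be open already, as in the POC above.
        return command.ExecuteNonQuery();
    }
}
```

Inside a pipeline, such a helper could presumably be invoked from a `.Do` step for each row, assuming an already open `OdbcConnection` is reachable from the lambda.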

paillave commented 1 year ago

Note to myself: permit the SqlServer extensions to work with ODBC drivers as well.

paillave commented 1 year ago

@Radamonas I just pushed v2.1.3-beta, which you will find in pre-release. This should permit the SqlServer extension to work with any ADO.NET driver (including ODBC). Let me know if it works for you.

Radamonas commented 1 year ago

I've changed the code to:

using System.Data;
using System.Data.Odbc;
using Paillave.Etl.Core;
using Paillave.Etl.FileSystem;
using Paillave.Etl.SqlServer;
using Paillave.Etl.TextFile;
using PocEtlDotNet.Entities;

namespace PocEtlDotNet;
class Program
{
    static async Task Main(string[] args)
    {
        var connectionString1 = @"Driver={SQL Server};Server=MyServer;Database=Source1;Trusted_Connection=yes;";
        var connectionString2 = @"Driver={SQL Server};Server=MyServer;Database=Source2;Trusted_Connection=yes;";

        using var conn1 = new OdbcConnection(connectionString1);
        using var conn2 = new OdbcConnection(connectionString2);
        conn1.Open();
        conn2.Open();

        var processRunner = StreamProcessRunner.Create<string>(DefineProcess);
        var executionOptions = new ExecutionOptions<string>
        {
            Resolver = new SimpleDependencyResolver()
                            .Register<IDbConnection>(conn1, "source1")
                            .Register<IDbConnection>(conn2, "source2"),
            UseDetailedTraces = true,
            NoExceptionOnError= false,
        };

        var res = await processRunner.ExecuteAsync("Start output to files", executionOptions);

        Console.Write(res.Failed ? "Failed" : "Succeeded");
    }

    private static void DefineProcess(ISingleStream<string> contextStream)
    {
        contextStream
            .CrossApplySqlServerQuery("select", o => o
                .FromQuery("select * from dbo.myTable")
                .WithMapping(i => new
                {
                    code = i.ToColumn("code"),
                    name = i.ToColumn("name")
                })
                , "source1")
            .Do("print", o => Console.WriteLine(o.name))
            .Select("create row to save", i => new { i.name, i.code })
            .ToTextFileValue("to file", @"C:\temp\out_source1.csv", FlatFileDefinition.Create(f => new { name = f.ToColumn("Name"), code = f.ToColumn("Code") }).IsColumnSeparated('|'))
            .WriteToFile("save to file", i => i.Name);

        var afd = contextStream
            .CrossApplySqlServerQuery("select", o => o
                .FromQuery("select * from dbo.myTable")
                .WithMapping(i => new
                {
                    code = i.ToColumn("code"),
                    name = i.ToColumn("name")
                })
                , "source2")
            .Do("print", o => Console.WriteLine(o.name))
            .Select("create row to save", i => new { i.name, i.code })
            .ToTextFileValue("to file", @"C:\temp\out_source2.csv", FlatFileDefinition.Create(f => new { name = f.ToColumn("Name"), code = f.ToColumn("Code") }).IsColumnSeparated('|'))
            .WriteToFile("save to file", i => i.Name);
    }
}

Radamonas commented 1 year ago

Tried to call:

        contextStream
            .Select("Create a value", _ => new
            {
                code = "CD",
                name = "CD name"
            })
            .SqlServerSave("save to db", o => o.WithConnection("source1").ToTable("dbo.myTable")) ;

Got error:

Paillave.Etl.Core.JobExecutionException
  HResult=0x80131500
  Message=Job execution failed
  Source=Paillave.Etl
  StackTrace:
   at Paillave.Etl.Core.StreamProcessRunner`1.<>c__DisplayClass14_0.<ExecuteAsync>b__3(Task t)
   at System.Threading.Tasks.ContinuationResultTaskFromTask`1.InnerInvoke() in ...

Am I missing something in this case?

paillave commented 1 year ago

Sorry for this. I will look at it asap. Can you give me the full StackTrace?

paillave commented 1 year ago

I think I know what the problem is. It will require a bit of work. As I'm very busy, I don't think I'll be able to solve this today. I will let you know asap.

Radamonas commented 1 year ago

I have another issue with this version. I tried to get a trans id from one source and apply it to a query against another source. That failed, so I changed the id to a static value. That failed too. When the second query was changed to use a fixed value within the query itself, it passed. Below is the second option (both are ODBC connections):

        contextStream
            .CrossApplySqlServerQuery("get max", a => a
                .FromQuery("SELECT 1000 as ID")
                .WithMapping(a => new { trans_id = a.ToColumn<int>("ID") }), "source1")
            .Select("build criteria", i => new { TransId = 100 })
            .Do("printeris", o => Console.WriteLine(o))
            .CrossApplySqlServerQuery("select with last", s => s
                .FromQuery("SELECT TOP 10 [trans_id] FROM [dbo].[temp_trans] WHERE [trans_id] <= @TransId")
                .WithMapping(a => new { transId = a.ToColumn<int>("trans_id") })
            , "source2")
            .Do("print", o => Console.WriteLine(o));

It seems the @TransId parameter is not being resolved.

The error surfaces in TaskContinuation.cs:

            Debug.Assert(m_action != null);
            if (m_action is Func<Task, TResult> func)
            {
                m_result = func(antecedent);
                return;
            }

Error message:

Paillave.Etl.Core.JobExecutionException
  HResult=0x80131500
  Message=Job execution failed
  Source=Paillave.Etl
  StackTrace:
   at Paillave.Etl.Core.StreamProcessRunner`1.<>c__DisplayClass14_0.<ExecuteAsync>b__3(Task t)
   at System.Threading.Tasks.ContinuationResultTaskFromTask`1.InnerInvoke() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/TaskContinuation.cs:line 88
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ExecutionContext.cs:line 268
--- End of stack trace from previous location ---
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ExecutionContext.cs:line 293
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs:line 2349
--- End of stack trace from previous location ---
   at PocEtlDotNet.Program.<Main>d__0.MoveNext() in C:\GIT\OFS\Source\PocEtlDotNet\Program.cs:line 32
   at PocEtlDotNet.Program.<Main>(String[] args)

  This exception was originally thrown at this call stack:
    System.Data.Odbc.OdbcParameterCollection.ValidateType(object)
    System.Data.Odbc.OdbcParameterCollection.Add(object)
    Paillave.Etl.SqlServer.SqlCommandValueProvider<TIn, TOut>.PushValues(TIn, System.Action<TOut>, System.Threading.CancellationToken, Paillave.Etl.Core.IExecutionContext)
    Paillave.Etl.Core.CrossApplyStreamNode<TIn, TOut>.CreateOutputStream.AnonymousMethod__2(System.Action<TOut>, System.Threading.CancellationToken)
    Paillave.Etl.Reactive.Core.DeferredPushObservable<T>.InternStart()

Inner Exception 1:
InvalidCastException: The OdbcParameterCollection only accepts non-null OdbcParameter type objects, not SqlParameter objects.

paillave commented 1 year ago

Yes, that is part of my findings. OleDb and ODBC don't work like pure SQL drivers. I'll make all the necessary amendments.
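
The exception above comes from adding a SqlParameter to an OdbcCommand. As a rough sketch of one provider-neutral alternative (not the amendment made in Etl.Net; the helper name is hypothetical), a parameter can be created by the command itself through `IDbCommand.CreateParameter()`, so that its type always matches the driver. Note also that the ODBC provider binds parameters by position and expects `?` markers in the SQL text rather than named ones such as `@TransId`.

```csharp
using System;
using System.Data;

static class DbCommandExtensions
{
    // Hypothetical helper: lets the command create its own parameter type
    // (OdbcParameter for OdbcCommand, SqlParameter for SqlCommand, and so on).
    public static IDbDataParameter AddParameter(this IDbCommand command, string name, object value)
    {
        IDbDataParameter parameter = command.CreateParameter();
        parameter.ParameterName = name;
        // ADO.NET represents SQL NULL as DBNull.Value rather than null.
        parameter.Value = value ?? DBNull.Value;
        command.Parameters.Add(parameter);
        return parameter;
    }
}
```

With an ODBC connection, the query from the failing example would then use `WHERE [trans_id] <= ?` as its text, followed by a call such as `command.AddParameter("@TransId", 100)` on a command obtained from `connection.CreateCommand()`.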

Radamonas commented 1 year ago

@paillave are there any updates regarding OleDb and ODBC support?

paillave commented 1 year ago

Hello @Radamonas, it is still under development. I don't have a lot of free time, but this is still something I'm working on. You will be the first to know once this is done.

Radamonas commented 1 year ago

@paillave have you seen PR 442?

paillave commented 1 year ago

I did... but I'm working on an amendment that will make SQL queries independent of the DbConnection type.

paul-wade commented 1 month ago

@paillave, I see it's still marked as help wanted, and I happen to be someone looking for a help wanted sign. If you are still interested in help, can you go over the code changes so far? If it's easier for you to review on a call that works for me, just let me know.